Tutorial: Learn SQL basics #
Welcome to the Tutorial: Learn SQL basics tutorial, where you explore useful ways to craft
your SQL statements using the Sample dataset. The SQL
statements start with SELECT
expressions that return every row, and increase
in complexity with the addition of clauses and operations like WHERE
, GROUP
BY
, and ORDER BY
.
Requirements #
To complete the tutorial using Starburst Galaxy, you need to create a free trial or activate your user account.
Once you login to Starburst Galaxy, the sample
cluster and sample
catalog
are pre-configured and ready for querying.
- If you do not see the
sample
catalog automatically available, create a Sample dataset catalog with the namesample
. - If you do not see the
sample
cluster automatically available, create a cluster namedsample
and add thesample
catalog.
Navigate the query editor #
In the query editor, navigate to the Cluster explorer.
- Select the
sample
cluster dropdown in the left hand navigation to view the catalogs. -
Select the
sample
catalog dropdown.
Select the cluster sample
and the catalog sample
in the top right corner in
order to run the queries without specifying the catalog and the schema before
each table.
The namespace for a table is typically specified as
<catalog_name>.<schema_name>.<table_name>
. For the pre-defined sample dataset,
this configuration is as follows:
sample.demo.<table_name>
If you choose to name your catalog differently, either adjust the queries accordingly or select the appropriate catalog in the top right corner.
If you choose to name your cluster differently, select the appropriate cluster in the top right corner.
SELECT statement #
The SELECT
clause specifies the output of the query. There are multiple
Select expressions that can be
utilized depending on the query requirements.
Use the SELECT *
expression and query the astronauts
table to view the
data available for analysis.
SELECT
*
FROM
sample.demo.astronauts;
The astronauts table displays all the astronauts that are associated with a mission, which creates duplicate entries of the same astronaut because each row represents not only the astronaut but each mission the astronaut has completed.
Use the SELECT *
expression and query the missions
table to view the data
available for analysis.
SELECT
*
FROM
sample.demo.missions;
Instead of using the SELECT *
expression to return all the columns in the
table, only return columns that are beneficial for your analysis of the
astronauts
table.
SELECT
name,
nationality,
mission_title,
mission_number,
hours_mission
FROM
sample.demo.astronauts;
Instead of using the SELECT *
expression to return all the columns in the
table, only return columns that are beneficial for your analysis of the
missions
table.
SELECT
company_name,
status_rocket,
cost,
status_mission
FROM
sample.demo.missions;
WHERE clause #
The WHERE
clause is used to specify a condition and only returns records that
satisfy the conditional criteria.
There are many astronauts from the U.S. and U.S.S.R, so use the WHERE
clause
to alter the query and only view the astronauts of other countries.
SELECT
name,
nationality,
mission_title,
mission_number,
hours_mission
FROM
sample.demo.astronauts
WHERE
nationality NOT LIKE 'U.S.%';
Use the WHERE
clause to query the table and only return missions that are
classified as a success.
SELECT
company_name,
status_rocket,
cost,
status_mission
FROM
sample.demo.missions
WHERE
status_mission = 'Success';
ORDER BY clause #
The ORDER BY
clause is used to sort a result set by one or more output
expressions.
Add a sort to the astronauts query to view the results in order of nationality and name.
SELECT
name,
nationality,
mission_title,
mission_number,
hours_mission
FROM
sample.demo.astronauts
WHERE
nationality NOT LIKE 'U.S.%'
ORDER BY
nationality,
name;
A gentle reminder that the astronauts table includes duplicate entries for selected astronauts; however, these astronauts have a unique identifier for each completed mission.
Aggregate functions #
Aggregate functions operate on a set of values to compute a single result.
Use one query to find three different aggregate values. Discover the amount of trips, the longest mission in hours, and the smallest mission in hours from select countries.
SELECT
count() as trips,
max(hours_mission) as longest_mission,
min(hours_mission) as shortest_mission
FROM
sample.demo.astronauts
WHERE
nationality NOT LIKE 'U.S.%';
Instead of evaluating all the data together, increase the granularity level and evaluate the aggregates specifically for each country.
SELECT
nationality,
count() AS number_trips,
max(hours_mission) AS longest_time,
min(hours_mission) AS shortest_time
FROM
sample.demo.astronauts
WHERE
nationality NOT LIKE 'U.S.%'
GROUP BY
nationality;
Add an ORDER BY
clause and sort the results by most trips per country. In the
case of a tie, sort additionally by longest mission.
SELECT
nationality,
count() AS number_trips,
max(hours_mission) AS longest_time,
min(hours_mission) AS shortest_time
FROM
sample.demo.astronauts
WHERE
nationality NOT LIKE 'U.S.%'
GROUP BY
nationality
ORDER BY
number_trips DESC,
longest_time DESC;
Revisit the previous missions
query and add an Aggregate
function to determine which company
spent the most money on successful missions. Order this query by the total
highest cost.
SELECT
company_name,
sum(cost),
status_rocket
FROM
sample.demo.missions
WHERE
status_mission = 'Success'
GROUP BY
company_name,
status_rocket
ORDER BY
sum(cost) DESC;
Joining tables #
Joins allow you to combine data from multiple tables. A cross join returns all combinations of two relations.
Observing each table separately, it seems like an ambiguous task to join the two
together. However, there is a cross-join operation between the missions detail
and the astronauts mission_title
which counts the number of trips taken by
each company in each country. Let’s construct this cross-join.
Observe example detail information from the missions table. The first part of the information is the space shuttle, the second part is the mission.
SELECT
detail
FROM
sample.demo.missions
WHERE
detail LIKE('%Shenzhou 6%')
OR detail LIKE('%STS-86%');
By matching the second half of the missions detail
information with the
wildcard mission_title
information from the astronauts table, you can
determine your cross-join. To create the wildcard comparison, use the format
function.
SELECT
format('%%%s%%', mission_title)
FROM
sample.demo.astronauts;
Each mission_title
is returned enclosed in leading and trailing wildcard
characters. Each set of two wildcards '%%'
return one wildcard '%'
,
and the '%s'
is considered a placeholder. Hence, the result is
%mission_title%
.
Now, run the cross-join to evaluate the trip number by company and country.
SELECT
m.company_name,
a.nationality,
count() as number_trips
FROM
sample.demo.astronauts a
JOIN sample.demo.missions m ON m.detail LIKE format('%%%s%%', a.mission_title)
GROUP BY
a.nationality,
m.company_name
ORDER BY
m.company_name,
a.nationality;
Next steps #
Check out our other tutorials, or dive right into the SQL documentation and experiment with your own data.
Is the information on this page helpful?
Yes
No
Is the information on this page helpful?
Yes
No