Tutorial: Learn SQL basics #

Welcome to the Tutorial: Learn SQL basics tutorial, where you explore useful ways to craft your SQL statements using the Sample dataset. The SQL statements start with SELECT expressions that return every row, and increase in complexity with the addition of clauses and operations like WHERE, GROUP BY, and ORDER BY.

Requirements #

To complete the tutorial using Starburst Galaxy, you need to create a free trial or activate your user account.

Once you login to Starburst Galaxy, the sample cluster and sample catalog are pre-configured and ready for querying.

  • If you do not see the sample catalog automatically available, create a Sample dataset catalog with the name sample.
  • If you do not see the sample cluster automatically available, create a cluster named sample and add the sample catalog.

In the query editor, navigate to the Cluster explorer.

  1. Select the sample cluster dropdown in the left hand navigation to view the catalogs.
  2. Select the sample catalog dropdown.

      Query editor sample cluster explorer

Select the cluster sample and the catalog sample in the top right corner in order to run the queries without specifying the catalog and the schema before each table.

  Query editor use sample catalog button

The namespace for a table is typically specified as <catalog_name>.<schema_name>.<table_name>. For the pre-defined sample dataset, this configuration is as follows:

sample.demo.<table_name>

If you choose to name your catalog differently, either adjust the queries accordingly or select the appropriate catalog in the top right corner.

If you choose to name your cluster differently, select the appropriate cluster in the top right corner.

SELECT statement #

The SELECT clause specifies the output of the query. There are multiple Select expressions that can be utilized depending on the query requirements.

Use the SELECT * expression and query the astronauts table to view the data available for analysis.

SELECT
  *
FROM
  sample.demo.astronauts;

The astronauts table displays all the astronauts that are associated with a mission, which creates duplicate entries of the same astronaut because each row represents not only the astronaut but each mission the astronaut has completed.

Use the SELECT * expression and query the missions table to view the data available for analysis.

SELECT
  *
FROM
  sample.demo.missions;

Instead of using the SELECT * expression to return all the columns in the table, only return columns that are beneficial for your analysis of the astronauts table.

SELECT
  name,
  nationality,
  mission_title,
  mission_number,
  hours_mission
FROM
  sample.demo.astronauts;

Instead of using the SELECT * expression to return all the columns in the table, only return columns that are beneficial for your analysis of the missions table.

SELECT
  company_name,
  status_rocket,
  cost,
  status_mission
FROM
  sample.demo.missions;

WHERE clause #

The WHERE clause is used to specify a condition and only returns records that satisfy the conditional criteria.

There are many astronauts from the U.S. and U.S.S.R, so use the WHERE clause to alter the query and only view the astronauts of other countries.

SELECT
  name,
  nationality,
  mission_title,
  mission_number,
  hours_mission
FROM
  sample.demo.astronauts
WHERE
  nationality NOT LIKE 'U.S.%';

Use the WHERE clause to query the table and only return missions that are classified as a success.

SELECT
  company_name,
  status_rocket,
  cost,
  status_mission
FROM
  sample.demo.missions
WHERE
    status_mission = 'Success';

ORDER BY clause #

The ORDER BY clause is used to sort a result set by one or more output expressions.

Add a sort to the astronauts query to view the results in order of nationality and name.

SELECT
  name,
  nationality,
  mission_title,
  mission_number,
  hours_mission
FROM
  sample.demo.astronauts
WHERE
  nationality NOT LIKE 'U.S.%'
ORDER BY
  nationality,
  name;

A gentle reminder that the astronauts table includes duplicate entries for selected astronauts; however, these astronauts have a unique identifier for each completed mission.

  Sample astronaut query result

Aggregate functions #

Aggregate functions operate on a set of values to compute a single result.

Use one query to find three different aggregate values. Discover the amount of trips, the longest mission in hours, and the smallest mission in hours from select countries.

SELECT
  count() as trips,
  max(hours_mission) as longest_mission,
  min(hours_mission) as shortest_mission
FROM
  sample.demo.astronauts
WHERE
  nationality NOT LIKE 'U.S.%';

Instead of evaluating all the data together, increase the granularity level and evaluate the aggregates specifically for each country.

SELECT
  nationality,
  count() AS number_trips,
  max(hours_mission) AS longest_time,
  min(hours_mission) AS shortest_time
FROM
  sample.demo.astronauts
WHERE
  nationality NOT LIKE 'U.S.%'
GROUP BY
  nationality;

Add an ORDER BY clause and sort the results by most trips per country. In the case of a tie, sort additionally by longest mission.

SELECT
  nationality,
  count() AS number_trips,
  max(hours_mission) AS longest_time,
  min(hours_mission) AS shortest_time
FROM
  sample.demo.astronauts
WHERE
  nationality NOT LIKE 'U.S.%'
GROUP BY
  nationality
ORDER BY
  number_trips DESC,
  longest_time DESC;

Revisit the previous missions query and add an Aggregate function to determine which company spent the most money on successful missions. Order this query by the total highest cost.

SELECT
  company_name,
  sum(cost),
  status_rocket
FROM
  sample.demo.missions
WHERE
  status_mission = 'Success'
GROUP BY
  company_name,
  status_rocket
ORDER BY
  sum(cost) DESC;

Joining tables #

Joins allow you to combine data from multiple tables. A cross join returns all combinations of two relations.

Observing each table separately, it seems like an ambiguous task to join the two together. However, there is a cross-join operation between the missions detail and the astronauts mission_title which counts the number of trips taken by each company in each country. Let’s construct this cross-join.

Observe example detail information from the missions table. The first part of the information is the space shuttle, the second part is the mission.

SELECT
  detail
FROM
  sample.demo.missions
WHERE
  detail LIKE('%Shenzhou 6%')
  OR detail LIKE('%STS-86%');

By matching the second half of the missions detail information with the wildcard mission_title information from the astronauts table, you can determine your cross-join. To create the wildcard comparison, use the format function.

SELECT
  format('%%%s%%', mission_title)
FROM
  sample.demo.astronauts;

Each mission_title is returned enclosed in leading and trailing wildcard characters. Each set of two wildcards '%%' return one wildcard '%', and the '%s' is considered a placeholder. Hence, the result is %mission_title%.

Now, run the cross-join to evaluate the trip number by company and country.

SELECT
  m.company_name,
  a.nationality,
  count() as number_trips
FROM
  sample.demo.astronauts a
  JOIN sample.demo.missions m ON m.detail LIKE format('%%%s%%', a.mission_title)
GROUP BY
  a.nationality,
  m.company_name
ORDER BY
  m.company_name,
  a.nationality;

Next steps #

Check out our other tutorials, or dive right into the SQL documentation and experiment with your own data.