Learn SQL basics#

Welcome to the SQL basics tutorial, which describes useful ways to craft your SQL statements using a sample dataset. The SQL statements start with SELECT expressions that return every row, and increase in complexity with the addition of clauses and operations like WHERE, GROUP BY, and ORDER BY.

Note

The Sample cluster used in the following example is not available in Starburst Enterprise (SEP). For more information on the sample dataset used in this tutorial, see the Starburst Galaxy Sample dataset documentaion.

SELECT statement#

The SELECT clause specifies the output of the query. There are multiple Select expressions that can be utilized depending on the query requirements.

Use the SELECT * expression and query the astronauts table to view the data available for analysis.

SELECT
  *
FROM
  sample.demo.astronauts;

The astronauts table displays all the astronauts that are associated with a mission, which creates duplicate entries of the same astronaut because each row represents not only the astronaut but each mission the astronaut has completed.

Use the SELECT * expression and query the missions table to view the data available for analysis.

SELECT
  *
FROM
  sample.demo.missions;

Instead of using the SELECT * expression to return all the columns in the table, only return columns that are beneficial for your analysis of the astronauts table.

SELECT
  name,
  nationality,
  mission_title,
  mission_number,
  hours_mission
FROM
  sample.demo.astronauts;

Instead of using the SELECT * expression to return all the columns in the table, only return columns that are beneficial for your analysis of the missions table.

SELECT
  company_name,
  status_rocket,
  cost,
  status_mission
FROM
  sample.demo.missions;

WHERE clause#

The WHERE clause is used to specify a condition and only returns records that satisfy the conditional criteria.

There are many astronauts from the U.S. and U.S.S.R, so use the WHERE clause to alter the query and only view the astronauts of other countries.

SELECT
  name,
  nationality,
  mission_title,
  mission_number,
  hours_mission
FROM
  sample.demo.astronauts
WHERE
  nationality NOT LIKE 'U.S.%';

Use the WHERE clause to query the table and only return missions that are classified as a success.

SELECT
  company_name,
  status_rocket,
  cost,
  status_mission
FROM
  sample.demo.missions
WHERE
    status_mission = 'Success';

ORDER BY clause#

The ORDER BY clause is used to sort a result set by one or more output expressions.

Add a sort to the astronauts query to view the results in order of nationality and name.

SELECT
  name,
  nationality,
  mission_title,
  mission_number,
  hours_mission
FROM
  sample.demo.astronauts
WHERE
  nationality NOT LIKE 'U.S.%'
ORDER BY
  nationality,
  name;

Note

A gentle reminder that the astronauts table includes duplicate entries for selected astronauts; however, these astronauts have a unique identifier for each completed mission.

Aggregate functions#

Aggregate functions operate on a set of values to compute a single result.

Use one query to find three different aggregate values. Discover the amount of trips, the longest mission in hours, and the smallest mission in hours from select countries.

SELECT
  count() as trips,
  max(hours_mission) as longest_mission,
  min(hours_mission) as shortest_mission
FROM
  sample.demo.astronauts
WHERE
  nationality NOT LIKE 'U.S.%';

Instead of evaluating all the data together, increase the granularity level and evaluate the aggregates specifically for each country.

SELECT
  nationality,
  count() AS number_trips,
  max(hours_mission) AS longest_time,
  min(hours_mission) AS shortest_time
FROM
  sample.demo.astronauts
WHERE
  nationality NOT LIKE 'U.S.%'
GROUP BY
  nationality;

Add an ORDER BY clause and sort the results by most trips per country. In the case of a tie, sort additionally by longest mission.

SELECT
  nationality,
  count() AS number_trips,
  max(hours_mission) AS longest_time,
  min(hours_mission) AS shortest_time
FROM
  sample.demo.astronauts
WHERE
  nationality NOT LIKE 'U.S.%'
GROUP BY
  nationality
ORDER BY
  number_trips DESC,
  longest_time DESC;

Revisit the previous missions query and add an Aggregate functions to determine which company spent the most money on successful missions. Order this query by the total highest cost.

SELECT
  company_name,
  sum(cost),
  status_rocket
FROM
  sample.demo.missions
WHERE
  status_mission = 'Success'
GROUP BY
  company_name,
  status_rocket
ORDER BY
  sum(cost) DESC;

Other useful resources#

Next steps#

Check out our other SQL tutorials, or dive right into the SQL documentation and experiment with your own data.