Tutorial: Learn SQL window functions #

Welcome to the Tutorial: Learn SQL window functions tutorial, where you explore useful ways to craft your SQL statements using the Sample dataset. Using Window functions, you perform calculations within each SQL statement.

Requirements #

To complete the tutorial using Starburst Galaxy, you need to create a free trial or activate your user account.

Once you login to Starburst Galaxy, the sample cluster and sample catalog are pre-configured and ready for querying.

  • If you do not see the sample catalog automatically available, create a Sample dataset catalog with the name sample.
  • If you do not see the sample cluster automatically available, create a cluster named sample and add the sample catalog.

In the query editor, navigate to the Cluster explorer.

  1. Select the sample cluster dropdown in the left hand navigation to view the catalogs.
  2. Select the sample catalog dropdown.

      Query editor sample cluster explorer

Select the cluster sample and the catalog sample in the top right corner in order to run the queries without specifying the catalog and the schema before each table.

  Query editor use sample catalog button

The namespace for a table is typically specified as <catalog_name>.<schema_name>.<table_name>. For the pre-defined sample dataset, this configuration is as follows:

sample.demo.<table_name>

If you choose to name your catalog differently, either adjust the queries accordingly or select the appropriate catalog in the top right corner.

If you choose to name your cluster differently, select the appropriate cluster in the top right corner.

Window functions #

Window functions perform calculations across rows of the query result. Invoking a window function requires special syntax using the OVER clause to specify the window. Each window function in this tutorial has two subclauses:

  • PARTITION BY: the criteria records must satisfy to belong in the window frame

  • ORDER BY: the order of records in the window frame

First, look at a subset of the missions competed by select astronauts. Compare their mission hours to the average mission hours for all astronauts.

SELECT
    name,
    year_of_mission,
    hours_mission,
    avg(hours_mission) OVER()
    AS avg_all_hours_mission
FROM
    sample.demo.astronauts
WHERE
    name IN ('Nicollier, Claude', 'Ross, Jerry L.')
ORDER BY
    name,
    year_of_mission;

The OVER() clause is creating a window of all rows to get the average across all missions. It is usually unlikely in common SQL practice to create a window that equates to all rows.

Add the partition to calculate the total mission hours of each astronaut.

SELECT
    name,
    year_of_mission,
    hours_mission,
    sum(hours_mission) OVER (PARTITION BY name) AS total_mission_hours
FROM
    sample.demo.astronauts
WHERE
    name IN ('Nicollier, Claude', 'Ross, Jerry L.')
ORDER BY
    name,
    year_of_mission;

Add a second window function to evaluate each astronaut’s mission hours against the average of only their missions.

SELECT
    name,
    year_of_mission,
    hours_mission,
    avg(hours_mission) OVER (PARTITION BY name) AS avg_hours_mission,
    sum(hours_mission) OVER (PARTITION BY name) AS total_mission_hours
FROM
    sample.demo.astronauts
WHERE
    name IN ('Nicollier, Claude', 'Ross, Jerry L.')
ORDER BY
    name,
    year_of_mission;

Determine the percentage of each mission’s hours against each astronaut’s total mission hours.

SELECT
    name,
    year_of_mission,
    hours_mission,
    round(
        hours_mission / sum(hours_mission) OVER (PARTITION BY name) * 100.0, 2
    )
    AS percent_of_total,
    sum(hours_mission) OVER (PARTITION BY name) AS total_mission_hours
FROM
    sample.demo.astronauts
WHERE
    name IN ('Nicollier, Claude', 'Ross, Jerry L.')
ORDER BY
    name,
    year_of_mission;

While the astronauts table already lists the mission number for each astronaut, calculate this value with a window function ranking the mission years.

SELECT
    name,
    year_of_mission,
    hours_mission,
    rank() OVER (
        PARTITION BY name
        ORDER BY
            year_of_mission
    ) AS mission_number
FROM
    sample.demo.astronauts
WHERE
    name IN ('Nicollier, Claude', 'Ross, Jerry L.')
ORDER BY
    name,
    year_of_mission;

Calculate the running total number of hours for each astronaut as they complete each mission. The default window frame for Starburst Galaxy is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which calculates the window frame as all rows before the current row and up to the current row. This is the correct window frame for your use case, so no window frame needs to be explicitly specified in the query.

SELECT
    name,
    year_of_mission,
    hours_mission,
    sum(hours_mission) OVER (
        PARTITION BY name
        ORDER BY
            year_of_mission
    ) AS running_total_mission_hours
FROM
    sample.demo.astronauts
WHERE
    name IN ('Nicollier, Claude', 'Ross, Jerry L.')
ORDER BY
    name,
    year_of_mission;

Next steps #

Check out our other tutorials, or dive right into the SQL documentation and experiment with your own data.