Learn SQL window functions#

Welcome to the SQL window functions tutorial, which describes useful ways to craft your SQL statements using a sample dataset. Using Window functions, you perform calculations within each SQL statement.

Note

The Sample cluster used in the following example is not available in Starburst Enterprise (SEP). For more information on the sample dataset used in this tutorial, see the Starburst Galaxy Sample dataset documentaion.

Window functions#

Window functions perform calculations across rows of the query result. Invoking a window function requires special syntax using the OVER clause to specify the window. Each window function in this tutorial has two subclauses:

  • PARTITION BY: the criteria records must satisfy to belong in the window frame

  • ORDER BY: the order of records in the window frame

First, look at a subset of the missions competed by select astronauts. Compare their mission hours to the average mission hours for all astronauts.

SELECT
    name,
    year_of_mission,
    hours_mission,
    avg(hours_mission) OVER()
    AS avg_all_hours_mission
FROM
    sample.demo.astronauts
WHERE
    name IN ('Nicollier, Claude', 'Ross, Jerry L.')
ORDER BY
    name,
    year_of_mission;

The OVER() clause is creating a window of all rows to get the average across all missions. It is usually unlikely in common SQL practice to create a window that equates to all rows.

Add the partition to calculate the total mission hours of each astronaut.

SELECT
    name,
    year_of_mission,
    hours_mission,
    sum(hours_mission) OVER (PARTITION BY name) AS total_mission_hours
FROM
    sample.demo.astronauts
WHERE
    name IN ('Nicollier, Claude', 'Ross, Jerry L.')
ORDER BY
    name,
    year_of_mission;

Add a second window function to evaluate each astronaut’s mission hours against the average of only their missions.

SELECT
    name,
    year_of_mission,
    hours_mission,
    avg(hours_mission) OVER (PARTITION BY name) AS avg_hours_mission,
    sum(hours_mission) OVER (PARTITION BY name) AS total_mission_hours
FROM
    sample.demo.astronauts
WHERE
    name IN ('Nicollier, Claude', 'Ross, Jerry L.')
ORDER BY
    name,
    year_of_mission;

Determine the percentage of each mission’s hours against each astronaut’s total mission hours.

SELECT
    name,
    year_of_mission,
    hours_mission,
    round(
        hours_mission / sum(hours_mission) OVER (PARTITION BY name) * 100.0, 2
    )
    AS percent_of_total,
    sum(hours_mission) OVER (PARTITION BY name) AS total_mission_hours
FROM
    sample.demo.astronauts
WHERE
    name IN ('Nicollier, Claude', 'Ross, Jerry L.')
ORDER BY
    name,
    year_of_mission;

While the astronauts table already lists the mission number for each astronaut, calculate this value with a window function ranking the mission years.

SELECT
    name,
    year_of_mission,
    hours_mission,
    rank() OVER (
        PARTITION BY name
        ORDER BY
            year_of_mission
    ) AS mission_number
FROM
    sample.demo.astronauts
WHERE
    name IN ('Nicollier, Claude', 'Ross, Jerry L.')
ORDER BY
    name,
    year_of_mission;

Calculate the running total number of hours for each astronaut as they complete each mission. The default window frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which calculates the window frame as all rows before the current row and up to the current row. This is the correct window frame for your use case, so no window frame needs to be explicitly specified in the query.

SELECT
    name,
    year_of_mission,
    hours_mission,
    sum(hours_mission) OVER (
        PARTITION BY name
        ORDER BY
            year_of_mission
    ) AS running_total_mission_hours
FROM
    sample.demo.astronauts
WHERE
    name IN ('Nicollier, Claude', 'Ross, Jerry L.')
ORDER BY
    name,
    year_of_mission;

Next steps#

Check out our other SQL tutorials, or dive right into the SQL documentation and experiment with your own data.