Starburst Galaxy

  •  Get started

  •  Working with data

  •  Data engineering

  •  Developer tools

  •  Cluster administration

  •  Security and compliance

  •  Troubleshooting

  • Galaxy status

  •  Reference

  • Learn SQL window functions #

    Welcome to the Learn SQL window functions tutorial, where you explore useful ways to craft your SQL statements using the Sample dataset. Using Window functions, you perform calculations within each SQL statement.

    Requirements #

    To complete the tutorial using Starburst Galaxy, you must create a free trial or activate your user account.

    Once you login to Starburst Galaxy, a cluster named sample or free-cluster, containing a catalog named sample, is usually pre-configured and ready for querying.

    • If you do not see the sample catalog automatically available, create a Sample dataset catalog with the name sample.
    • If you do not see the free-cluster or sample cluster automatically available, create a cluster named sample and add the sample catalog.

    In the query editor, navigate to the Cluster explorer.

    1. In the navigation menu, expand the free-cluster or sample cluster to view its catalogs.
    2. Select the sample catalog.

        Query editor sample cluster explorer

    In the location drop-down menus, select the cluster and the catalog sample in order to run the queries without having to specify the full table path location in each query.

      Query editor use sample catalog button

    The namespace for a table is typically specified as catalog_name.schema_name.table_name. For the predefined sample dataset, this configuration is as follows:

    sample.demo.<table_name>
    

    If you choose to name your catalog differently, either adjust the queries in the tutorials accordingly or select the appropriate catalog in the location drop-down menus.

    If you choose to name your cluster differently, select the appropriate cluster in the location drop-down menus.

    Window functions #

    Window functions perform calculations across rows of the query result. Invoking a window function requires special syntax using the OVER clause to specify the window. Each window function in this tutorial has two subclauses:

    • PARTITION BY: the criteria records must satisfy to belong in the window frame

    • ORDER BY: the order of records in the window frame

    First, look at a subset of the missions competed by select astronauts. Compare their mission hours to the average mission hours for all astronauts.

    SELECT
        name,
        year_of_mission,
        hours_mission,
        avg(hours_mission) OVER()
        AS avg_all_hours_mission
    FROM
        sample.demo.astronauts
    WHERE
        name IN ('Nicollier, Claude', 'Ross, Jerry L.')
    ORDER BY
        name,
        year_of_mission;
    

    The OVER() clause is creating a window of all rows to get the average across all missions. It is usually unlikely in common SQL practice to create a window that equates to all rows.

    Add the partition to calculate the total mission hours of each astronaut.

    SELECT
        name,
        year_of_mission,
        hours_mission,
        sum(hours_mission) OVER (PARTITION BY name) AS total_mission_hours
    FROM
        sample.demo.astronauts
    WHERE
        name IN ('Nicollier, Claude', 'Ross, Jerry L.')
    ORDER BY
        name,
        year_of_mission;
    

    Add a second window function to evaluate each astronaut’s mission hours against the average of only their missions.

    SELECT
        name,
        year_of_mission,
        hours_mission,
        avg(hours_mission) OVER (PARTITION BY name) AS avg_hours_mission,
        sum(hours_mission) OVER (PARTITION BY name) AS total_mission_hours
    FROM
        sample.demo.astronauts
    WHERE
        name IN ('Nicollier, Claude', 'Ross, Jerry L.')
    ORDER BY
        name,
        year_of_mission;
    

    Determine the percentage of each mission’s hours against each astronaut’s total mission hours.

    SELECT
        name,
        year_of_mission,
        hours_mission,
        round(
            hours_mission / sum(hours_mission) OVER (PARTITION BY name) * 100.0, 2
        )
        AS percent_of_total,
        sum(hours_mission) OVER (PARTITION BY name) AS total_mission_hours
    FROM
        sample.demo.astronauts
    WHERE
        name IN ('Nicollier, Claude', 'Ross, Jerry L.')
    ORDER BY
        name,
        year_of_mission;
    

    While the astronauts table already lists the mission number for each astronaut, calculate this value with a window function ranking the mission years.

    SELECT
        name,
        year_of_mission,
        hours_mission,
        rank() OVER (
            PARTITION BY name
            ORDER BY
                year_of_mission
        ) AS mission_number
    FROM
        sample.demo.astronauts
    WHERE
        name IN ('Nicollier, Claude', 'Ross, Jerry L.')
    ORDER BY
        name,
        year_of_mission;
    

    Calculate the running total number of hours for each astronaut as they complete each mission. The default window frame for Starburst Galaxy is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which calculates the window frame as all rows before the current row and up to the current row. This is the correct window frame for your use case, so no window frame needs to be explicitly specified in the query.

    SELECT
        name,
        year_of_mission,
        hours_mission,
        sum(hours_mission) OVER (
            PARTITION BY name
            ORDER BY
                year_of_mission
        ) AS running_total_mission_hours
    FROM
        sample.demo.astronauts
    WHERE
        name IN ('Nicollier, Claude', 'Ross, Jerry L.')
    ORDER BY
        name,
        year_of_mission;
    

    Next steps #

    Check out our other tutorials, or dive right into the SQL documentation and experiment with your own data.