Welcome to the Learn SQL window functions tutorial, where you explore useful ways to craft your SQL statements using the Sample dataset. Using Window functions, you perform calculations within each SQL statement.
To complete the tutorial using Starburst Galaxy, you must create a free trial or activate your user account.
Once you login to Starburst Galaxy, a cluster named sample
or free-cluster
,
containing a catalog named sample
, is usually pre-configured and ready
for querying.
sample
catalog automatically available, create a
Sample dataset catalog with the name sample
.free-cluster
or sample
cluster automatically
available, create a cluster named sample
and
add the sample
catalog.In the query editor, navigate to the Cluster explorer.
free-cluster
or sample
cluster to view
its catalogs.Select the sample
catalog.
In the location drop-down
menus,
select the cluster and the catalog sample
in order to run the queries without
having to specify the full table path location in each query.
The namespace for a table is typically specified as
catalog_name.schema_name.table_name
. For the predefined sample dataset,
this configuration is as follows:
sample.demo.<table_name>
If you choose to name your catalog differently, either adjust the queries in the tutorials accordingly or select the appropriate catalog in the location drop-down menus.
If you choose to name your cluster differently, select the appropriate cluster in the location drop-down menus.
Window functions perform calculations
across rows of the query result. Invoking a window function requires special
syntax using the OVER
clause to specify the window. Each window function in
this tutorial has two subclauses:
PARTITION BY
: the criteria records must satisfy to belong in the
window frame
ORDER BY
: the order of records in the window
frame
First, look at a subset of the missions competed by select astronauts. Compare their mission hours to the average mission hours for all astronauts.
SELECT
name,
year_of_mission,
hours_mission,
avg(hours_mission) OVER()
AS avg_all_hours_mission
FROM
sample.demo.astronauts
WHERE
name IN ('Nicollier, Claude', 'Ross, Jerry L.')
ORDER BY
name,
year_of_mission;
The OVER()
clause is creating a window of all rows to get the average across
all missions. It is usually unlikely in common SQL practice to create a window
that equates to all rows.
Add the partition to calculate the total mission hours of each astronaut.
SELECT
name,
year_of_mission,
hours_mission,
sum(hours_mission) OVER (PARTITION BY name) AS total_mission_hours
FROM
sample.demo.astronauts
WHERE
name IN ('Nicollier, Claude', 'Ross, Jerry L.')
ORDER BY
name,
year_of_mission;
Add a second window function to evaluate each astronaut’s mission hours against the average of only their missions.
SELECT
name,
year_of_mission,
hours_mission,
avg(hours_mission) OVER (PARTITION BY name) AS avg_hours_mission,
sum(hours_mission) OVER (PARTITION BY name) AS total_mission_hours
FROM
sample.demo.astronauts
WHERE
name IN ('Nicollier, Claude', 'Ross, Jerry L.')
ORDER BY
name,
year_of_mission;
Determine the percentage of each mission’s hours against each astronaut’s total mission hours.
SELECT
name,
year_of_mission,
hours_mission,
round(
hours_mission / sum(hours_mission) OVER (PARTITION BY name) * 100.0, 2
)
AS percent_of_total,
sum(hours_mission) OVER (PARTITION BY name) AS total_mission_hours
FROM
sample.demo.astronauts
WHERE
name IN ('Nicollier, Claude', 'Ross, Jerry L.')
ORDER BY
name,
year_of_mission;
While the astronauts table already lists the mission number for each astronaut, calculate this value with a window function ranking the mission years.
SELECT
name,
year_of_mission,
hours_mission,
rank() OVER (
PARTITION BY name
ORDER BY
year_of_mission
) AS mission_number
FROM
sample.demo.astronauts
WHERE
name IN ('Nicollier, Claude', 'Ross, Jerry L.')
ORDER BY
name,
year_of_mission;
Calculate the running total number of hours for each astronaut as they complete
each mission. The default window
frame for
Starburst Galaxy is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
, which
calculates the window frame as all rows before the current row and up to the
current row. This is the correct window frame for your use case, so no window
frame needs to be explicitly specified in the query.
SELECT
name,
year_of_mission,
hours_mission,
sum(hours_mission) OVER (
PARTITION BY name
ORDER BY
year_of_mission
) AS running_total_mission_hours
FROM
sample.demo.astronauts
WHERE
name IN ('Nicollier, Claude', 'Ross, Jerry L.')
ORDER BY
name,
year_of_mission;
Check out our other tutorials, or dive right into the SQL documentation and experiment with your own data.
Is the information on this page helpful?
Yes
No