Welcome to the Learn complex grouping with SQL tutorial, where you explore useful ways to craft your SQL statements using the TPC-H dataset. You can perform multi-column analysis within each SQL statement using complex grouping operations.
To complete the tutorial using Starburst Galaxy, you must create a free trial or activate your user account.
Once you login to Starburst Galaxy, the sample
cluster and tpch
catalog are
usually pre-configured and ready for querying.
The sample
cluster and tpch
catalog are pre-configured and ready for
querying.
tpch
catalog automatically available, create a TPC-H
dataset catalog named tpch
.sample
cluster automatically available, create a
cluster named sample
and add the tpch
catalog
to it.In the query editor, navigate to the Cluster explorer.
sample
cluster to view its catalogs.Select the sample
catalog.
In the location drop-down
menus,
select the cluster sample
and the catalog tpch
in order to run the queries
without having to specify the full table path location in each query.
The namespace for a table is typically specified as
catalog_name.schema_name.table_name
. For the tpch dataset, this configuration
is as follows:
tpch.tiny.<table_name>
If you choose to name your catalog differently, either adjust the queries accordingly or select the appropriately catalog in the location drop-down menus.
If you choose to name your cluster differently, select the appropriately named cluster in the location drop-down menus.
Starburst Galaxy supports complex grouping
operations
using the ROLLUP
, CUBE
, and GROUPING SETS
syntax. This syntax allows you
to perform analysis that requires aggregation on multiple sets of columns in a
single query.
The ROLLUP operator generates all possible subtotals for a given set of columns.
Query the customers table and view the data available for analysis.
SELECT
*
FROM
tpch.tiny.customer;
Group the customers by market segment to get the total count in each segment.
SELECT
mktsegment,
count() AS customers
FROM
tpch.tiny.customer
GROUP BY
mktsegment
ORDER BY
mktsegment;
Instead of using the GROUP BY
clause to organize the market segments, use the
ROLLUP
operator. Also, reduce the market segments to focus only on three
distinct categories: AUTOMOBILE, BUILDING, and MACHINERY.
SELECT
mktsegment,
count() AS customers
FROM
tpch.tiny.customer
WHERE
mktsegment IN ('AUTOMOBILE', 'BUILDING', 'MACHINERY')
GROUP BY
ROLLUP(mktsegment)
ORDER BY
mktsegment;
Add another column to the ROLLUP
operator and include the nationkey
grouping
to generate multiple grouping sets.
SELECT
mktsegment,
nationkey,
count() AS orders
FROM
tpch.tiny.customer
WHERE
mktsegment IN ('AUTOMOBILE', 'BUILDING', 'MACHINERY')
AND nationkey BETWEEN 1 AND 2
GROUP BY
ROLLUP(mktsegment, nationkey)
ORDER BY
mktsegment,
nationkey;
The CUBE operator generates all the possible grouping sets for a given set of columns.
Switch to the CUBE
operator to obtain the order count breakdown of each nation
key for the market segments not explicitly specified.
SELECT
mktsegment,
nationkey,
count() AS orders
FROM
tpch.tiny.customer
WHERE
mktsegment IN ('AUTOMOBILE', 'BUILDING', 'MACHINERY')
AND nationkey BETWEEN 1 AND 2
GROUP BY
CUBE(mktsegment, nationkey)
ORDER BY
mktsegment,
nationkey;
The result of the CUBE
query can also be achieved using GROUPING SETS
.
Grouping sets allow you to specify
multiple lists of columns to group on, so the manually specifying all the
combination of columns can equate to the same output as the CUBE
operator.
SELECT
mktsegment,
nationkey,
count() AS orders
FROM
tpch.tiny.customer
WHERE
mktsegment IN ('AUTOMOBILE', 'BUILDING', 'MACHINERY')
AND nationkey BETWEEN 1 AND 2
GROUP BY
GROUPING SETS (
(mktsegment),
(nationkey),
(mktsegment, nationkey),
()
)
ORDER BY
mktsegment,
nationkey;
Check out our other tutorials, or dive right into the SQL documentation and experiment with your own data.
Is the information on this page helpful?
Yes
No