Learn complex grouping with SQL#

Welcome to the complex grouping with SQL tutorial, which describes useful ways to craft your SQL statements using the TPC-H connector. You can perform multi-column analysis within each SQL statement using complex grouping operations.

Requirements#

This tutorial requires you to have a SEP cluster with a TPCH catalog configured.

Complex grouping operations#

SEP supports complex grouping operations using the ROLLUP, CUBE, and GROUPING SETS syntax. This syntax allows you to perform analysis that requires aggregation on multiple sets of columns in a single query.

ROLLUP#

The ROLLUP operator generates all possible subtotals for a given set of columns.

Query the customers table and view the data available for analysis.

SELECT
  *
FROM
  tpch.tiny.customer;

Group the customers by market segment to get the total count in each segment.

SELECT
  mktsegment,
  count() AS customers
FROM
  tpch.tiny.customer
GROUP BY
  mktsegment
ORDER BY
  mktsegment;

Instead of using the GROUP BY clause to organize the market segments, use the ROLLUP operator. Also, reduce the market segments to focus only on three distinct categories: AUTOMOBILE, BUILDING, and MACHINERY.

SELECT
  mktsegment,
  count() AS customers
FROM
  tpch.tiny.customer
WHERE
  mktsegment IN ('AUTOMOBILE', 'BUILDING', 'MACHINERY')
GROUP BY
  ROLLUP(mktsegment)
ORDER BY
  mktsegment;

Add another column to the ROLLUP operator and include the nationkey grouping to generate multiple grouping sets.

SELECT
  mktsegment,
  nationkey,
  count() AS orders
FROM
  tpch.tiny.customer
WHERE
  mktsegment IN ('AUTOMOBILE', 'BUILDING', 'MACHINERY')
  AND nationkey BETWEEN 1 AND 2
GROUP BY
  ROLLUP(mktsegment, nationkey)
ORDER BY
  mktsegment,
  nationkey;

CUBE#

The CUBE operator generates all the possible grouping sets for a given set of columns.

Switch to the CUBE operator to obtain the order count breakdown of each nation key for the market segments not explicitly specified.

SELECT
  mktsegment,
  nationkey,
  count() AS orders
FROM
  tpch.tiny.customer
WHERE
  mktsegment IN ('AUTOMOBILE', 'BUILDING', 'MACHINERY')
  AND nationkey BETWEEN 1 AND 2
GROUP BY
  CUBE(mktsegment, nationkey)
ORDER BY
  mktsegment,
  nationkey;

GROUPING SETS#

The result of the CUBE query can also be achieved using GROUPING SETS. Grouping sets allow you to specify multiple lists of columns to group on, so the manually specifying all the combination of columns can equate to the same output as the CUBE operator.

SELECT
  mktsegment,
  nationkey,
  count() AS orders
FROM
  tpch.tiny.customer
WHERE
  mktsegment IN ('AUTOMOBILE', 'BUILDING', 'MACHINERY')
  AND nationkey BETWEEN 1 AND 2
GROUP BY
  GROUPING SETS (
     (mktsegment),
     (nationkey),
     (mktsegment, nationkey),
     ()
  )
ORDER BY
  mktsegment,
  nationkey;

Next steps#

Check out our other SQL tutorials, or dive right into the SQL documentation and experiment with your own data.