Query multiple sources #

Starburst Enterprise and Starburst Galaxy let data consumers query anything, anywhere, and get the data they need in a single query, no matter where it lives in your company’s data mesh. Specifically, they support queries that combine data from many different data sources in the data mesh at the same time. These data sources are defined as catalogs which expose data in schemas and tables.

You can combine, for instance, historical data from HDFS or object stores with the most recent incoming data from PostgreSQL in one query.

Combining data is straightforward: use the fully-qualified name of the tables in your FROM clause. Table names are fully-qualified when they include the catalog and schema name:

<catalog>.<schema>.<table>

A catalog defines the schemas in a data source such as Snowflake, Oracle and Hive.

Here’s an example of data from two different sources, Hive and MySQL, combined into a single query:

SELECT
    sfm.account_number
FROM
    hive_sales.order_entries.orders oeo
JOIN
    mysql_crm.sf_history.customer_master sfm
ON sfm.account_number = oeo.customer_id
WHERE sfm.sf_industry = `medical` AND oeo.order_total > 300
LIMIT 2;

This query uses data from the following sources:

  • The orders table in the order_entries schema defined in the hive_sales catalog
  • The customer_master table in the sf_history schema defined in the mysql_crm catalog