Data mesh with Starburst #
Starburst Enterprise is the world’s fastest distributed SQL query engine. It lets data consumers query anything, anywhere, and get the data they need in a single query, no matter where it lives. This idea of combining data from disparate sources is called a data mesh. Specifically Starburst Enterprise supports query federation where a query access many different datasources at the same time. These data sources are defined as catalogs and expose data in schemas and tables.
It allows you to combine, for instance, historical data from HDFS or objects stores with the most recent incoming data from Kafka one query.
Federating data is simple. You just need to use the fully-qualified name of the
tables in your
FROM clause. Table names are fully-qualified when they include
the catalog and schema name:
A catalog defines the schemas in a data source such as Snowflake, Oracle and Hive.
Here’s an example of data from two different sources, Hive and MySQL, federated into a single query:
SELECT sfm.account_number FROM hive_sales.order_entries.orders oeo JOIN mysql_crm.sf_history.customer_master sfm ON sfm.account_number = oeo.customer_id WHERE sfm.sf_industry = `medical` AND oeo.order_total > 300 LIMIT 2;
This query uses data from the following sources:
orderstable in the
order_entriesschema defined in the
customer_mastertable in the
sf_historyschema defined in the
To help you learn how Starburst uses federated queries in popular analytics tools, here is a handy walk-through of federating queries for rapid visualization in Looker with Starburst Enterprise.
Is the information on this page helpful?