Introduction to query federation
Starburst Enterprise is the world’s fastest distributed SQL query engine. It lets data consumers query anything, anywhere, and get the data they need in a single query, no matter where it lives. This practice of combining data from disparate sources is called query federation. For instance, it lets you combine historical data from HDFS or object stores with the most recent incoming data from Kafka in one query.
Federating data is simple: refer to each table by its fully-qualified name in your FROM clause. A table name is fully qualified when it includes the catalog and schema names in addition to the table name, in the form catalog.schema.table.
A catalog defines the schemas in a data source such as Snowflake, Oracle, or Hive.
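If you generate federated queries from application code, it helps to build the catalog.schema.table reference in one place. The sketch below is a minimal illustration, not a Starburst API: the helper names `quote_identifier` and `qualified_name` are hypothetical, and it assumes standard SQL delimited-identifier quoting (wrap each part in double quotes and double any embedded double quote).

```python
def quote_identifier(name: str) -> str:
    """Wrap one identifier in double quotes, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a fully-qualified catalog.schema.table reference (hypothetical helper)."""
    parts = (catalog, schema, table)
    if not all(parts):
        raise ValueError("catalog, schema, and table must all be non-empty")
    return ".".join(quote_identifier(p) for p in parts)


# The two tables used in the federated query in this section.
orders = qualified_name("hive_sales", "order_entries", "orders")
customers = qualified_name("mysql_crm", "sf_history", "customer_master")
print(orders)     # "hive_sales"."order_entries"."orders"
print(customers)  # "mysql_crm"."sf_history"."customer_master"
```

Quoting every part is a defensive choice: it keeps names containing dots, spaces, or reserved words from being misparsed when they are interpolated into a query string.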
Here’s an example of data from two different sources, Hive and MySQL, federated into a single query:
SELECT sfm.account_number
FROM hive_sales.order_entries.orders oeo
JOIN mysql_crm.sf_history.customer_master sfm
  ON sfm.account_number = oeo.customer_id
WHERE sfm.sf_industry = 'medical'
  AND oeo.order_total > 300
LIMIT 2;
This query uses data from the following sources:
- The orders table in the order_entries schema, defined in the hive_sales catalog
- The customer_master table in the sf_history schema, defined in the mysql_crm catalog
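To make the semantics of the example query concrete, here is a toy in-memory model of it in Python. It is only an illustration of what the query returns, not how Starburst Enterprise executes it: the sample rows are hypothetical, and the function name `federated_query` is made up. Each list stands in for one catalog's table, the customer side is filtered and hashed, and orders are then probed against it.

```python
# Hypothetical sample rows standing in for the two catalogs' tables.
orders = [  # stands in for hive_sales.order_entries.orders
    {"customer_id": 101, "order_total": 450.0},
    {"customer_id": 102, "order_total": 120.0},
    {"customer_id": 103, "order_total": 980.0},
    {"customer_id": 101, "order_total": 310.0},
]
customer_master = [  # stands in for mysql_crm.sf_history.customer_master
    {"account_number": 101, "sf_industry": "medical"},
    {"account_number": 102, "sf_industry": "medical"},
    {"account_number": 103, "sf_industry": "retail"},
]


def federated_query(orders, customers, limit=2):
    """Toy hash join mirroring the example query's JOIN, WHERE, and LIMIT."""
    if limit <= 0:
        return []
    # WHERE sfm.sf_industry = 'medical', applied before building the hash table.
    medical = {c["account_number"]: c for c in customers if c["sf_industry"] == "medical"}
    result = []
    for o in orders:
        # ON sfm.account_number = oeo.customer_id AND oeo.order_total > 300
        if o["order_total"] > 300 and o["customer_id"] in medical:
            result.append(medical[o["customer_id"]]["account_number"])
            if len(result) == limit:  # LIMIT 2
                break
    return result


print(federated_query(orders, customer_master))  # [101, 101]
```

Two caveats apply to the real query. Because it selects sfm.account_number without DISTINCT, a customer with several qualifying orders appears more than once, as 101 does here. And because it has no ORDER BY, SQL does not guarantee which two rows LIMIT 2 returns, whereas this loop always returns the first matches in list order.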
To help you learn how Starburst uses federated queries in popular analytics tools, here is a handy walk-through of federating queries for rapid visualization in Looker with Starburst Enterprise.