Starburst for data engineers
Starburst Enterprise platform (SEP) is a fast, interactive distributed SQL query engine that decouples compute from data storage. SEP lets you query data where it lives, including Hive, Snowflake, MySQL or even proprietary data stores. A single SEP query can combine data from all these data sources and more.
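To make "a single query across data sources" concrete, the minimal Python sketch below builds the text of one federated join. It assumes SEP's Trino-style naming, where each table is addressed as catalog.schema.table. The catalog, schema, table, and column names (`hive.sales.orders`, `mysql.crm.customers`, `customer_id`) are hypothetical placeholders, not objects that exist in any real cluster.

```python
# Minimal sketch: build a federated SQL query that joins tables from two
# SEP catalogs. All catalog, schema, table, and column names are hypothetical.


def quote_identifier(name: str) -> str:
    """Quote an identifier ANSI-style, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified "catalog"."schema"."table" reference."""
    return ".".join(quote_identifier(part) for part in (catalog, schema, table))


def federated_join(left: tuple, right: tuple, join_column: str) -> str:
    """Join two tables, possibly in different catalogs, on one shared column."""
    col = quote_identifier(join_column)
    return (
        "SELECT *\n"
        f"FROM {qualified_name(*left)} AS l\n"
        f"JOIN {qualified_name(*right)} AS r\n"
        f"  ON l.{col} = r.{col}"
    )


if __name__ == "__main__":
    # Hypothetical: orders stored in a Hive catalog, customers in a MySQL catalog.
    print(federated_join(("hive", "sales", "orders"),
                         ("mysql", "crm", "customers"),
                         "customer_id"))
```

The only thing that changes between data sources is the catalog prefix. The join itself is ordinary SQL, and SEP, not the client, is responsible for reading from each source.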
Starburst can greatly reduce your reliance on expensive, complex, and often brittle ETL frameworks and their pipelines. Because it processes data in memory rather than on disk as it executes queries across the cluster, it is also fast. SEP can bring your data landing times forward and help you meet or beat your SLAs.
How does this work?
SEP comes with more than 30 supported enterprise connectors, including exclusive connectors not available in open source. These connectors provide high-performance, SQL-based access to most of the data platforms in your organization, such as Teradata, PostgreSQL, and Hive. Each data platform is defined as a SEP catalog. Catalogs, in turn, define schemas and their tables. At a minimum, a catalog also defines the connector that SEP uses to connect to that data source, as in this SQL Server example:
connector.name=sqlserver
connection-url=jdbc:sqlserver://<host>:<port>;database=<database>
connection-user=root
connection-password=secret
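Because a catalog is just a list of key=value properties, it can be worth sanity-checking catalog files before deploying them. The sketch below is a hypothetical helper, not part of SEP: it parses a catalog definition, rejects duplicate keys, and checks for the required `connector.name` property. It deliberately handles only simple `key=value` lines and `#`/`!` comments, not the full `java.util.Properties` syntax (no `:` separators or line continuations).

```python
# Minimal sketch of a catalog-file sanity check (hypothetical helper, not an
# SEP tool). Simplified parser: "key=value" lines plus "#"/"!" comments only.


def parse_catalog_properties(text: str) -> dict:
    props = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        if "=" not in line:
            raise ValueError(f"line {lineno}: expected key=value, got {line!r}")
        # Split on the first "=" only; JDBC URLs may contain "=" themselves.
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if key in props:
            # A repeated key is almost always a typo, for example a second
            # connection-user where connection-password was meant.
            raise ValueError(f"line {lineno}: duplicate key {key!r}")
        props[key] = value
    if "connector.name" not in props:
        raise ValueError("catalog must define connector.name")
    return props


EXAMPLE = """\
connector.name=sqlserver
connection-url=jdbc:sqlserver://<host>:<port>;database=<database>
connection-user=root
connection-password=secret
"""

if __name__ == "__main__":
    catalog = parse_catalog_properties(EXAMPLE)
    print(catalog["connector.name"])  # sqlserver
```

Failing fast on a duplicate key is a design choice here: with a plain `java.util.Properties` load the later value would silently replace the earlier one, which is easy to miss in review.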
Once the connection is established, many connectors provide configuration properties for tuning performance, such as timeouts, retries, and connection limits.
SEP uses ANSI-compliant SQL that should feel comfortable and familiar, and it takes care of translating your queries into the correct SQL syntax for each data source. If you are migrating from Hive, our documentation includes a migration guide.
How do I get started?
If you already have SEP, as a data engineer you need the address of the cluster's coordinator and your access credentials, as well as our JDBC or ODBC driver. We also provide a handy CLI.
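Connecting through the JDBC driver starts with a connection URL that points at the coordinator. The sketch below assembles one, assuming the Trino-style format `jdbc:trino://host:port[/catalog[/schema]]` with driver options such as `SSL=true` in the query string. Check the exact format against the driver documentation for your SEP release. The host, port, catalog, and schema are hypothetical, and credentials are assumed to be passed to the driver separately rather than embedded in the URL.

```python
# Minimal sketch: assemble a JDBC connection URL for an SEP coordinator.
# Assumes the Trino-style format jdbc:trino://host:port[/catalog[/schema]];
# all host, port, catalog, and schema values used here are hypothetical.
from typing import Optional
from urllib.parse import urlencode


def jdbc_url(host: str, port: int, catalog: Optional[str] = None,
             schema: Optional[str] = None, **params: str) -> str:
    if schema is not None and catalog is None:
        raise ValueError("a schema requires a catalog")
    url = f"jdbc:trino://{host}:{port}"
    if catalog is not None:
        url += f"/{catalog}"
        if schema is not None:
            url += f"/{schema}"
    if params:
        # Driver options go in the query string, for example SSL=true.
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    print(jdbc_url("sep.example.com", 443, "hive", "sales", SSL="true"))
    # jdbc:trino://sep.example.com:443/hive/sales?SSL=true
```

Leaving the catalog and schema off the URL is also valid in this sketch; queries then have to use fully qualified catalog.schema.table names.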
For your next stop, see our data engineer’s user guide.