A catalog contains the configuration that allows Starburst Galaxy to access a data source.
To query a data source in Galaxy, configure a catalog for it, and include that catalog in a cluster. Once the catalog is defined and used in a cluster, you can query the data source by accessing the catalog and its nested schemas and tables.
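The catalog, schema, and table nesting described above maps onto a three-part table name in a query. The following is a minimal sketch, assuming Trino-style SQL with double-quoted identifiers. The helper functions and the names `my_catalog`, `my_schema`, and `my_table` are hypothetical placeholders, not objects that exist in any Galaxy account.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Return a double-quoted catalog.schema.table identifier."""
    def quote(part: str) -> str:
        # Escape embedded double quotes by doubling them.
        return '"' + part.replace('"', '""') + '"'
    return ".".join(quote(p) for p in (catalog, schema, table))


def select_all(catalog: str, schema: str, table: str, limit: int = 10) -> str:
    """Build a simple SELECT statement against a fully qualified table."""
    return f"SELECT * FROM {qualified_name(catalog, schema, table)} LIMIT {int(limit)}"


# Hypothetical names, used only for illustration.
query = select_all("my_catalog", "my_schema", "my_table", limit=5)
print(query)
```

The resulting string can be run in any SQL client connected to a cluster that includes the catalog.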
Data sources and clusters must be located in the same cloud provider and region to enable optimal performance and avoid unnecessary data transfer costs.
Access to create and manage catalogs is provided through the Catalogs item in the navigation menu.
Starburst Galaxy supports access to many data sources. Configuration for object storage systems, data warehouses, relational databases, and other systems varies by cloud and hosting provider. If your data source has secured or locked-down network access, you may need to configure its network to admit one or more of Starburst Galaxy’s outgoing IP blocks as shown on the IP allow list.
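When preparing firewall rules, it can help to check whether a given address falls inside the blocks you plan to admit. The following is a minimal sketch using Python's standard `ipaddress` module. The CIDR blocks are placeholders from documentation-reserved ranges, not Starburst Galaxy's actual outgoing IP blocks; take the real values from the IP allow list.

```python
import ipaddress

# Placeholder blocks from documentation-reserved ranges (assumption);
# replace them with the blocks shown on the IP allow list.
ALLOWED_BLOCKS = ["203.0.113.0/24", "198.51.100.0/25"]


def is_allowed(ip: str, blocks=ALLOWED_BLOCKS) -> bool:
    """Return True if the address falls inside any of the CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(block) for block in blocks)


print(is_allowed("203.0.113.7"))
print(is_allowed("198.51.100.200"))
```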
The following sections provide links to the configuration pages for the data source catalogs supported by Starburst Galaxy.
Starburst Galaxy supports the following object storage systems.
Starburst Warp Speed is available for S3 and Tabular catalogs to improve performance.
Starburst Galaxy supports the following RDBMS and data warehouse catalogs.
Stargate, a link to access data across remote Starburst clusters.
Starburst Galaxy also provides access to a number of full datasets. You can create a catalog for these datasets, and use them for a number of purposes:
The following dataset catalogs are available:
Designed for high-performance testing of other components, in the same way as /dev/zero on Unix-like systems.
See the Introductory project tutorials for examples of using this dataset.
Provides data in two tables that represent space mission data.
Provides a set of schemas to support the TPC Benchmark™ DS database, which is a benchmark used to measure the performance of complex decision support databases.
Provides a set of schemas to support the TPC Benchmark™ H database, which is a benchmark used to measure the performance of highly complex decision support databases.