Starburst Galaxy


  Metastores

    A metastore is a database that catalogs the metadata of a data collection. Metastores map files in distributed object storage systems such as S3 and HDFS to tables, and provide metadata such as column names and type mappings. Metastores also supply data to the cost-based optimizer.
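To make the idea of a metastore entry more concrete, the following Python sketch models a simplified table record. It holds the object storage location of the files, the column names and types, and a row-count statistic of the kind a cost-based optimizer can use. The classes `Column` and `TableEntry` are hypothetical illustrations. Real metastores such as the Hive Metastore Service or AWS Glue record more (file format, partitions, per-column statistics) and use their own schemas.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Column:
    name: str
    type: str  # SQL type name, for example "bigint" or "varchar"


@dataclass
class TableEntry:
    """Hypothetical, simplified view of what a metastore records for one table."""
    schema: str
    name: str
    location: str                    # directory in object storage that holds the data files
    columns: List[Column]
    row_count: Optional[int] = None  # statistic a cost-based optimizer can use

    def column_types(self) -> Dict[str, str]:
        """Map each column name to its type, as a query engine would look it up."""
        return {c.name: c.type for c in self.columns}
```

For example, a table `sales.orders` stored under a placeholder location such as `s3://example-bucket/orders/` would be one such entry, listing its columns and types.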

    Before you can query data in an object storage account, a metastore service must be associated with that object storage.

    For more information about object storage and the requirement for a metastore, see Using object storage systems.

    For S3-specific details of the following metastore configurations, see the Amazon S3 catalog overview.

    AWS Glue

    You can use AWS Glue to manage the metadata about your object storage. Configure access to AWS Glue with the following parameters:

    • AWS Glue region
    • Access security: either an AWS access key for Glue together with an AWS secret key for Glue, or a Cross account IAM role
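The two access options are alternatives. The following Python sketch checks a set of Glue settings under that reading: a region is required, and exactly one of a complete access/secret key pair or a cross account IAM role must be given. `GlueMetastoreConfig` and its field names are hypothetical and do not correspond to a Starburst Galaxy API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GlueMetastoreConfig:
    # Hypothetical model of the AWS Glue metastore settings; not a Galaxy API.
    region: str
    access_key: Optional[str] = None
    secret_key: Optional[str] = None
    iam_role_arn: Optional[str] = None  # cross account IAM role

    def validate(self) -> str:
        """Return the access mode in use, or raise ValueError if the settings are incomplete or conflict."""
        if not self.region:
            raise ValueError("AWS Glue region is required")
        has_keys = self.access_key is not None or self.secret_key is not None
        has_role = self.iam_role_arn is not None
        if has_keys and has_role:
            raise ValueError("use either an access/secret key pair or an IAM role, not both")
        if has_keys:
            if not (self.access_key and self.secret_key):
                raise ValueError("both the access key and the secret key are required")
            return "access-key"
        if has_role:
            return "iam-role"
        raise ValueError("no Glue access method configured")
```

For example, `GlueMetastoreConfig("us-east-1", iam_role_arn="arn:aws:iam::123456789012:role/example-glue-role").validate()` returns `"iam-role"`; the region and role ARN are placeholders.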

    Read External security in AWS to learn about configuring these details in AWS.

    Hive Metastore Service

    You can use a Hive Metastore Service (HMS) to manage the metadata for your object storage. The HMS must be located in the same cloud provider and region as the object storage itself.

    You can connect to the HMS directly if it accepts connections from the Starburst Galaxy IP range (CIDR).
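Before configuring a direct connection, it can help to confirm that the HMS port accepts TCP connections at all. The following Python sketch uses the standard `socket` module to attempt a connection. The function name `hms_port_reachable` is hypothetical. The check only shows that the port is reachable from the machine where you run it; it does not prove that the Starburst Galaxy IP range is allowed, or that the service behind the port is a working HMS.

```python
import socket


def hms_port_reachable(host: str, port: int = 9083, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection resolves the host name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `hms_port_reachable("hms.example.com")` checks the typical port 9083 on a placeholder host name.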

    If the HMS is only accessible inside the virtual private cloud (VPC) of the cloud provider, you can use an SSH tunnel with a bastion host in the VPC.

    In both cases, configure access with the following parameters:

    • Hive Metastore host: the fully qualified domain name of the HMS server.
    • Hive Metastore port: the port used by the HMS, typically 9083.
    • Allow creating external tables: switch to indicate whether new tables can be created in the object storage and HMS from Starburst Galaxy with CREATE TABLE or CREATE TABLE AS commands.
    • Allow writing to external tables: switch to indicate whether data management write operations are permitted.
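The following Python sketch models these parameters and shows how the two switches could gate SQL statements. `HmsConfig` is hypothetical, its switch defaults are illustrative, and the set of statements treated as write operations (INSERT, UPDATE, DELETE, MERGE) is an assumption, because this section does not list which operations the write switch covers.

```python
from dataclasses import dataclass

# Assumption: CREATE statements follow the switch description above; the write
# set is an illustrative guess, not an exhaustive list from Starburst Galaxy.
CREATE_STATEMENTS = {"CREATE TABLE", "CREATE TABLE AS"}
WRITE_STATEMENTS = {"INSERT", "UPDATE", "DELETE", "MERGE"}


@dataclass
class HmsConfig:
    # Hypothetical model of the HMS parameters; not a Galaxy API.
    host: str                            # fully qualified domain name of the HMS server
    port: int = 9083                     # typical HMS port
    allow_create_external: bool = False  # illustrative default
    allow_write_external: bool = False   # illustrative default

    def permits(self, statement: str) -> bool:
        """Return whether a statement type is allowed under the two switches."""
        stmt = " ".join(statement.upper().split())  # normalize case and spacing
        if stmt in CREATE_STATEMENTS:
            return self.allow_create_external
        if stmt in WRITE_STATEMENTS:
            return self.allow_write_external
        return True  # assumption: reads such as SELECT are not gated by either switch
```

Keeping the two switches independent mirrors the separate settings: a catalog can allow new tables without allowing writes to existing ones, or the reverse.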

    Starburst Galaxy metastore

    Starburst Galaxy provides its own metastore service for your convenience. You do not need to configure and manage a separate Hive Metastore Service deployment or equivalent system.

    In Metastore configuration, select Starburst Galaxy to set up and use the built-in metastore provided by Galaxy.

    For Amazon S3 and Google Cloud Storage, create a bucket in your object storage account and a directory in that bucket, then provide the bucket name and directory name. This location is used to store the metastore data associated with this S3 or GCS account.

    For Azure ADLS, create a container in your storage account and a directory in that container, then provide the container name and directory name. This location is used to store the metastore data associated with this storage account.
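The bucket (or container) name and the directory name together identify one location in object storage. The following Python sketch composes that location as a URI using the common `s3://`, `gs://`, and `abfss://` schemes. `metastore_location` is a hypothetical helper for illustration. Starburst Galaxy asks only for the names, not a URI, and the ADLS form assumes the storage account name comes from the catalog's storage account settings.

```python
from typing import Optional


def metastore_location(storage: str, bucket_or_container: str, directory: str,
                       azure_account: Optional[str] = None) -> str:
    """Hypothetical helper: build the object storage URI for the metastore directory."""
    directory = directory.strip("/")  # tolerate leading or trailing slashes
    if storage == "s3":
        return f"s3://{bucket_or_container}/{directory}"
    if storage == "gcs":
        return f"gs://{bucket_or_container}/{directory}"
    if storage == "adls":
        # ADLS Gen2 URIs also name the storage account that owns the container.
        if not azure_account:
            raise ValueError("ADLS locations also need the storage account name")
        return f"abfss://{bucket_or_container}@{azure_account}.dfs.core.windows.net/{directory}"
    raise ValueError(f"unsupported storage type: {storage}")
```

For example, a placeholder bucket `my-bucket` with directory `galaxy-metastore` corresponds to `s3://my-bucket/galaxy-metastore` in S3.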

    The two Allow switches work the same way for a Starburst Galaxy metastore as for a separate Hive Metastore Service, as described in the previous section.

    Deleting the catalog also removes the associated Starburst Galaxy metastore data.