A metastore is a database that catalogs the metadata of a data collection. Metastores map files in distributed object storage systems such as S3 and HDFS to tables, and provide metadata such as column names and type mappings. Metastores also provide data to the cost-based optimizer.
Before you can query data in an object storage account, a metastore service must be associated with that object storage.
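The mapping from files to tables can be pictured with a small sketch. The following Python is a toy, in-memory illustration only: the `ToyMetastore`, `TableEntry`, and `Column` names, the `sales.orders` table, and the `s3://example-bucket/...` location are all hypothetical and do not reflect the schema of any real metastore.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Column:
    name: str  # column name exposed to queries
    type: str  # SQL type the file data is mapped to


@dataclass
class TableEntry:
    schema: str
    name: str
    location: str  # object storage path that holds the table's files
    columns: List[Column] = field(default_factory=list)


class ToyMetastore:
    """Hypothetical in-memory catalog: (schema, table) -> table metadata."""

    def __init__(self) -> None:
        self._tables: Dict[Tuple[str, str], TableEntry] = {}

    def register(self, entry: TableEntry) -> None:
        self._tables[(entry.schema, entry.name)] = entry

    def lookup(self, schema: str, name: str) -> TableEntry:
        return self._tables[(schema, name)]


# Register a made-up table, then resolve it the way a query engine might.
metastore = ToyMetastore()
metastore.register(
    TableEntry(
        schema="sales",
        name="orders",
        location="s3://example-bucket/sales/orders/",
        columns=[Column("order_id", "bigint"), Column("order_date", "date")],
    )
)

entry = metastore.lookup("sales", "orders")
print(entry.location)
print([(c.name, c.type) for c in entry.columns])
```

Without an entry like this, a query engine sees only files in a bucket. The catalog is what tells it which files form a table and how to interpret their columns.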
For more information about object storage and the requirement for a metastore, see Using object storage systems.
Read more about the following S3-specific metastore configurations in the Amazon S3 catalog overview.
You can use AWS Glue to manage the metadata about your object storage. Configure access to AWS Glue with the following parameters:
Read External security in AWS to learn about configuring these details in AWS.
You can use a Hive Metastore Service (HMS) to manage the metadata for your object storage. The HMS must be located in the same cloud provider and region as the object storage itself.
You can connect to the HMS directly if the Starburst Galaxy IP range (CIDR) is allowed to connect.
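To make the allowlist idea concrete, the following sketch uses Python's standard `ipaddress` module to test whether a source address falls inside an allowed CIDR range. The `is_allowed` helper and the addresses are assumptions for illustration: the ranges come from the blocks reserved for documentation, not from Starburst Galaxy. Use the actual range shown for your account when configuring a firewall or security group.

```python
import ipaddress
from typing import Iterable


def is_allowed(source_ip: str, allowed_cidrs: Iterable[str]) -> bool:
    """Return True if source_ip falls inside any of the allowed CIDR ranges."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


# Placeholder allowlist using a documentation-only range (RFC 5737).
ALLOWED_CIDRS = ["203.0.113.0/24"]

inside = is_allowed("203.0.113.25", ALLOWED_CIDRS)
outside = is_allowed("198.51.100.7", ALLOWED_CIDRS)
print(inside, outside)
```

A firewall rule performs the same membership test: connections from addresses outside the allowed range never reach the HMS.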
If the HMS is only accessible inside the virtual private cloud (VPC) of the cloud provider, you can use an SSH tunnel with a bastion host in the VPC.
In both cases, configure access with the following parameters:
Starburst Galaxy provides its own metastore service for your convenience. You do not need to configure and manage a separate Hive Metastore Service deployment or equivalent system.
In Metastore configuration, select Starburst Galaxy to set up and use the built-in metastore provided by Galaxy.
For Amazon S3 and Google Cloud Storage, create a bucket in your object storage account, and create a directory in that bucket. Provide that bucket name and directory name. This location is then used to store the metastore data associated with this S3 or GCS account.
For Azure ADLS, create a container in your storage account, and create a directory in that container. Provide this storage container name and directory name. This sets up the location used to store the metadata associated with this storage account.
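As a rough illustration of how a bucket or container name and a directory name combine into one storage location, the sketch below builds a URI for each storage type. The `metastore_location` helper is hypothetical, and the use of the `s3://`, `gs://`, and `abfss://` URI schemes is an assumption about how such a location is commonly written. The source does not state how Starburst Galaxy represents the location internally.

```python
from typing import Optional


def metastore_location(
    storage: str, container: str, directory: str, account: Optional[str] = None
) -> str:
    """Build an illustrative storage URI from a bucket/container and directory.

    Hypothetical helper: the URI layouts are common conventions, not a
    documented Starburst Galaxy format.
    """
    directory = directory.strip("/")  # tolerate leading or trailing slashes
    if storage == "s3":
        return f"s3://{container}/{directory}/"
    if storage == "gcs":
        return f"gs://{container}/{directory}/"
    if storage == "adls":
        if not account:
            raise ValueError("ADLS locations need the storage account name")
        return f"abfss://{container}@{account}.dfs.core.windows.net/{directory}/"
    raise ValueError(f"unsupported storage type: {storage}")


# All names below are placeholders.
print(metastore_location("s3", "example-bucket", "metastore"))
print(metastore_location("gcs", "example-bucket", "/metastore/"))
print(metastore_location("adls", "example-container", "metastore", account="exampleaccount"))
```

Keeping the metastore data in a dedicated directory, separate from table data, makes the location easier to identify and secure.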
The meanings of the two Allow controls are the same for a Starburst Galaxy metastore as for a separate Hive Metastore Service, described previously.
Note that deleting the catalog also removes the associated Starburst Galaxy metastore data.