Starburst Galaxy

    Schema discovery

    The Schema discovery pane lets you examine the metadata of the specified location. Schema discovery is for catalogs in object storage data sources only.

    If you are running schema discovery for the first time, click Run schema discovery to analyze a root object in an object storage location and return the structure of any discovered tables. If you have previously performed schema discovery for the chosen catalog, click Run discovery:

    1. Enter the URL of the bucket to scan in the Catalog location URL field. The user’s role must have location privileges. If the role does not have location privileges, a dialog appears with the option to add them.
    2. Enter the name of the schema in the Set default schema field.
    3. Optionally, under Advanced settings, select the maximum number of sample tables to preview, the maximum sample file lines, and the maximum files per table.
    4. Click Run discovery.

    [Image: catalog explorer schema discovery]

    Each discovery run adds a row to a table with the following columns:

    • Source: The source URL for the bucket used for discovery. Click the source to navigate to the discovery results pane.
    • Timestamp: The time at which the discovery was executed.
    • Status: The current status of the discovery, such as when the discovery was completed, or if the discovery is in progress.
    • Changes: A summary of the changes made during the discovery run, such as the number of tables created.
    • Log: Shows an entry when schema discovery succeeds and its results are applied with Create schemas, Create tables, or Update tables. Click an entry in this column to open the log events pane for that event.
    • Rerun: Click Rerun to run schema discovery on the source again. This option performs a diff on the location and returns any changes found.
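
To make the idea of a rerun "diff" concrete, here is a minimal Python sketch that compares two discovery snapshots and reports what changed. It is only an illustration: the `TableInfo` class, the `diff_discoveries` function, the sample schema and table names, and the "removed" category are all assumptions made for this example and do not describe how Starburst Galaxy implements the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableInfo:
    """Hypothetical snapshot of one discovered table (not a Galaxy type)."""
    file_format: str
    columns: tuple


def diff_discoveries(previous, current):
    """Compare two {(schema, table): TableInfo} snapshots.

    Returns the keys that were created, removed, or updated between runs.
    The "removed" category is an assumption added for completeness.
    """
    created = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    updated = sorted(
        key for key in set(previous) & set(current)
        if previous[key] != current[key]
    )
    return {"created": created, "removed": removed, "updated": updated}


# Made-up sample data: the state found by an earlier run and by a rerun.
previous = {
    ("sales", "orders"): TableInfo("PARQUET", ("id", "total")),
    ("sales", "returns"): TableInfo("CSV", ("id",)),
}
current = {
    ("sales", "orders"): TableInfo("PARQUET", ("id", "total", "currency")),
    ("sales", "customers"): TableInfo("JSON", ("id", "name")),
}

changes = diff_discoveries(previous, current)
print(changes)
```

In this sketch a table counts as updated when its format or column list differs between the two snapshots, which loosely mirrors the kind of summary the Changes column reports.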

    Log events

    The log events pane lists the log entries for each discovery-related event. The Summary dialog shows the number of successful query executions and the number of errors that occurred during the discovery run.

    The list of log events includes the following information:

    • Status: The outcome of the event. A green checkmark indicates a successful query execution, and a red exclamation mark indicates an error.
    • Timestamp: The time at which the event occurred.
    • Query text: The text of the executed SQL query, such as CREATE TABLE or CREATE SCHEMA. Click the text to view the full query.
    • Message: A message detailing the log event, such as the successful creation of a schema, or an error message.
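
The relationship between the event list and the Summary counts can be sketched in a few lines of Python. The `LogEvent` class, its field names, the `summarize` function, and the sample events below are hypothetical and exist only for illustration; they are not part of any Starburst Galaxy API.

```python
from dataclasses import dataclass


@dataclass
class LogEvent:
    """Hypothetical record mirroring the four columns described above."""
    succeeded: bool   # Status: True for a successful query, False for an error
    timestamp: str    # Timestamp of the event
    query_text: str   # SQL text that was executed
    message: str      # Human-readable outcome


def summarize(events):
    """Count successful query executions and errors in a list of events."""
    successes = sum(1 for event in events if event.succeeded)
    return {"successful": successes, "errors": len(events) - successes}


# Made-up sample events; the names and messages are placeholders.
events = [
    LogEvent(True, "2024-01-01T10:00:00Z",
             "CREATE SCHEMA example_schema", "Schema created"),
    LogEvent(True, "2024-01-01T10:00:02Z",
             "CREATE TABLE example_schema.orders", "Table created"),
    LogEvent(False, "2024-01-01T10:00:03Z",
             "CREATE TABLE example_schema.returns", "Table creation failed"),
]

summary = summarize(events)
print(summary)
```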

    Discovery results

    The discovery results pane lists the tables found at the source during discovery:

    [Image: schema discovery results pane]

    • Schema: The name of the schema that contains the table.
    • Table name: The name of the table.
    • Format: The table’s file format.
    • Changes: A summary of changes made from the discovery run, such as the number of tables created.
    • Results: Click Preview to see a dialog that describes the columns of the table and its configuration options.

    Click Create all tables to navigate to the log events pane, where you can watch each table being created. You can view your discovered schema in the schemas pane.