Starburst Galaxy

    Data lineage

    Data lineage lets you visualize how data flows between tables, views, and materialized views as a result of workloads run in Starburst Galaxy.

    For data engineers, data lineage shows the full end-to-end lineage within Galaxy for any table-level entity. Seeing how data is transformed and moved both upstream and downstream of an entity makes it easier to plan changes and troubleshoot data issues.

    For data consumers, data lineage shows the provenance of the data sets they care about, so they can verify that their data comes from valid and accepted upstream sources.

    Automated lineage creation

    Galaxy creates data lineage automatically whenever it executes a transformation workload that results in actual or logical data flow. Workloads that establish lineage include, but are not limited to:

    • CREATE TABLE AS from one or more tables
    • UPDATE
    • INSERT
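
    As a rough illustration of how such statements map to lineage, the Python sketch below models each transformation as a target table plus the source tables it reads from, and derives one upstream-to-downstream edge per source table. It is a conceptual model only, not Galaxy's implementation or API: the `Transformation` class, the `lineage_edges` function, and the `sales.*` table names are hypothetical, and the source tables are labeled by hand rather than parsed from the SQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transformation:
    """Hypothetical record of one lineage-creating workload."""
    sql: str                  # statement text, kept for display only
    sources: tuple[str, ...]  # tables the statement reads from (labeled by hand)
    target: str               # table the statement writes to


def lineage_edges(transformations):
    """Return (upstream, downstream) pairs, one per source table read."""
    edges = []
    for t in transformations:
        for source in t.sources:
            edges.append((source, t.target))
    return edges


workloads = [
    # CREATE TABLE AS from two tables: one edge per source table.
    Transformation(
        sql="CREATE TABLE sales.order_summary AS "
            "SELECT c.region, sum(o.total) AS revenue "
            "FROM sales.orders o JOIN sales.customers c ON o.customer_id = c.id "
            "GROUP BY c.region",
        sources=("sales.orders", "sales.customers"),
        target="sales.order_summary",
    ),
    # INSERT ... SELECT from a single table: one edge.
    Transformation(
        sql="INSERT INTO sales.orders_archive SELECT * FROM sales.orders",
        sources=("sales.orders",),
        target="sales.orders_archive",
    ),
]

for upstream, downstream in lineage_edges(workloads):
    print(f"{upstream} -> {downstream}")
```

    A CREATE TABLE AS that joins two tables produces two edges into the same target, which is why one such statement can connect several upstream entities to a single downstream entity.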

    Lineage view

    To access lineage, navigate to a table, view, or materialized view in the catalog explorer, and click the Lineage tab.

    The Lineage tab is available only to roles that have the View all data lineage privilege. This privilege is granted to the accountadmin role by default and must be granted manually to any other role.

    Lineage graph

    Use the lineage graph to explore how data flows to and from the table-level entity you have selected. By default, the graph displays lineage one hop upstream and one hop downstream of the selected entity. If the selected entity has no data flowing to or from it, the graph shows only a single node.
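
    The default one-hop view can be thought of as the selected node plus its direct predecessors and successors in a directed graph. The Python sketch below assumes lineage is stored as a list of `(upstream, downstream)` edges; `one_hop_view` and the table names are hypothetical and illustrate the idea only, not how Galaxy computes the graph.

```python
def one_hop_view(edges, selected):
    """Return the entities one hop upstream and downstream of `selected`."""
    # Direct predecessors: entities whose data flows into the selected one.
    upstream = {src for src, dst in edges if dst == selected}
    # Direct successors: entities the selected one's data flows into.
    downstream = {dst for src, dst in edges if src == selected}
    return {"upstream": upstream, "selected": selected, "downstream": downstream}


# Hypothetical lineage, expressed as (upstream, downstream) pairs.
edges = [
    ("raw.orders", "staging.orders"),
    ("staging.orders", "mart.daily_revenue"),
    ("mart.daily_revenue", "mart.revenue_report"),
]

print(one_hop_view(edges, "staging.orders"))
# An entity with no data flowing to or from it yields only itself.
print(one_hop_view(edges, "raw.unused_table"))
```

    Here `mart.revenue_report` is two hops downstream of `staging.orders`, so it is not part of the default view around that entity.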

    The lineage graph has three interactive components:

    • Entity node: A table-level entity. Click it to open a dialog with table-level metadata.
    • Transformation node: A process node that sits on the edge between two entity nodes and represents the workload that moved data between them. Click the transformation node to display current and historical transformation metadata from the last 30 days, including the SQL statement. Only workloads processed by Galaxy are captured.
    • Expander: A control that lets you traverse further upstream or downstream in the lineage.
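
    To make the roles of these components concrete, the sketch below models lineage as a mapping from `(upstream, downstream)` entity pairs to the transformation node on that edge, and uses an `expand` function to play the part of the expander by revealing one more hop in a chosen direction. This is a hypothetical Python model for illustration only; `expand`, `EDGES`, and the transformation labels are made up and do not correspond to a Galaxy API.

```python
def expand(edges, visible, node, direction):
    """Reveal entities one hop from `node` in `direction` ("up" or "down").

    Adds the newly reached entity nodes to `visible` (modified in place) and
    returns the transformation nodes on the edges that were traversed.
    """
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    traversed = []
    for (upstream, downstream), transformation in edges.items():
        if direction == "down" and upstream == node:
            visible.add(downstream)
            traversed.append(transformation)
        elif direction == "up" and downstream == node:
            visible.add(upstream)
            traversed.append(transformation)
    return traversed


# Hypothetical lineage: each edge between two entity nodes is keyed by
# (upstream, downstream) and carries the transformation node that moved the data.
EDGES = {
    ("raw.orders", "staging.orders"): "insert_staging_orders",
    ("staging.orders", "mart.daily_revenue"): "ctas_daily_revenue",
    ("mart.daily_revenue", "mart.revenue_report"): "insert_revenue_report",
}

# Default one-hop view around staging.orders.
visible = {"raw.orders", "staging.orders", "mart.daily_revenue"}

# Simulate clicking the downstream expander on mart.daily_revenue.
print(expand(EDGES, visible, "mart.daily_revenue", "down"))
print(sorted(visible))
```

    Keeping the transformation on the edge, rather than as a property of either entity, mirrors the description above: a transformation node only exists between two entity nodes whose data it connected.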

    Lineage side pane

    View table-level entity metadata and transformation metadata in their respective side panes to understand how your entity’s data has changed over time.

    Entity side pane

    The entity side pane displays the entity’s Galaxy description and source description, along with its owner, contact, column, and tag metadata.

    For an upstream or downstream node, click View this table to make that node the currently selected node.

    Transformation side pane

    The transformation side pane displays metadata for current and historical transformation events from the last 30 days, including the SQL statement that Galaxy executed.
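
    The 30-day history amounts to a time-window filter over transformation events. The Python sketch below shows one way to express that filter; the `TransformationEvent` record, the `recent_events` function, and the sample timestamps are hypothetical, and treating an event exactly 30 days old as still inside the window is an assumption rather than documented Galaxy behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Window length taken from the text; inclusive boundary is an assumption.
WINDOW = timedelta(days=30)


@dataclass
class TransformationEvent:
    """Hypothetical record of one execution of a transformation."""
    executed_at: datetime
    sql: str


def recent_events(events, now):
    """Return the events executed within the 30 days before `now`."""
    cutoff = now - WINDOW
    return [e for e in events if e.executed_at >= cutoff]


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
events = [
    TransformationEvent(datetime(2024, 6, 29, tzinfo=timezone.utc),
                        "INSERT INTO sales.orders_archive SELECT * FROM sales.orders"),
    TransformationEvent(datetime(2024, 6, 1, tzinfo=timezone.utc),
                        "INSERT INTO sales.orders_archive SELECT * FROM sales.orders"),
    # Older than 30 days, so it falls outside the window.
    TransformationEvent(datetime(2024, 5, 1, tzinfo=timezone.utc),
                        "CREATE TABLE sales.orders_archive AS SELECT * FROM sales.orders"),
]

for event in recent_events(events, now):
    print(event.executed_at.date(), event.sql)
```

    Passing `now` explicitly, instead of calling `datetime.now()` inside the function, keeps the window deterministic and easy to test.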