Atlas integration

Starburst Enterprise platform (SEP) features integration with Apache Atlas, a framework for the governance of data and metadata assets. This allows you to include changes to SEP catalogs, schemas, tables, columns, and queries as part of an overall enterprise data governance plan.

Introduction

The Atlas support in SEP is implemented as an event listener that detects changes to SEP objects and sends notice of those changes to an Atlas server by means of a Kafka message bus. Starburst also provides the atlas-cli command, which allows you to manage the relationship between your SEP cluster and Atlas.

Most home-grown and commercial data governance systems can import from and export to Apache Atlas. This means that enterprises using a non-Atlas data governance system can still take advantage of SEP’s Atlas support by using it as a bridge to their system.

Setup steps

To integrate Atlas with your SEP cluster, follow the sections in numbered order.

Setup summary

Set up Atlas support for a SEP cluster with the following steps:

  1. Make sure the requirements are in place before you begin.

  2. Configure an Atlas plugin on your coordinator.

  3. Register Atlas types for SEP objects with the atlas-cli command.

  4. Register your SEP cluster on Atlas with atlas-cli.

  5. Load catalogs and their components onto Atlas with atlas-cli.

  6. Restart your cluster and verify Atlas connectivity.

1. Requirements

SEP’s support for Apache Atlas requires:

  • SEP cluster version 356 or later, configured and running.

  • Apache Atlas 2.1.0 or later, configured and running.

  • Apache Kafka, configured to consume and emit Atlas messages.

    • You must be able to contact the Atlas and Kafka servers at their specified ports from the SEP coordinator.

  • The Atlas CLI, downloaded from Starburst Support, then installed and configured.

  • A valid Starburst Enterprise license for the Starburst Atlas plugin.
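The connectivity requirement above can be checked from the coordinator before you continue. The following is a minimal sketch, not part of the Starburst tooling. It assumes the nc (netcat) utility is installed on the coordinator. The Atlas address is taken from the examples on this page, and the Kafka host and port are hypothetical placeholders. A successful check shows only that the TCP port accepts connections, not that the service is configured correctly.

```shell
#!/bin/sh
# Assumed endpoints: replace these with your own Atlas and Kafka addresses.
ATLAS_ENDPOINT="atlas.example.com:21000"
KAFKA_ENDPOINT="kafka.example.com:9092"   # hypothetical Kafka host and port

# Report whether a TCP connection to host:port succeeds within 5 seconds.
# Returns 0 when the port is reachable and 1 otherwise.
check_endpoint() {
  host=${1%:*}    # text before the last colon
  port=${1##*:}   # text after the last colon
  if nc -z -w 5 "$host" "$port" 2>/dev/null; then
    echo "reachable: $host:$port"
  else
    echo "NOT reachable: $host:$port"
    return 1
  fi
}

# Usage, run on the SEP coordinator:
#   check_endpoint "$ATLAS_ENDPOINT"
#   check_endpoint "$KAFKA_ENDPOINT"
```

If either check fails, review firewall rules and the host and port values before you configure the plugin.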

2. Configure Atlas plugin

Follow the guidance for the Starburst Atlas plugin to create a configuration file that defines the properties of your cluster’s connection to Atlas and Kafka.

After preparing this configuration, do not restart your cluster yet! Wait for step 6 before you restart.

3. Register SEP types

The atlas-cli command keeps an internal registry of eight Atlas-format types that describe SEP objects. Run the following command to upload these SEP-specific definitions to Atlas.

atlas-cli types create --server https://atlas.example.com:21000 --user admin --password

See the Atlas CLI reference for this command.

4. Register SEP cluster

One of the properties you configure for your cluster in step 2 is atlas.cluster.name, where you assign an arbitrary name for your SEP cluster. Use a command like the following to register this cluster name with Atlas.

atlas-cli cluster register --server https://atlas.example.com:21000 \
  --cluster-name fastqueries --user admin --password

The value of the --cluster-name option must match the atlas.cluster.name property that you configured in step 2.

See the Atlas CLI reference for this command.
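One way to avoid a mismatch between the two names is to read the value from the plugin configuration file instead of retyping it. The following is a minimal sketch under stated assumptions: the helper read_cluster_name is illustrative and not part of atlas-cli, the file path in the usage comment is hypothetical, and the file is assumed to use plain key=value lines.

```shell
#!/bin/sh
# Print the value of atlas.cluster.name from a properties file.
# Whitespace around the key, the equals sign, and the value is ignored.
# If the key appears more than once, only the first value is printed.
read_cluster_name() {
  sed -n 's/^[[:space:]]*atlas\.cluster\.name[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' "$1" \
    | head -n 1
}

# Usage (the path is hypothetical; use the file you created in step 2):
#   CLUSTER_NAME=$(read_cluster_name etc/atlas.properties)
#   atlas-cli cluster register --server https://atlas.example.com:21000 \
#     --cluster-name "$CLUSTER_NAME" --user admin --password
```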

5. Load catalogs on Atlas

You must tell Atlas which SEP catalogs, schemas, and tables you want tracked. This step loads the names of the objects to track. From then on, the Starburst Atlas plugin running on your SEP cluster detects any changes to these objects and notifies Atlas.

“Change” here refers to a change in structure, such as a new column added to a table, or a table deleted from a schema. SEP does not store data, so it is not the job of the Atlas plugin to track changes in table data.

For each catalog on your SEP cluster whose objects you want to track in Atlas, run the atlas-cli catalog register command. For example:

atlas-cli catalog register --server https://atlas.example.com:21000 \
  --cluster-name fastqueries --user admin --password \
  --starburst-jdbc-url "jdbc:trino://cluster.example.com:8080?user=starburst_service" \
  --catalog tpch --schema tiny --table nation

See the Atlas CLI reference for further options.
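When you have several tables to register, it can help to generate one command per table from a list. The following is a minimal sketch that only prints commands for review and does not run them. It uses only the options shown in the example above. The helper names register_cmd and print_all are illustrative, and the server, cluster name, and JDBC URL are the example values from this page.

```shell
#!/bin/sh
# Example values from this page: replace them with your own.
ATLAS_SERVER="https://atlas.example.com:21000"
CLUSTER_NAME="fastqueries"
JDBC_URL="jdbc:trino://cluster.example.com:8080?user=starburst_service"

# Print the atlas-cli command that registers one catalog, schema, and table.
register_cmd() {
  printf 'atlas-cli catalog register --server %s --cluster-name %s --user admin --password --starburst-jdbc-url "%s" --catalog %s --schema %s --table %s\n' \
    "$ATLAS_SERVER" "$CLUSTER_NAME" "$JDBC_URL" "$1" "$2" "$3"
}

# Read "catalog schema table" triples from standard input, one per line,
# and print one command for each. Blank lines are skipped.
print_all() {
  while read -r catalog schema table; do
    [ -n "$catalog" ] || continue
    register_cmd "$catalog" "$schema" "$table"
  done
}

# Usage:
#   printf 'tpch tiny nation\ntpch tiny region\n' | print_all
```

Review the printed commands, then run them one at a time, because each command includes the --password option as in the example above.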

6. Restart cluster and test

When all SEP cluster objects are registered in Atlas, restart your cluster.

Test the Atlas integration by browsing the registered objects in the Atlas web interface. Then create a new table and register it with Atlas, add a column to the table, and confirm that the change appears in Atlas.
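The table test described above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed procedure: the catalog, schema, and table names are hypothetical and must point to a catalog that allows table creation, the trino CLI is assumed to be installed and able to reach the cluster, and the cluster URL is inferred from the JDBC example on this page. The functions only print SQL statements. The commands that run them are shown as comments.

```shell
#!/bin/sh
# Hypothetical names: a writable catalog and schema, and a scratch table.
CATALOG="hive"
SCHEMA="sandbox"
TABLE="atlas_smoke_test"
FQN="$CATALOG.$SCHEMA.$TABLE"

# Print the SQL statement that creates the scratch table.
create_sql() { printf 'CREATE TABLE %s (id bigint)\n' "$FQN"; }

# Print the SQL statement that adds a column, which Atlas should then show.
alter_sql() { printf 'ALTER TABLE %s ADD COLUMN note varchar\n' "$FQN"; }

# Usage, in this order:
#   trino --server http://cluster.example.com:8080 --execute "$(create_sql)"
#   atlas-cli catalog register --server https://atlas.example.com:21000 \
#     --cluster-name fastqueries --user admin --password \
#     --starburst-jdbc-url "jdbc:trino://cluster.example.com:8080?user=starburst_service" \
#     --catalog "$CATALOG" --schema "$SCHEMA" --table "$TABLE"
#   trino --server http://cluster.example.com:8080 --execute "$(alter_sql)"
```

After the last command, look up the table in the Atlas web interface and check that the new column is listed.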

Limitations

SEP’s support for Apache Atlas has the following limitations:

  • Once a cluster or catalog is registered on an Atlas server, it cannot be unregistered.

  • There is no attempt to de-duplicate tables. For example, on a cluster connected to other SEP clusters by means of the Starburst Stargate connector, it is possible for the same table’s structure metadata to be loaded twice, from a local catalog and from a remote catalog.