dbt #

dbt is a data transformation workflow development framework that lets teams quickly and collaboratively deploy analytics code. Starburst supports dbt CLI, but not dbt Cloud.

The dbt-trino adapter supports Starburst Galaxy, Starburst Enterprise platform (SEP), and Trino.

Client requirements #

To run dbt and connect to clusters, you must have:

  • Python 3.6 or later (or the PyPy equivalent)
  • The Starburst-provided dbt-trino adapter
  • dbt-core installed from pip, if not already installed by dbt-trino
  • At least one Trino entry in the dbt profiles.yml file

Cluster requirements #

Dbt allows you to script the execution of SQL statements. Typically these are data transformation workflows that include creation of new objects, such as tables or views. The target catalog in Starburst Galaxy or SEP must support the desired object creations and modifications, including necessary access rights for the configured user.

An example for a suitable catalog is an object storage catalog that uses the Hive connector and a metastore service (HMS) with the following settings:

hive.metastore-cache-ttl=0s
hive.metastore-refresh-interval=5s
hive.allow-drop-table=true
hive.allow-rename-table=true

The username configured to log into the cluster must be granted permission to create and drop tables.

See the GitHub README for the dbt-trino project for further setup options.

The dbt-trino project supports the Trino authentication methods shown in the following table, along with the dbt options to configure in the dbt/profiles.yml file.

Authentication type Configuration options
No authentication
LDAP user and password
Kerberos user
JWT jwt_token
Certificate client_certificate and client_private_key
OAuth

Installation overview #

The following steps gather in one place the instructions from several sources throughout dbt and dbt-trino documentation.

  1. Optional: Use a Python virtual environment for working with dbt. The following commands assume you are using virtualenv.

    python3 -m venv dbt-trino-env
    dbt-trino-env/bin/activate
    

    See Pipenv and Virtual Environments for further information.

  2. Use pip (or pip3) to install the dbt-trino adapter. This also installs the base dbt-core application.

    pip install dbt-trino
    
  3. The following command creates a directory with the arbitrary name mydatapipeline in the current location, and creates ~/.dbt/profiles.yml as a starting point profile file ready for the trino adapter type.

    dbt init mydatapipeline --adapter trino
    

    This process creates the required directory structure and files including dbt_project.yml.

  4. Edit the ~/.dbt/profiles.yml file to specify connection information for your cluster, using links in the Installation resources as references. For example:

    default:
    outputs:
    
      dev:
        type: trino
        method: ldap  # optional, one of {none | ldap | kerberos | jwt | certificate}
        user: [dev_user]
        password: [password]  # required if method is ldap or kerberos
        host: devcluster.example.com
        port: 443
        database: [database name]
        schema: [dev_schema]
        threads: 1  # number of simultaneously building models
        http_scheme: https  # or http
        session_properties:
          query_max_run_time: 4h
          exchange_compression: True
    
      prod:
        type: trino
        method: ldap  # optional, one of {none | ldap | kerberos | jwt | certificate}
        user: [prod_user]
        password: [prod_password]  # required if method is ldap or kerberos
        host: prodcluster.example.com
        port: 443
        database: [database name]
        schema: [prod_schema]
        threads: 1  # number of simultaneously building models
        http_scheme: https  # or http
        http_scheme: [http or https]
        session_properties:
          query_max_run_time: 4h
          exchange_compression: True
    
    target: dev
    

    Galaxy does not support a query_max_run_time of more than four hours. SEP and Trino can be configured to support longer durations.

  5. You can now run dbt commands such as:

    dbt test
    dbt run
    

    See the dbt documentation for further information about running dbt.

  6. When done, deactivate your virtual environment:

    deactivate
    

    To re-use the same settings, reactivate the same environment before running dbt commands again.

Installation resources #