dbt #

dbt is a framework for developing data transformation workflows that lets teams quickly and collaboratively deploy analytics code. Starburst supports the dbt CLI, but not dbt Cloud.

The dbt-trino adapter supports Trino, Starburst Enterprise platform (SEP), and Starburst Galaxy.

Client requirements #

To run dbt and connect to clusters, you must have:

  • Python 3.6 or later (or the PyPy equivalent)
  • The Starburst-provided dbt-trino adapter
  • dbt-core installed from pip, if not already installed by dbt-trino
  • At least one Trino entry in the dbt profiles.yml file

Cluster requirements #

dbt allows you to script the execution of SQL statements. Typically these are data transformation workflows that include creating new objects, such as tables or views. The target catalog in SEP or Starburst Galaxy must support the desired object creation and modification, and the configured user must have the necessary access rights.

An example of a suitable catalog is an object storage catalog that uses the Hive connector and a Hive metastore service (HMS) with the following settings:

hive.metastore-cache-ttl=0s
hive.metastore-refresh-interval=5s
hive.allow-drop-table=true
hive.allow-rename-table=true

The username configured to log into the cluster must be granted permission to create, drop, and rename tables, because dbt materializations issue statements such as CREATE TABLE AS, DROP TABLE, and ALTER TABLE ... RENAME TO on your behalf.

See the GitHub README for the dbt-trino project for further setup options.

Installation overview #

The following steps gather in one place the instructions from several sources in the dbt and dbt-trino documentation.

  1. Optional: Use a Python virtual environment for working with dbt. The following commands create and activate a virtual environment with Python's built-in venv module.

    python3 -m venv dbt-trino-env
    source dbt-trino-env/bin/activate
    

    See Pipenv and Virtual Environments for further information.
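
    You can confirm that the environment is active by checking which Python interpreter is first on your PATH; it should resolve to a location inside the dbt-trino-env directory:

    which python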

  2. Use pip (or pip3) to install the dbt-trino adapter. This also installs the base dbt-core application.

    pip install dbt-trino
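
    You can verify the installation with the version command; its output also lists the installed adapter plugins, including trino:

    dbt --version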
    
  3. The following command creates a directory with the arbitrary name mydatapipeline in the current location, and creates ~/.dbt/profiles.yml as a starter profile set up for the trino adapter type.

    dbt init mydatapipeline --adapter trino
    

    This process creates the required directory structure and files including dbt_project.yml.
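
    The exact scaffolding varies by dbt version, but the new directory always contains dbt_project.yml and a models directory that holds your transformation SQL:

    ls mydatapipeline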

  4. Edit the ~/.dbt/profiles.yml file to specify connection information for your cluster, using links in the Installation resources as references. For example:

    default:
      outputs:
        dev:
          type: trino
          method: ldap  # optional, one of {none | ldap | kerberos}
          user: [dev_user]
          password: [password]  # required if method is ldap or kerberos
          host: devcluster.example.com
          port: 443
          database: [database name]
          schema: [dev_schema]
          threads: 1  # number of simultaneously building models
          http_scheme: https  # or http
          session_properties:
            query_max_run_time: 5d
            exchange_compression: True

        prod:
          type: trino
          method: ldap  # optional, one of {none | ldap | kerberos}
          user: [prod_user]
          password: [prod_password]  # required if method is ldap or kerberos
          host: prodcluster.example.com
          port: 443
          database: [database name]
          schema: [prod_schema]
          threads: 1  # number of simultaneously building models
          http_scheme: https  # or http
          session_properties:
            query_max_run_time: 5d
            exchange_compression: True

      target: dev
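
    Before building any models, you can validate the new profile. The dbt debug command, run from inside the project directory, checks your installation, the profiles.yml entry, and connectivity to the cluster:

    dbt debug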
    
  5. You can now run dbt commands such as:

    dbt test
    dbt run
    

    See the dbt documentation for further information about running dbt.
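
    As a minimal sketch of what dbt run builds, the following commands, run from inside the mydatapipeline directory, add a trivial model; the file name smoke_test.sql is arbitrary:

    cat > models/smoke_test.sql <<'SQL'
    {{ config(materialized='table') }}
    select 1 as id
    SQL
    dbt run

    With the profile above, dbt run materializes this model as the table smoke_test in the configured catalog and schema.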

  6. When done, deactivate your virtual environment:

    deactivate
    

    To reuse the same settings later, reactivate the same environment before running dbt commands again:
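
    source dbt-trino-env/bin/activate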

Installation resources #