dbt #
dbt is a data transformation workflow development framework that lets teams quickly and collaboratively deploy analytics code. Starburst supports dbt CLI, but not dbt Cloud.
The dbt-trino adapter supports Starburst Galaxy, Starburst Enterprise platform (SEP), and Trino.
Client requirements #
To run dbt and connect to clusters, you must have:
- Python 3.6 or later (or the PyPy equivalent)
- The Starburst-provided dbt-trino adapter
- dbt-core installed from pip, if not already installed by dbt-trino
- At least one Trino entry in the dbt profiles.yml file
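The client requirements above can be checked before going further. The following is a minimal sketch, not part of dbt or dbt-trino: the helper names are hypothetical, and it assumes the adapter and core packages are registered under the pip distribution names dbt-trino and dbt-core. It relies on importlib.metadata, which is in the standard library from Python 3.8 onward.

```python
import sys
from importlib import metadata  # standard library in Python 3.8+

MIN_PYTHON = (3, 6)  # minimum version stated in the requirements above


def meets_python_requirement(version=None):
    """Return True if the (major, minor) version is 3.6 or later."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) >= MIN_PYTHON


def installed_version(distribution):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python OK:", meets_python_requirement())
    # Distribution names are an assumption based on the pip package names.
    for name in ("dbt-core", "dbt-trino"):
        print(name, "->", installed_version(name) or "not installed")
```

Running the script inside the environment you plan to use for dbt reports whether each package is visible to that interpreter.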
Cluster requirements #
dbt lets you script the execution of SQL statements. Typically these are data transformation workflows that create new objects, such as tables or views. The target catalog in Starburst Galaxy or SEP must support the required object creation and modification, and the configured user must have the necessary access rights.
An example of a suitable catalog is an object storage catalog that uses the Hive connector and a Hive Metastore Service (HMS) with the following settings:
hive.metastore-cache-ttl=0s
hive.metastore-refresh-interval=5s
hive.allow-drop-table=true
hive.allow-rename-table=true
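As a rough illustration, the example settings above can be compared against the text of a catalog properties file. This is a sketch under assumptions: it treats the file as simple key=value lines, the function names are hypothetical, and the four values are the example settings from this page rather than a complete list of what every catalog needs.

```python
# Example settings copied from the list above.
EXAMPLE_HIVE_SETTINGS = {
    "hive.metastore-cache-ttl": "0s",
    "hive.metastore-refresh-interval": "5s",
    "hive.allow-drop-table": "true",
    "hive.allow-rename-table": "true",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def settings_to_review(text):
    """Return {key: (expected, found)} for settings that differ or are missing."""
    found = parse_properties(text)
    return {
        key: (expected, found.get(key))
        for key, expected in EXAMPLE_HIVE_SETTINGS.items()
        if found.get(key) != expected
    }
```

For instance, passing the text `hive.allow-drop-table=false` reports that key with the pair `("true", "false")` and lists the other three settings as missing.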
The username configured to log into the cluster must be granted permission to create and drop tables.
See the GitHub README for the dbt-trino project for further setup options.
The dbt-trino project supports the Trino authentication methods shown in the following table, along with the dbt options to configure in the ~/.dbt/profiles.yml file.
Authentication type | Configuration options |
---|---|
No authentication | — |
LDAP | user and password |
Kerberos | user |
JWT | jwt_token |
Certificate | client_certificate and client_private_key |
OAuth | — |
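The table can be turned into a small pre-flight check for a profile output. This is a minimal sketch, not part of dbt-trino: the function name is hypothetical, the lowercase method names (including oauth) are assumed to match the values accepted in the method field, and an omitted method is assumed to mean no authentication.

```python
# Options listed per authentication type in the table above.
AUTH_OPTIONS = {
    "none": [],
    "ldap": ["user", "password"],
    "kerberos": ["user"],
    "jwt": ["jwt_token"],
    "certificate": ["client_certificate", "client_private_key"],
    "oauth": [],
}


def missing_auth_options(output):
    """Return the table's options that are absent or empty for this output."""
    # Assumption: an omitted method means no authentication.
    method = output.get("method", "none")
    if method not in AUTH_OPTIONS:
        raise ValueError(f"unknown authentication method: {method!r}")
    return [key for key in AUTH_OPTIONS[method] if not output.get(key)]
```

An output dictionary with `method: ldap` and a `user` but no `password` yields `["password"]`; an empty list means every option the table names for that method is present.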
Installation overview #
The following steps gather in one place the instructions from several sources throughout the dbt and dbt-trino documentation.
- Optional: Use a Python virtual environment for working with dbt. The following commands use Python's built-in venv module:

      python3 -m venv dbt-trino-env
      source dbt-trino-env/bin/activate

  See Pipenv and Virtual Environments for further information.
- Use pip (or pip3) to install the dbt-trino adapter. This also installs the base dbt-core application.

      pip install dbt-trino

  Note: On macOS, the dbt documentation recommends installing dbt with Homebrew. However, Homebrew-installed dbt and pip-installed dbt-trino do not work well together. If you already installed dbt with Homebrew, uninstall it and let the dbt-trino installation manage dbt-core.
- The following command creates a directory with the arbitrary name mydatapipeline in the current location, and creates ~/.dbt/profiles.yml as a starting-point profile file ready for the trino adapter type.

      dbt init mydatapipeline --adapter trino

  This process creates the required directory structure and files, including dbt_project.yml.
- Edit the ~/.dbt/profiles.yml file to specify connection information for your cluster, using the links in Installation resources as references. For example:

      default:
        outputs:
          dev:
            type: trino
            method: ldap  # optional, one of {none | ldap | kerberos | jwt | certificate}
            user: [dev_user]
            password: [password]  # required if method is ldap or kerberos
            host: devcluster.example.com
            port: 443
            database: [database name]
            schema: [dev_schema]
            threads: 1  # number of simultaneously building models
            http_scheme: https  # or http
            session_properties:
              query_max_run_time: 4h
              exchange_compression: True
          prod:
            type: trino
            method: ldap  # optional, one of {none | ldap | kerberos | jwt | certificate}
            user: [prod_user]
            password: [prod_password]  # required if method is ldap or kerberos
            host: prodcluster.example.com
            port: 443
            database: [database name]
            schema: [prod_schema]
            threads: 1  # number of simultaneously building models
            http_scheme: https  # or http
            session_properties:
              query_max_run_time: 4h
              exchange_compression: True
        target: dev

  Galaxy does not support a query_max_run_time of more than four hours. SEP and Trino can be configured to support longer durations.
- You can now run dbt commands such as:

      dbt test
      dbt run

  See the dbt documentation for further information about running dbt.
- When done, deactivate your virtual environment:

      deactivate

  To reuse the same settings, reactivate the same environment before running dbt commands again.
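The four-hour Galaxy limit on query_max_run_time mentioned in the steps above can be checked before a profile is used. This is a sketch under assumptions: the helper names are hypothetical, and the parser assumes a single number followed by one of the unit suffixes ns, us, ms, s, m, h, or d, as in the `4h` value shown in the example profile.

```python
import re

# Seconds per unit for duration strings such as "4h" or "90m".
# The set of accepted units is an assumption of this sketch.
UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "ms": 1e-3,
    "s": 1, "m": 60, "h": 3600, "d": 86400,
}
GALAXY_MAX_SECONDS = 4 * 3600  # four-hour limit stated above

_DURATION = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(ns|us|ms|s|m|h|d)\s*$")


def duration_seconds(value):
    """Convert a duration string such as '4h' to seconds."""
    match = _DURATION.match(value)
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    number, unit = match.groups()
    return float(number) * UNIT_SECONDS[unit]


def exceeds_galaxy_limit(value):
    """Return True if the duration is longer than four hours."""
    return duration_seconds(value) > GALAXY_MAX_SECONDS
```

A value of `4h` or `240m` passes the check, while `5h` or `1d` is flagged as too long for Galaxy.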
Installation resources #
- Install dbt core and the dbt-trino plugin
- Information about the profiles.yml file
- Alternative dbt CLI installation