The Teradata connectors allow querying and creating tables in external Teradata databases. There are two connectors available:

The Starburst Teradata connector is a standard connector with a rich feature set, which is easy to install.

The Starburst Teradata Direct connector has the same feature set as the Starburst Teradata connector. It combines the connector with a table operator installed on the Teradata servers. This makes it more complex to install and update, but can produce higher query performance.

The Starburst Teradata connector is a feature-rich, easy to install connector.

Requirements#

To connect to Teradata, you need:

General configuration properties#

The following table describes general catalog configuration properties for the connector:

Property name

Description

Default value

case-insensitive-name-matching

Support case insensitive schema and table names.

false

case-insensitive-name-matching.cache-ttl

1m

case-insensitive-name-matching.config-file

Path to a name mapping configuration file in JSON format that allows Trino to disambiguate between schemas and tables with similar names in different cases.

null

case-insensitive-name-matching.refresh-period

Frequency with which Trino checks the name matching configuration file for changes.

0 (refresh disabled)

Duration for which metadata, including table and column statistics, is cached.

0 (caching disabled)

Cache the fact that metadata, including table and column statistics, is not available

false

Maximum number of objects stored in the metadata cache

10000

write.batch-size

Maximum number of statements in a batched execution. Do not change this setting from the default. Non-default values may negatively impact performance.

1000

Transaction mode#

As a new user, consider setting the Teradata system default to the ANSI session mode. For more information, read Transaction Semantics Differences in ANSI and Teradata Session Modes.

Unless the Teradata system default is changed, the connector defaults to the TERA transaction mode. To set the connector’s transaction mode to ANSI without changing the mode system-wide on Teradata, append /TMODE=ANSI to the connection-url parameter in your catalog properties file:

Installation#

To install the Teradata connector, use the following directions:

1. Obtain the Teradata JDBC driver files (terajdbc4.jar and tdgssconfig.jar) from the Teradata website.

4. Perform the above steps on every SEP cluster node.

5. Restart SEP on every node.

Parallelism#

SEP is reading from Teradata using multiple parallel JDBC connections. By default, the number of parallel reads is determined automatically based on number of AMPs available in Teradata. The parallel reads can be configured by using the following properties:

Property name

Description

Default

Determines the parallelism method. Possible values are:

• AMP_BASED, single JDBC connection per Teradata AMP

• MANUAL, user configures number of parallel connections

AMP_BASED

Number of parallel connections, to be used when teradata.parallelism-type=MANUAL is used. Use 1 to disable parallel read.

Number of AMP in Teradata Database

Mapping data types between SEP and Teradata#

SEP and Teradata each support different data types for table columns and use different names for some of them. Some data types are not supported equally by both systems. The following tables display the mapping used by SEP when working with existing columns, and when creating tables in Teradata.

Mappings for working with existing fields in Teradata:

SEP data type

BYTEINT

TINYINT

SMALLINT

SMALLINT

INTEGER

INTEGER

BIGINT

BIGINT

REAL

REAL

FLOAT

REAL

DOUBLE

DOUBLE

NUMERIC

DECIMAL

DECIMAL

DECIMAL

CHAR

CHAR

NCHAR

CHAR

VARCHAR

VARCHAR

NVARCHAR

VARCHAR

LONGVARCHAR

VARCHAR

LONGNVARCHAR

VARCHAR

DATE

DATE

TIMESTAMP

TIMESTAMP

TIMESTAMP WITH TIME ZONE

not supported

JSON

not supported

XML

not supported

Any Teradata data types not listed above, such as JSON or XML and others, are not supported.

If a Teradata column uses a type not listed in the above table, any query accessing such a field fails, unless unsupported-type.handling-strategy is set to CONVERT_TO_VARCHAR as detailed in Mapping configuration.

Mappings for creating tables with specific fields in Teradata:

SEP data type

TINYINT

BYTEINT

SMALLINT

SMALLINT

INTEGER

INTEGER

BIGINT

BIGINT

REAL

REAL

DOUBLE

DOUBLE PRECISION

DATE

DATE

TIMESTAMP

TIMESTAMP

CHAR

CHAR

VARCHAR

VARCHAR

The mapping above applies to creating a table with CREATE TABLE and CREATE TABLE ... AS [SELECT] statements.

Any SEP data type not listed in the above table can not be created as a column in a new Teradata table.

General configuration properties#

The following properties can be used to configure how data types from the connected data source are mapped to Trino data types and how the metadata is cached in Trino.

Property name

Description

Default value

unsupported-type-handling

Configure how unsupported column data types are handled:

• IGNORE, column is not accessible.

• CONVERT_TO_VARCHAR, column is converted to unbounded VARCHAR.

The respective catalog session property is unsupported_type_handling.

IGNORE

jdbc-types-mapped-to-varchar

Allow forced mapping of comma separated lists of data types to convert to unbounded VARCHAR

Mapping configuration#

Mapping configuration properties#

Property name

Session property name

Description

Default

number_default_scale

Default SEP DECIMAL scale for Teradata NUMBER (without precision and scale) date type. When not set then such column is treated as not supported.

0

number_rounding_mode

Rounding mode for the Teradata NUMBER data type. Possible values are:

• UNNECESSARY - Rounding mode to assert that the requested operation has an exact result, hence no rounding is necessary.

• CEILING - Rounding mode to round towards positive infinity.

• FLOOR - Rounding mode to round towards negative infinity.

• HALF_DOWN - Rounding mode to round towards nearest neighbor unless both neighbors are equidistant, in which case rounding down is used.

• HALF_EVEN - Rounding mode to round towards the nearest neighbor unless both neighbors are equidistant, in which case rounding towards the nearest neighbor is performed.

• HALF_UP - Rounding mode to round towards nearest neighbor unless both neighbors are equidistant, in which case rounding up is used.

• UP - Rounding mode to round towards zero.

• DOWN - Rounding mode to round towards zero.

UNNECESSARY

Lower case of Teradata NOT CASESPECIFIC char types#

SEP does not support case insensitive comparison of values with char-based type. However, it is possible to force the connector to convert values to lower case for these comparisons. You can activate this behavior with a catalog property:

Alternatively you can use a catalog session property:

Pass-through queries#

The Teradata connector allows you to embed any valid Teradata query. You need to enable this feature in the catalog properties file with the following configuration:

The Teradata query string you want to pass through has to be base32-encoded to avoid having to deal with escaping quotes and case sensitivity issues in table identifiers.

For example, you can encode the following simple Teradata query. It uses the TOP statement, which is not supported in ANSI SQL and SEP:

SELECT TOP 1 * FROM nation ORDER BY nationkey;

Copy the statement into a file query.txt, and encode the text on the command line with base32:

$base32 query.txt KNCUYRKDKQQFIT2QEAYSAKRAIZJE6TJANZQXI2LPNYQE6USEIVJCAQSZEBXGC5DJN5XGWZLZHMFA The command is typically available in a coreutils package. Now it can be used in a query. The following example returns all resulting rows (*) from the query that runs in the any schema in the td catalog: SELECT * FROM td.any."$query:KNCUYRKDKQQFIT2QEAYSAKRAIZJE6TJANZQXI2LPNYQE6USEIVJCAQSZEBXGC5DJN5XGWZLZHMFA"

CREATE TABLE WITH syntax#

The Teradata connector supports CREATE TABLE [ WITH ( property_name = expression [, ...] ) ] trino syntax.

Create table properties#

Property name

Description

primary_index

This allows to configure primary index for the table. The value is the array of columns used for the index. NOTE: If provided array is empty the Teradata behaviour is equal to NO PRIMARY INDEX and default index is not created. Defaults to ARRAY[].

primary_index_unique

Specifies if created primary index is unique. Defaults to false.

Performance#

The connector includes a number of performance improvements, detailed in the following sections.

Table statistics#

The Teradata connector supports table and column statistics to improve query processing performance based on the actual data in the data source.

The statistics are collected by Teradata and retrieved by the connector.

To collect statistics for a table, execute the following statements in Teradata. The second statement needs to be repeated for every column you want to collect statistics for.

COLLECT SUMMARY STATISTICS ON table_schema.table_name;
COLLECT STATISTICS COLUMN(column_name) ON table_schema.table_name;

Pushdown#

The connector supports pushdown for a number of operations:

Aggregate pushdown for the following functions:

Additionally, pushdown is only supported for DOUBLE type columns with the following functions:

Additionally pushdown is only supported for REAL or DOUBLE type column with the following functions:

Cost-based join pushdown#

The connector supports cost-based Join pushdown to make intelligent decisions about whether to push down a join operation to the data source.

When cost-based join pushdown is enabled, the connector only pushes down join operations if the available Table statistics suggest that doing so improves performance. Note that if no table statistics are available, join operation pushdown does not occur to avoid a potential decrease in query performance.

The following table describes catalog configuration properties for join pushdown:

Property name

Description

Default value

join-pushdown.enabled

Enable join pushdown. Equivalent catalog session property is join_pushdown_enabled.

true

join-pushdown.strategy

Strategy used to evaluate whether join operations are pushed down. Set to AUTOMATIC to enable cost-based join pushdown, or EAGER to push down joins whenever possible. Note that EAGER can degrade query performance and is only recommended for testing and troubleshooting.

AUTOMATIC

Predicate pushdown support#

The connector’s support of predicate pushdown is partially dependent on the configured transaction mode. You can modify the connector’s transaction mode in the catalog configuration file.

If the transaction mode is set to TERA, the connector does not support pushdown of any predicates on columns of character string types like CHAR or VARCHAR.

If the transaction mode is set to ANSI, the connector does not support pushdown of inequality predicates such as !=, or range predicates such as > and BETWEEN, on columns with character string types. Equality predicates, such as IN or =, on columns with character string types are pushed down. This ensures correctness of results since the remote data source may sort strings differently than SEP.

In the following example, the predicate of the first query is not pushed down since name is a column of type VARCHAR, and > is an inequality predicate. The second query is pushed down.

SELECT * FROM nation WHERE name > 'CANADA';
SELECT * FROM nation WHERE name = 'CANADA';

Dynamic filtering#

Dynamic filtering is enabled by default. It causes the connector to wait for dynamic filtering to complete before starting a JDBC query.

You can disable dynamic filtering by setting the property dynamic-filtering.enabled in your catalog properties file to false.

JDBC connection pooling#

You can improve performance by enabling JDBC connection pooling, which is disabled by default.

Starburst Cached Views#

The connectors supports table scan redirection to improve performance and reduce load on the data source.

Lock manager#

The Lock Manager imposes concurrency control on Teradata Database by managing the locks on the database objects being accessed by each transaction and releasing those locks when the transaction either commits or rolls back its work. This control ensures that the data remains consistent for all users. However for analytic queries default Teradata Lock Manager could be too restrictive.

Locking configuration properties#

Property name

Description

LOCKING ROW FOR [ACCESS | READ] request modifier is used for outer SELECT requests. Defaults to ACCESS.

Security#

The connector includes a number of security-related features, detailed in the following sections.

User impersonation#

User impersonation can be enabled in the catalog file:

User impersonation in Teradata connector is based on GRANT CONNECT THROUGH. For more details, search for GRANT CONNECT THROUGH on the Teradata documentation.

Note

Option WITH TRUST_ONLY in GRANT CONNECT THROUGH is not yet supported.

Kerberos authentication#

The connector supports Kerberos-based authentication with the following configuration:

kerberos.client.principal=example@example.com
kerberos.client.keytab=etc/kerberos/example.keytab
kerberos.config=etc/kerberos/krb5.conf

With this configuration the user example@example.com, defined in the principal property, is used to connect to the database, and the related Kerberos service ticket is located in the example.keytab file. The Kerberos configuration specified with kerberos.config is used.

Kerberos credential pass-through#

The connector can be configured to pass through Kerberos credentials, received by SEP, to the Teradata database.

Configure Kerberos and SEP, following the instructions in Kerberos credential pass-through.

Then configure the connector to pass through the credentials from the server to the database in your catalog properties file and ensure the Kerberos client configuration properties are in place:

http.authentication.krb5.config=/etc/krb5.conf
http-server.authentication.krb5.service-name=exampleServiceName
http-server.authentication.krb5.keytab=/path/to/Keytab/File

Now any database access via SEP is subject to the data access restrictions and permissions of the user supplied via Kerberos.

The connector supports password credential pass-through. To enable it, edit the catalog properties file to include the authentication type:

Audit#

The connector utilizes the QUERY_ID property for Teradata’s Query Band to annotate any query from SEP in Teradata with the SEP query identifier. If teradata.query-band.query-id.enabled=true each Teradata query has QUERY_ID=<query_id> with the current identifier added. This allows you to determine which queries originated from SEP, and specifically also details such as user submitting the query, for audit purposes.

Audit configuration properties#

Property name

Description

Add a SEP query identifier to the Teradata QUERY_BAND. Defaults to false.

Name of a SEP property for query identifier in the Teradata QUERY_BAND. Defaults to QUERY_ID.

SQL DELETE#

If a WHERE clause is specified, the DELETE operation only works if the predicate in the clause can be fully pushed down to the data source.

ALTER TABLE#

The connector does not support renaming tables across multiple schemas. For example, the following statement is supported:

ALTER TABLE catalog.schema_one.table_one RENAME TO catalog.schema_one.table_two

The following statement attempts to rename a table across schemas, and therefore is not supported:

ALTER TABLE catalog.schema_one.table_one RENAME TO catalog.schema_two.table_two

Compared to the Starburst Teradata connector, it requires more advanced configuration and is capable of achieving higher performance. It supports all use cases of the Teradata connector, and adds greatly improved performance for SELECT queries.

The connector is composed of components running on SEP, and others running on Teradata. All components participate in every transmission. A transmission includes all data transfer related to a single table scan.

The connector uses JDBC as a control channel to schedule the query in Teradata. Any SELECT statement run with JDBC is passed to a Teradata table operator, which then uses the custom transmitters to pass the data over HTTP to the SEP controller and receivers:

Starburst Enterprise

Controller

The controller starts and manages all transmissions. It runs on the SEP coordinator and distributes work to the receivers, and provides status information to the transmitters.

The receivers connect to the transmitters and receive data from Teradata from the transmitters. They run on the SEP workers and are managed by the controller.

Table operator and JDBC driver

The table operator and JDBC driver read the data from the Teradata storage.

Transmitters

The transmitters send the data for each table operator invocation on Teradata to the receivers on SEP.

The following features from the Teradata connector are also supported by the Teradata Direct connector:

Besides the connector name, you need to provide the JDBC connection parameters and the relevant JAR files in the teradata plugin directory, as discussed in Starburst Teradata connector.

Additionally you need to install the native table operator on Teradata in the queried schema and configure the parameters for the connector specifying the table operator name, and the HTTP port and host of the SEP coordinator node.

Required Teradata Direct connector catalog properties#

Property name

Description

Native Table Operator function name.

Sets the receiver port the same on each node. If this is not specified, ports are set randomly.

We strongly recommend that you set this to DEBUG until you have validated that all networking and access works as expected. You can safely remove this setting once you have verified all connections. Leaving it set to DEBUG can affect performance.

General configuration properties#

The following table describes general catalog configuration properties for the connector:

Property name

Description

Default value

case-insensitive-name-matching

Support case insensitive schema and table names.

false

case-insensitive-name-matching.cache-ttl

1m

case-insensitive-name-matching.config-file

Path to a name mapping configuration file in JSON format that allows Trino to disambiguate between schemas and tables with similar names in different cases.

null

case-insensitive-name-matching.refresh-period

Frequency with which Trino checks the name matching configuration file for changes.

0 (refresh disabled)

Duration for which metadata, including table and column statistics, is cached.

0 (caching disabled)

Cache the fact that metadata, including table and column statistics, is not available

false

Maximum number of objects stored in the metadata cache

10000

write.batch-size

Maximum number of statements in a batched execution. Do not change this setting from the default. Non-default values may negatively impact performance.

1000

Further configuration can be done with the following parameters:

Determines the buffer size per each SEP worker. This buffer is available per table scan, so a single query joining three tables uses three buffers. Default value is 32MB.

teradata-direct.parcel.size in catalog.properties, or parcel_size in catalog session properties:

The size of the data packets sent to the SEP receivers. The last packet may be smaller. Defaults to 2MB. Parcel size should not be higher than the buffer size.

teradata-direct.compression.enabled in catalog.propertes, or compression_enabled in catalog session properties:

Determines if transmission data is compressed. Defaults to false.

teradata-direct.splits-per-worker in catalog.propertes, or splits_per_worker in catalog session properties:

Determines how many splits each SEP worker receives. This determines the maximum concurrency level on SEP. Defaults to 16, identical to the task.concurrency of the SEP server.

teradata-direct.table-operator.logging-path in the catalog properties file:

Set the path for log files of the native table operator on the Teradata nodes. Default behavior is to log to files in the the /tmp folder.

Configuring in Kubernetes deployments#

The Starburst Teradata Direct connector is supported for Kubernetes deployments in AWS EKS and in Azure AKS meeting certain requirements.

Warning

The configuration to use the connector on Kubernetes is complex and requires significant networking and Kubernetes expertise. Contact Starburst Support for assistance and help.

Encryption#

The connector can encrypt all data transmissions and use HTTPS. Add the details for the Java keystore file to use in the SEP server config.

SEP automatically distributes the keystore to the controller, the receivers, the transmitters and the table operator.

Native table operator#

Install the native table operator with the following steps:

• Copy the provided file onto all Teradata servers into an identical location such as /opt/starburst.

• Allow Teradata processes to access the file by either adding read permission with a command such as chmod +r <.so file>, or allowing access by the tdtrusted group with chown :tdtrusted <.so file>; chmod g+r <.so file>.

• Create the table operator function in the desired schema. We recommend a dedicated database for the current operator version.

CREATE FUNCTION some_database.table_operator()
RETURNS TABLE VARYING USING FUNCTION starburst_teradata_contract
LANGUAGE C
NO SQL
PARAMETER STYLE SQLTABLE

Warning

Do not install the native table operator in the syslib database.

No Teradata restart is needed for the changes to take effect.

By default the UDF is created to run in protected mode. It allocates memory and opens socket and can therefore not be used in unprotected mode.

The path and access for the .so file, for example /opt/starburst, has to be adjusted to a suitable location on your Teradata servers.

Each user that connects to Teradata from SEP needs to be granted sufficient access:

GRANT EXECUTE ON SPECIFIC FUNCTION some_database.table_operator TO your_user;

CREATE ROLE table_operator_users;
GRANT EXECUTE ON SPECIFIC FUNCTION some_database.table_operator
TO table_operator_users;
GRANT table_operator_users TO your_user;

This also requires to set the role for any impersonated user, this could be achieved with the below in the catalog file:

You must only update the native table operator when required in the release notes. The requirement can be in the specific release you are installing, or in a release that you are skipping over. For example, the requirement result from an STS release between two LTS releases. An update from one LTS to the other, skipping the STS, still includes the need to update the native table operator. Contact Starburst Support if you require specific guidance for your update.

The following list details the necessary steps:

1. Download the new native table operator starburst-direct-<version>.so. The version needs to correspond to the version of the SEP cluster.

DROP FUNCTION some_database.table_operator;

3. Drop the database used for the function in Teradata with DROP DATABASE.

4. Remove all old .so files from all Teradata server nodes.

5. Restart Teradata with tparesert -force to remove the native table operator from memory. This step may not be necessary and is solely determined by the memory management of Teradata. If you skip this step and run into an error like [Error 7559] [SQLState HY000] Memory (via malloc call) not freed before exiting UDF/XSP/UDM, you must restart the upgrade process.

6. Copy the new native table operator file to all Teradata server nodes. Use the original filename from the download. The filename is required to be different from each prior install.

7. Proceed with the same steps as performed during the initial installation. Make sure you update the create function statement to the new database and .so file path.

8. Update the catalog properties files to point to the new database on the SEP coordinator and all workers.