Starburst Salesforce connector#

The Starburst Salesforce connector allows Starburst Enterprise platform (SEP) to query Salesforce data available in your existing Salesforce account.

Requirements#

To connect to Salesforce, you need:

  • Network access from the coordinator and workers to the Salesforce API endpoints.

  • Security token for your Salesforce account, or configured range of IP addresses for logins.

  • A valid Starburst Enterprise license.

Configuration#

Create the example catalog with a catalog properties file in etc/catalog named example.properties (replace example with your database name or some other descriptive name of the catalog) with the following contents:

connector.name=salesforce
salesforce.user=mysalesforceaccount@example.com
salesforce.password=secret
salesforce.security-token=<token-from-Salesforce>
salesforce.enable-sandbox=true

Specify the connector.name property as salesforce. Configure the connector using your Salesforce account username, password, and the security token for your account. See Security for more information on authenticating with Salesforce.

You can optionally enable the Salesforce Sandbox as shown in the last line of the preceding example. The default value is false.

The connector can only access Salesforce as the configured user in the catalog. If you need to access Salesforce with a different user name, configure a separate catalog.

General configuration properties#

The following table describes general catalog configuration properties for the connector:

Property name

Description

case-insensitive-name-matching

Support case insensitive schema and table names. Defaults to false.

case-insensitive-name-matching.cache-ttl

Duration for which case insensitive schema and table names are cached. Defaults to 1m.

case-insensitive-name-matching.config-file

Path to a name mapping configuration file in JSON format that allows Trino to disambiguate between schemas and tables with similar names in different cases. Defaults to null.

case-insensitive-name-matching.config-file.refresh-period

Frequency with which Trino checks the name matching configuration file for changes. The duration value defaults to 0s (refresh disabled).

metadata.cache-ttl

Duration for which metadata, including table and column statistics, is cached. Defaults to 0s (caching disabled).

metadata.cache-missing

Cache the fact that metadata, including table and column statistics, is not available. Defaults to false.

metadata.schemas.cache-ttl

Duration for which schema metadata is cached. Defaults to the value of metadata.cache-ttl.

metadata.tables.cache-ttl

Duration for which table metadata is cached. Defaults to the value of metadata.cache-ttl.

metadata.statistics.cache-ttl

Duration for which tables statistics are cached. Defaults to the value of metadata.cache-ttl.

metadata.cache-maximum-size

Maximum number of objects stored in the metadata cache. Defaults to 10000.

write.batch-size

Maximum number of statements in a batched execution. Do not change this setting from the default. Non-default values may negatively impact performance. Defaults to 1000.

dynamic-filtering.enabled

Push down dynamic filters into JDBC queries. Defaults to true.

dynamic-filtering.wait-timeout

Maximum duration for which Trino waits for dynamic filters to be collected from the build side of joins before starting a JDBC query. Using a large timeout can potentially result in more detailed dynamic filters. However, it can also increase latency for some queries. Defaults to 20s.

Warning

There is a known issue causing Salesforce catalogs to occasionally fail to list tables or views within their schemas. As a workaround, you can disable the metadata cache built-in to the JDBC driver used by the connector:

salesforce.extra-jdbc-properties=Other="cachemetadatatable=false;cachemetadatatablecolumns=false;cachemetadataschema=false;cachametadataschemas=false;";

You can enable the separate metadata cache of the connector:

metadata.cache-ttl=10m

Type mapping#

Because Trino and Salesforce each support types that the other does not, this connector modifies some types when reading data.

Salesforce to Trino type mapping#

The connector maps Salesforce types to the corresponding Trino types according to the following table:

Salesforce to Trino type mapping#

Salesforce type

Trino type

Auto Number

VARCHAR

Lookup Relationship

VARCHAR

Master-Detail Relationship

VARCHAR

External Lookup Relationship

VARCHAR

Checkbox

VARCHAR

Currency

DECIMAL

Date

DATE

Date/Time

TIMESTAMP

Email

VARCHAR

Geolocation

VARCHAR

Number

DOUBLE

Percent

DOUBLE

Phone per catalog

VARCHAR

Picklist

VARCHAR

Picklist (Multi-Select)

VARCHAR

Text

VARCHAR

Text Area

VARCHAR

Time

TIME

URL

VARCHAR

No other typs are supported.

Decimal type handling#

Salesforce Currency data types are mapped to SEP DECIMAL types with the same precision and scale as the Salesforce type. Values that require rounding or truncation to fit cause a failure at runtime, but the connector can be configured to allow rounding and truncation by setting the decimal-mapping catalog configuration property to allow_overflow and the decimal-rounding-mode catalog configuration property to one of the following:

  • UNNECESSARY (default)

  • UP

  • DOWN

  • CEILING

  • FLOOR

  • HALF_UP

  • HALF_DOWN

  • HALF_EVEN

You can change these settings for just the current session with the decimal_mapping and decimal_rounding_mode catalog session properties.

SQL support#

The connector supports globally available and read operation statements to access data and metadata in Salesforce.

Performance#

System information#

The Salesforce connector uses the Salesforce API to access data and metadata. Salesforce limits the number of API calls for organizations. If the limits are reached, query processing fails.

The connector provides API limit and current usage numbers in the limits table in the system schema. You can use this information for monitoring your usage to ensure a limit is not exceeded. The following query returns the data from your example catalog:

SELECT * FROM example.system.limits;

Dynamic filtering#

Dynamic filtering is enabled by default. It causes the connector to wait for dynamic filtering to complete before starting a JDBC query.

You can disable dynamic filtering by setting the dynamic-filtering.enabled property in your catalog configuration file to false.

Wait timeout#

By default, table scans on the connector are delayed up to 20 seconds until dynamic filters are collected from the build side of joins. Using a large timeout can potentially result in more detailed dynamic filters. However, it can also increase latency for some queries.

You can configure the dynamic-filtering.wait-timeout property in your catalog properties file:

dynamic-filtering.wait-timeout=1m

You can use the dynamic_filtering_wait_timeout catalog session property in a specific session:

SET SESSION example.dynamic_filtering_wait_timeout = 1s;

Compaction#

The maximum size of dynamic filter predicate, that is pushed down to the connector during table scan for a column, is configured using the domain-compaction-threshold property in the catalog properties file:

domain-compaction-threshold=100

You can use the domain_compaction_threshold catalog session property:

SET SESSION domain_compaction_threshold = 10;

By default, domain-compaction-threshold is set to 32. When the dynamic predicate for a column exceeds this threshold, it is compacted into a single range predicate.

For example, if the dynamic filter collected for a date column dt on the fact table selects more than 32 days, the filtering condition is simplified from dt IN ('2020-01-10', '2020-01-12',..., '2020-05-30') to dt BETWEEN '2020-01-10' AND '2020-05-30'. Using a large threshold can result in increased table scan overhead due to a large IN list getting pushed down to the data source.

Metrics#

Metrics about dynamic filtering are reported in a JMX table for each catalog:

jmx.current."io.trino.plugin.jdbc:name=example,type=dynamicfilteringstats"

Metrics include information about the total number of dynamic filters, the number of completed dynamic filters, the number of available dynamic filters and the time spent waiting for dynamic filters.

Table scan redirection#

The connector supports table scan redirection to improve performance and reduce load on the data source.

This is particularly recommended for the Salesforce connector as Salesforce has API limits, which can cause queries to fail.

Security#

The connector includes a number of security-related features, detailed in the following sections.

Password authentication#

The connector supports authenticating via a user name. Use the following properties in the catalog properties file to configure it.

salesforce.authentication.type=PASSWORD
salesforce.user=mysalesforceaccount@example.com
salesforce.password=secret
salesforce.security-token=<token-from-Salesforce>
salesforce.enable-sandbox=true

The default value of salesforce.authentication.type is PASSWORD.

The security token is generated by Salesforce and emailed to you. To generate an access token, at the top navigation bar in Salesforce, go to <your name> > My Settings > Personal > Reset My Security Token. Note that whenever the user’s password changes, including regularly scheduled password expiration, the security token also changes and requires the connector to be reconfigured using the new password and token.

You can use a configured range of IP addresses approved for login as an alternative to the security token. In this case, do not specify salesforce.security-token in the catalog file. You must correctly configure the IP ranges for the coordinator and all worker nodes in Salesforce’s Security Controls > Network Access page for your organization.

OAuth JWT authentication#

The connector supports authenticating using OAuth 2.0. This method requires setup in Salesforce, but unlike password authentication it does not expire until access is explicitly revoked by Salesforce. Use the following properties in the catalog properties file to configure this method.

salesforce.authentication.type=OAUTH_JWT
salesforce.oauth.pkcs12-path=/path/to/salesforce-ca.p12
salesforce.oauth.pkcs12-password=pkcs12-password
salesforce.oauth.jwt-issuer=<salesforce connected app consumer key>
salesforce.oauth.jwt-subject=<salesforce user name or email address>

The trust is set up on the Salesforce side using a certificate. At a high level, you create a PKCS12 archive and certificate, then create an OAuth-enabled Connected App in Salesforce that uses that certificate. You then give permission to the user to use that Connected App and configure the Salesforce connector to use the PKCS12 archive and user name. This sets up a trust that never expires until it is explicitly revoked by a user in the Salesforce UI.

An example of setting up this configuration is as follows:

  1. Create a secure password and export it as the PKCS12_PASS environment variable. Keep this password, as it is necessary to configure the PKCS12 arvhive in the catalog configuration.

export PKCS12_PASS=averysecurepassword
  1. Create a PKCS12 archive named salesforce-ca.p12.

keytool -genkeypair -v \
  -alias salesforce-ca \
  -dname "CN=Starburst Salesforce Connector" \
  -storetype PKCS12 \
  -keystore salesforce-ca.p12 \
  -keypass:env PKCS12_PASS \
  -storepass:env PKCS12_PASS \
  -keyalg RSA \
  -keysize 4096 \
  -ext KeyUsage:critical="keyCertSign" \
  -ext BasicConstraints:critical="ca:true" \
  -validity 9999
  1. Export the certificate to salesforce-ca.crt.

keytool -export -v \
  -alias salesforce-ca \
  -file salesforce-ca.crt \
  -keypass:env PKCS12_PASS \
  -storepass:env PKCS12_PASS \
  -keystore salesforce-ca.p12 \
  -rfc
  1. Log in to Salesforce using a username and password that has authorization to create a Connected App. Select the gear icon in the upper right and select Setup, then Apps -> App Manager, then New Connected App in the upper right. Enter a name for the Connected App, such as SEP Salesforce Connector and provide a contact email address. A Callback URL is required but unused by this method; enter a valid URL, such as http://localhost.

  2. Select Enable OAuth Settings and Use digital signatures, uploading salesforce-ca.crt to the site. For Selected OAuth Scopes use Access content resources (content) at a minimum. Copy and keep the resulting Consumer Key, which is needed to configure the connector.

  3. Create a new Permission Set. Navigate to Users -> Permissions Sets and select New. Enter a label, such as SEP Salesforce Connector Permission Set and select Save. Select Assigned Connected Apps then Edit. Add the connected app name created in step 4.

  4. Add the Permission Set to the User. Select Users and locate the username you want to use to authenticate with Salesforce. When found, click to select that username. Select Permission Set Assignments then Edit Assignments. Add the permission set to the username and save.

This concludes the Salesforce part of the setup procedure.

  1. Configure the Salesforce catalog to enable OAuth 2.0 JWT. You need the PKCS12 archive’s password, the Consumer Key from Salesforce, and the user’s email address login.

salesforce.authentication.type=OAUTH_JWT
salesforce.oauth.pkcs12-path=/path/to/salesforce-ca.p12
salesforce.oauth.pkcs12-password=pkcs12-password
salesforce.oauth.jwt-issuer=<Salesforce connected app consumer key>
salesforce.oauth.jwt-subject=<Salesforce user name or email address>