Starburst DynamoDB connector#

The Starburst DynamoDB connector allows querying data stored in Amazon DynamoDB.

Requirements#

To connect to DynamoDB, you need:

  • Amazon Web Services (AWS) access credentials with an attached policy to be able to read from DynamoDB.

  • Network access from the coordinator and workers to the AWS region hosting your DynamoDB tables.

  • A valid Starburst Enterprise license.

Configuration#

Create the mydynamodb catalog with a catalog properties file in etc/catalog named mydynamodb.properties:

connector.name=dynamodb
dynamodb.aws-access-key=AKIAIOSFODNN7EXAMPLE
dynamodb.aws-secret-key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
dynamodb.aws-region=us-east-1
dynamodb.generate-schema-files=NEVER
dynamodb.schema-directory=/path/to/schema/directory

Specify the connector.name property as dynamodb. Configure the catalog using your AWS access key, secret key, and region for your account.

The connector can only access DynamoDB with the configured access credentials in the catalog. If you need to access DynamoDB with different credentials or a different region, configure a separate catalog. See more information about Type mapping and schemas session and catalog session properties.

General configuration properties#

The following table describes general catalog configuration properties for the connector:

Property name

Description

Default value

case-insensitive-name-matching

Support case insensitive schema and table names.

false

case-insensitive-name-matching.cache-ttl

1m

case-insensitive-name-matching.config-file

Path to a name mapping configuration file in JSON format that allows Trino to disambiguate between schemas and tables with similar names in different cases.

null

case-insensitive-name-matching.refresh-period

Frequency with which Trino checks the name matching configuration file for changes.

0 (refresh disabled)

metadata.cache-ttl

Duration for which metadata, including table and column statistics, is cached.

0 (caching disabled)

metadata.cache-missing

Cache the fact that metadata, including table and column statistics, is not available

false

metadata.cache-maximum-size

Maximum number of objects stored in the metadata cache

10000

write.batch-size

Maximum number of statements in a batched execution. Do not change this setting from the default. Non-default values may negatively impact performance.

1000

join-pushdown.enabled

Enable join pushdown. Equivalent catalog session property is join_pushdown_enabled. Enabling this may negatively impact performance for some queries.

false

Generate a schema file#

The connector exposes the data in your DynamoDB instance in the amazondynamodb schema of the catalog.

A schema file for a DynamoDB details the SEP table name, columns, and data types. It additionally specifies which DynamoDB attribute (or nested attribute) the connector uses to populate data for the column.

If you do not specify a schema file, the connector generates a schema file. The connector samples up to 50 rows from the DynamoDB table and uses those to generate a schema file. The schema files are in RealSQLDatabase (RSD) format and accessible using SQL commands.

The value dictates the behavior:

  • NEVER (default): A schema file is never generated.

  • ON_USE: A schema file is generated the first time a table is referenced, provided the schema file for the table does not already exist.

  • ON_START: A schema file is generated at connection time for any tables that do not currently have a schema file.

Several attributes detail the mapping from the SQL type to DynamoDB:

Schema file column definitions#

Attribute name

Description

Default Value

name

The name of the column in the SEP table

xs:type

The SQL type of the SEP table. Supported types are below

columnsize

Length of a SQL column that supports variable widths like VARCHAR

255

key

true if this column is part of the DynamoDB primary key

false

other:keytype

The type of the DynamoDB key. Valid values are either HASH or RANGE

other:path

The name of the attribute in DynamoDB

name

other::internaltype

The DynamoDB Type. Valid values are N, S, SS, or M

Example#

This is an example of a schema file for the TPC-H orders table. The name of the file is orders.rsd and goes in the directory specified by the dynamodb.schema-directory configuration property on every SEP host.

The api:info element definition includes the name of the table , orders, set to the title and other:internaltablename attributes, along with any additional table and schema attributes.

The api:info element contains all columns in the SEP table definition as child elements.

The additional api:script definitions for the GET, POST, MERGE, and DELETE API methods are required in every schema file. These definitions are standard, you can copy them as they are written below.

<api:script xmlns:api="http://apiscript.com/ns?v1">
  <api:info title="orders" other:internaltablename="orders" other:catalog="CData" other:schema="AmazonDynamoDB" description="null" other:version="20">
    <attr name="orderkey"      xs:type="bigint" key="true"      other:keytype="HASH" other:path="orderkey" other:internaltype="N" />
    <attr name="custkey"       xs:type="bigint"                 other:path="custkey"       other:internaltype="N" />
    <attr name="orderstatus"   xs:type="string" columnsize="1"  other:path="orderstatus"   other:internaltype="S" />
    <attr name="totalprice"    xs:type="double"                 other:path="totalprice"    other:internaltype="N" />
    <attr name="orderdate"     xs:type="date"                   other:path="orderdate"     other:internaltype="S" />
    <attr name="orderpriority" xs:type="string" columnsize="15" other:path="orderpriority" other:internaltype="S" />
    <attr name="clerk"         xs:type="string" columnsize="15" other:path="clerk"         other:internaltype="S" />
    <attr name="shippriority"  xs:type="integer"                other:path="shippriority"  other:internaltype="N" />
    <attr name="comment"       xs:type="string" columnsize="79" other:path="comment"       other:internaltype="S" />
  </api:info>

  <api:script method="GET">
    <api:call op="dynamodbadoMultiThreadSelect">
      <api:push/>
    </api:call>
  </api:script>

  <api:script method="POST">
    <api:call op="dynamodbadoAdd">
      <api:push/>
    </api:call>
  </api:script>

  <api:script method="MERGE">
    <api:call op="dynamodbadoUpdate">
      <api:push/>
    </api:call>
  </api:script>

  <api:script method="DELETE">
    <api:call op="dynamodbadoDelete">
      <api:push/>
    </api:call>
  </api:script>
</api:script>

Type mapping#

The supported SQL types include:

  • BOOLEAN

  • TINYINT

  • SMALLINT

  • BIGINT

  • REAL

  • DOUBLE

  • VARCHAR(N)

  • VARBINARY

  • DATE

All other SQL types are not supported.

Type mapping and schemas#

Review the table to learn more about mapping the relational values of nested attributes and arrays. This is used when generating a schema file and dictates how the table displays.

Catalog and session catalog properties#

Property name

Description

Default value

dynamodb.schema-directory

The relative or absolute path to a directory that contains schema files. The directory must exist on every host. More information on generating schema files.

${java.io.tmpdir}/dynamodb-schemas

dynamodb.flatten-objects-enabled

Set to true to enable the expansion of objects into flat nested attributes in individual columns or to return as a single JSON string. Enable this feature on a per-query basis with a dynamodb.flatten_objects_enabled session property.

false

dynamodb.flatten-array-element-count

The default returns the array as a single column of JSON. Set the integer value to 256 to convert array attributes in DynamoDB into individual columns. Enable this feature on a per-query basis with a dynamodb.flatten_arrays_element_count session property.

0

Maps and string sets#

DynamoDB supports nested attributes of maps and string sets, and the connector supports flattening these elements to relational columns. By default, any map or string set is returned as a single JSON column of type VARCHAR.

Take the below table foo which has a single key row_id and one column bar which is a DynamoDB type M (only the api:info section is shown below in the example section).

<api:info title="foo">
  <attr name="row_id" xs:type="string" columnsize="255"  other:path="row_id" other:internaltype="S" key="true" other:keytype="HASH" />
  <attr name="bar"    xs:type="string" columnsize="2000" other:path="bar"    other:internaltype="M" />
</api:info>

The following exmaple shows a value of bar is below. It contains three keys a, a string, b, a number, and c, a string set.

{
    "a": "string value",
    "b": 123,
    "c": ["foo", "bar"]
}

To flatten this value into a map, change the M column in our schema file to four columns named bar.a, bar.b, bar.c.0, and bar.c.1, setting the SQL types and DynamoDB types via the xs:type and other:internaltype attributes, respectively. Then set other:path to bar___cdata___a, where ___cdata___ is a path separator to project into the map to select the a attribute in the top-level bar attribute.

The string set uses the same separator as the map. Instead of specifying an attribute, use 0 and 1 to map the first and second elements of the string set to c.0 and c.1 SQL columns.

<api:info title="foo">
  <attr name="row_id" xs:type="string" columnsize="255" other:path="row_id" other:internaltype="S" key="true" other:keytype="HASH" />
  <attr name="bar.a"      xs:type="string" columnsize="255" other:path="bar___cdata___a"             other:internaltype="S" />
  <attr name="bar.b"      xs:type="bigint"                  other:path="bar___cdata___b"             other:internaltype="N" />
  <attr name="bar.c.0"    xs:type="string" columnsize="255" other:path="bar___cdata___c___cdata___0" other:internaltype="S" />
  <attr name="bar.c.1"    xs:type="string" columnsize="255" other:path="bar___cdata___c___cdata___1" other:internaltype="S" />
</api:info>

For string set types, it uses the same method of projection. Below, baz is of type SS.

<api:info title="foo">
  <attr name="row_id" xs:type="string" columnsize="255"  other:path="row_id" other:internaltype="S" key="true" other:keytype="HASH" />
  <attr name="baz"    xs:type="string" columnsize="2000" other:path="baz"    other:internaltype="SS"  />
</api:info>

It splits it into three columns:

<api:info title="foo">
  <attr name="row_id" xs:type="string" columnsize="255"  other:path="row_id" other:internaltype="S" key="true" other:keytype="HASH" />
  <attr name="baz.0"  xs:type="string" columnsize="2000" other:path="baz___cdata___0"    other:internaltype="S"  />
  <attr name="baz.1"  xs:type="string" columnsize="2000" other:path="baz___cdata___1"    other:internaltype="S"  />
  <attr name="baz.2"  xs:type="string" columnsize="2000" other:path="baz___cdata___2"    other:internaltype="S"  />
</api:info>

If baz contains more than three elements, it is omitted from the result set. If baz contains fewer than three elements, the value of the column is NULL where an element is not present.

Using this method, you can project into many nested attributes in DynamoDB for both maps and string sets.

SQL support#

The connector supports globally available and read operation statements to access data and metadata in DynamoDB.

Performance#

The connector includes a number of performance improvements, detailed in the following sections.

Dynamic filtering#

Dynamic filtering is enabled by default. It causes the connector to wait for dynamic filtering to complete before starting a query.

You can disable dynamic filtering by setting the property dynamic-filtering.enabled in your catalog properties file to false.

Starburst Cached Views#

The connector supports table scan redirection to improve performance and reduce load on the data source.