Starburst Cosmos DB connector#

The Starburst Cosmos DB connector uses the API for NoSQL to read data stored in Azure Cosmos DB for NoSQL.

The Starburst Cosmos DB connector only supports connecting to Azure Cosmos DB for NoSQL. If you are using Azure Cosmos DB for PostgreSQL, MongoDB, or Apache Cassandra, use the native PostgreSQL, MongoDB, or Cassandra connectors instead.

Note

The Starburst Cosmos DB connector is in public preview. Contact Starburst Support with questions or feedback.

Requirements#

To connect to Azure Cosmos DB for NoSQL, you need:

  • Azure access credentials with an attached policy that allows reading from Cosmos DB.

  • Network access from the coordinator and workers to the Cosmos DB instance. By default, this connection uses HTTPS over port 443.

  • A valid Starburst Enterprise license.

  • Data in Cosmos DB must be stored in Azure Cosmos DB for NoSQL.

Configuration#

To create a catalog named example, add a catalog properties file named example.properties in etc/catalog. Replace example with your database name or another descriptive name for the catalog. The file must have the following contents:

connector.name=cosmosdb
cosmosdb.connection-url=https://ACCOUNT_NAME.documents.azure.com:443/
cosmosdb.connection-key=sample-key

Specify the connector.name property as cosmosdb. Configure the catalog using your Azure Cosmos DB connection URL and access key. The connection URL may be formatted differently from the example provided here.
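As a rough illustration of the three properties described above, the following Python sketch parses a catalog properties file and runs a few basic sanity checks. The helper names parse_properties and check_catalog are hypothetical and are not part of Starburst or Trino. The HTTPS check is an assumption based on the default stated in the requirements, and your connection URL may legitimately be formatted differently.

```python
from urllib.parse import urlparse

# Properties used in the example catalog file above.
REQUIRED_KEYS = (
    "connector.name",
    "cosmosdb.connection-url",
    "cosmosdb.connection-key",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_catalog(props):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = [f"missing property: {k}" for k in REQUIRED_KEYS if not props.get(k)]
    name = props.get("connector.name")
    if name and name != "cosmosdb":
        problems.append("connector.name must be cosmosdb")
    url = props.get("cosmosdb.connection-url")
    # Assumption: the default HTTPS transport is in use.
    if url and urlparse(url).scheme != "https":
        problems.append("connection URL is expected to use https")
    return problems
```

A check like this only catches typos and omissions in the file. It does not verify that the key is valid or that the account is reachable.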

Case insensitive matching#

When case-insensitive-name-matching is set to true, Trino is able to query non-lowercase schemas and tables by maintaining a mapping of the lowercase name to the actual name in the remote system. However, if two schemas and/or tables have names that differ only in case (such as “customers” and “Customers”) then Trino fails to query them due to ambiguity.

In these cases, use the case-insensitive-name-matching.config-file catalog configuration property to specify a configuration file that maps these remote schemas/tables to their respective Trino schemas/tables:

{
  "schemas": [
    {
      "remoteSchema": "CaseSensitiveName",
      "mapping": "case_insensitive_1"
    },
    {
      "remoteSchema": "cASEsENSITIVEnAME",
      "mapping": "case_insensitive_2"
    }],
  "tables": [
    {
      "remoteSchema": "CaseSensitiveName",
      "remoteTable": "tablex",
      "mapping": "table_1"
    },
    {
      "remoteSchema": "CaseSensitiveName",
      "remoteTable": "TABLEX",
      "mapping": "table_2"
    }]
}

Queries against one of the tables or schemas defined in the mapping attributes are run against the corresponding remote entity. For example, a query against tables in the case_insensitive_1 schema is forwarded to the CaseSensitiveName schema, and a query against case_insensitive_2 is forwarded to the cASEsENSITIVEnAME schema.

At the table mapping level, a query on case_insensitive_1.table_1 as configured above is forwarded to CaseSensitiveName.tablex, and a query on case_insensitive_1.table_2 is forwarded to CaseSensitiveName.TABLEX.
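The forwarding rules described above can be modeled with a short Python sketch. It is only an illustration of how the mapping file is read, not the connector's actual implementation. The function names build_lookup and resolve are hypothetical, and passing unmapped names through unchanged is a simplifying assumption.

```python
import json

# Same content as the mapping file shown above, inlined so the sketch is self-contained.
MAPPING_JSON = """
{
  "schemas": [
    {"remoteSchema": "CaseSensitiveName", "mapping": "case_insensitive_1"},
    {"remoteSchema": "cASEsENSITIVEnAME", "mapping": "case_insensitive_2"}
  ],
  "tables": [
    {"remoteSchema": "CaseSensitiveName", "remoteTable": "tablex", "mapping": "table_1"},
    {"remoteSchema": "CaseSensitiveName", "remoteTable": "TABLEX", "mapping": "table_2"}
  ]
}
"""


def build_lookup(config):
    """Index the mapping file by the Trino-side names."""
    schemas = {s["mapping"]: s["remoteSchema"] for s in config.get("schemas", [])}
    tables = {
        (t["remoteSchema"], t["mapping"]): t["remoteTable"]
        for t in config.get("tables", [])
    }
    return schemas, tables


def resolve(schemas, tables, trino_schema, trino_table):
    """Translate a Trino schema and table name to the remote names."""
    # Assumption: names without an explicit mapping are passed through unchanged.
    remote_schema = schemas.get(trino_schema, trino_schema)
    remote_table = tables.get((remote_schema, trino_table), trino_table)
    return remote_schema, remote_table


schemas, tables = build_lookup(json.loads(MAPPING_JSON))
print(resolve(schemas, tables, "case_insensitive_1", "table_1"))
print(resolve(schemas, tables, "case_insensitive_1", "table_2"))
```

Note that table mappings are keyed by the remote schema name, which is why the schema is resolved first.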

By default, when a change is made to the mapping configuration file, Trino must be restarted to load the changes. Optionally, you can set the case-insensitive-name-matching.config-file.refresh-period property to have Trino reload the mapping file periodically without requiring a restart:

case-insensitive-name-matching.config-file.refresh-period=30s

SQL support#

The connector supports globally available statements and read operation statements to access data and metadata in Cosmos DB databases.

Type mapping#

Because Trino and Cosmos DB each support types that the other does not, this connector modifies some types when reading data. Data types may not map the same way in both directions between Starburst Enterprise platform (SEP) and the data source. Because the connector only reads data, the following section covers the mapping from Cosmos DB to Trino.

Cosmos DB to Trino type mapping#

The connector maps Cosmos DB types to the corresponding Trino types following this table:

| Cosmos DB type | Trino type | Notes |
| --- | --- | --- |
| Boolean | BOOLEAN | |
| Double | DOUBLE | Cosmos DB uses IEEE 754 double precision for its number type. All numeric types in Cosmos DB are mapped to DOUBLE. |
| String | VARCHAR | |
| Object | ROW | |
| Array | ARRAY | Mapped to ROW instead if the elements of the array are not all of the same type. |

No other types are supported.
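To make the mapping concrete, here is a small Python sketch that classifies a value from a parsed JSON document by the top-level Trino type listed for it. It is a simplified model for illustration only. The function name trino_type is hypothetical, nested field and element types are not expanded, and returning None for unlisted values (such as JSON null) is an assumption rather than documented connector behavior.

```python
def trino_type(value):
    """Return the top-level Trino type name for a parsed JSON value, or None."""
    # bool must be tested before numbers because Python's bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOLEAN"
    # All Cosmos DB numbers, integer-looking or not, map to DOUBLE.
    if isinstance(value, (int, float)):
        return "DOUBLE"
    if isinstance(value, str):
        return "VARCHAR"
    if isinstance(value, dict):
        return "ROW"
    if isinstance(value, list):
        # The Notes column describes the case where ROW is used instead.
        return "ARRAY"
    # Assumption: anything else (for example JSON null) has no listed mapping.
    return None


document = {"id": "42", "active": True, "price": 10, "tags": ["a", "b"]}
for field, value in document.items():
    print(field, trino_type(value))
```

The integer 10 in the sample document is reported as DOUBLE, which reflects the single number type in Cosmos DB.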

Performance#

The connector includes performance improvements, detailed in the following section.

Pushdown#

The connector supports limit pushdown and predicate pushdown for some predicates. Predicate pushdown is supported only for equality (=) and range (<, >) expressions on columns of type VARCHAR and BOOLEAN.
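The predicate rule above can be expressed as a simple check. The Python sketch below is a hedged illustration, not the connector's planner logic. The function name is_pushdown_candidate is hypothetical, and only the operators named in the text are included, since whether other comparison operators qualify is not stated here.

```python
# Operators and column types named in the pushdown description above.
PUSHDOWN_OPERATORS = {"=", "<", ">"}
PUSHDOWN_TYPES = {"VARCHAR", "BOOLEAN"}


def is_pushdown_candidate(column_type, operator):
    """Return True if a simple comparison matches the documented pushdown rule."""
    return column_type.upper() in PUSHDOWN_TYPES and operator in PUSHDOWN_OPERATORS


# A filter on a VARCHAR column with = qualifies; the same filter on DOUBLE does not.
print(is_pushdown_candidate("VARCHAR", "="))
print(is_pushdown_candidate("DOUBLE", "="))
```

Predicates that do not qualify are still evaluated, but by Trino after the data is read rather than by Cosmos DB, so filters on DOUBLE columns can require reading more data.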