Starburst MaxCompute connector#
The MaxCompute connector allows users to query data in MaxCompute databases.
Requirements#
To use the MaxCompute connector, you need:
Network access from the coordinator and workers to the MaxCompute server.
A valid Starburst Enterprise license.
Configuration#
Create a catalog properties file in etc/catalog named example.properties to access the configured MaxCompute database in the example catalog (replace example with your database name or some other descriptive name of the catalog). Configure the usage of the connector by specifying the name maxcompute and replace the connection properties as appropriate for your setup.
connector.name=maxcompute
maxcompute.project.name=max_compute
maxcompute.access.id=access id
maxcompute.access.key=access key
maxcompute.endpoint=http://service.cn-example.maxcompute.aliyun.com/api
General configuration properties#
The following table describes catalog configuration properties for the connector:
Property name | Description |
---|---|
maxcompute.project.name | Name of the MaxCompute project. Required. |
maxcompute.access.id | Unique identifier used to access MaxCompute resources securely. Required. |
maxcompute.access.key | Access key used to authenticate access to MaxCompute. Required. |
maxcompute.endpoint | Endpoint used to communicate with MaxCompute. Required. |
maxcompute.tunnel.endpoint | Endpoint where the tunneling protocol should connect, used to improve performance. |
 | Comma-separated list of additional MaxCompute projects to be exposed as SEP schemas. |
 | Maximum size for each split of the input data. |
Optionally, configure maxcompute.tunnel.endpoint to improve performance:
maxcompute.tunnel.endpoint=http://dt.cn-example.maxcompute.aliyun.com
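As a quick sanity check before deploying a catalog file, the required properties listed above can be verified with a small script. The following is a minimal sketch, not part of SEP: the helper names parse_properties and missing_required are hypothetical, and the parser only handles plain key=value lines (no line continuations or `:` separators, which the Java properties format also allows).

```python
# Sketch: check that a catalog properties file sets the required
# MaxCompute connector properties. Helper names are hypothetical.

# Properties marked "Required" in the configuration table.
REQUIRED = (
    "maxcompute.project.name",
    "maxcompute.access.id",
    "maxcompute.access.key",
    "maxcompute.endpoint",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


def missing_required(props):
    """Return the required property names that are absent or empty."""
    return [name for name in REQUIRED if not props.get(name)]


# Contents of etc/catalog/example.properties from the section above.
EXAMPLE = """\
connector.name=maxcompute
maxcompute.project.name=max_compute
maxcompute.access.id=access id
maxcompute.access.key=access key
maxcompute.endpoint=http://service.cn-example.maxcompute.aliyun.com/api
"""

props = parse_properties(EXAMPLE)
print(missing_required(props))  # prints []
```

An empty list means all four required properties are present. The optional maxcompute.tunnel.endpoint is deliberately not checked.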
Type mapping#
Because Trino and MaxCompute each support types that the other does not, this connector modifies some types when reading data. Data types may not map the same way between SEP and the data source. Refer to the following section for type mapping.
MaxCompute to Trino type mapping#
The connector maps MaxCompute types to the corresponding Trino types following this table:
MaxCompute type | Trino type | Notes |
---|---|---|
 | | Special characters in |
No other types are supported.
SQL support#
The connector provides globally available and read operation statements to access data and metadata in MaxCompute. It also supports view management, described in the View management section.
View management#
The connector supports read-only views to data and metadata exposed by the connector accessing a data source.
Note
When you query a view, the underlying SELECT statement that defines that view is executed in MaxCompute, and the result set is then returned. Because some computation is completed by MaxCompute, this may incur costs in MaxCompute.
Materialized views#
The connector supports materialized view management. In the underlying system, each materialized view consists of a query statement and a MaxCompute virtual table. When you query a view, the query statement is converted into the SQL statement that is used to define the view.
External tables#
The connector lets you view external tables and access unstructured data stored externally.
Managed tables#
Managed tables are fully managed by MaxCompute, including the physical storage of the tables. For these tables, SEP uses the tunnel API to retrieve data from MaxCompute. This lets SEP determine the data size, split it by records and partitions, and use parallelization with multiple InputSplits.
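To make the idea of splitting by records and partitions concrete, the following sketch divides each partition's records into fixed-size ranges that could be read in parallel. It is an illustration only and does not reflect the connector's actual implementation: RecordSplit, plan_splits, and max_records_per_split are hypothetical names, the partition specs are made up, and the real split size setting may be expressed in data size rather than record count.

```python
# Sketch: plan record-range splits per partition so that each range
# can be read independently. All names here are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordSplit:
    partition: str  # partition spec, for example "ds=20240101"
    start: int      # index of the first record in this split
    count: int      # number of records in this split


def plan_splits(partition_record_counts, max_records_per_split):
    """Cut every partition into ranges of at most max_records_per_split records."""
    if max_records_per_split <= 0:
        raise ValueError("max_records_per_split must be positive")
    splits = []
    for partition, total in partition_record_counts.items():
        start = 0
        while start < total:
            count = min(max_records_per_split, total - start)
            splits.append(RecordSplit(partition, start, count))
            start += count
    return splits


# Two partitions with 250 and 80 records, at most 100 records per split.
splits = plan_splits({"ds=20240101": 250, "ds=20240102": 80}, 100)
for split in splits:
    print(split)
print(len(splits))  # prints 4
```

The first partition yields three splits (100, 100, and 50 records) and the second yields one, so four workers could read the table concurrently in this example.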
Performance#
The connector includes a number of performance improvements, detailed in the following sections.
Pushdown#
The connector supports partition and projection pushdown.
Projection pushdown#
The connector supports projection pushdown for VIEWS, MATERIALIZED VIEWS, and external tables.