Cloudera Data Platform support#

Use the Starburst Hive connector to query Cloudera Data Platform (CDP) version 7.1 or higher.

Note

Cloudera Data Platform support requires a valid Starburst Enterprise license.

Requirements#

The Starburst Hive connector can query the Cloudera Data Platform (CDP), available as version 7.x. It also supports the predecessor platform, Cloudera Distributed Hadoop (CDH), available in versions 5.x and 6.x. Support and compatibility vary by version, as detailed in the following table:

CDP/CDH and SEP compatibility matrix#

Cloudera version   | 356-e | 350-e | 345-e | 338-e
CDP 7              | Yes   | Yes   | Yes   | Yes
CDH 6.x            | Yes   | Yes   | Yes   | Yes
CDH 5.13+          | Yes   | Yes   | Yes   | Yes
CDH 5.12 and lower | No    | No    | No    | No

The following details apply to CDH 6.x users:

  • reading tables and data files created by CDH 6.x is supported

  • transactional table usage is not supported

  • CDH 6.x Hive cannot read ORC files created by SEP, due to the behavior of the included Hive version

  • using the included Apache Sentry is not supported

The following details apply to CDH 5.x users:

  • reading tables and data files created by CDH 5.x is supported

  • transactional table usage is not supported

Configuration#

  • Edit the properties file of the catalog that uses the Hive connector.

  • Set hive.metastore to thrift-cdp7 when using CDP 7, and to thrift for older versions.

  • Set hive.metastore.uri to the URI of your Hive metastore Thrift service.

connector.name=hive
hive.metastore=thrift-cdp7
hive.metastore.uri=thrift://cdp-master:9083
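
For the older CDH versions, the same catalog file would use the plain thrift metastore instead of thrift-cdp7. The following is a minimal sketch only: the host name cdh-master is a hypothetical placeholder, and the port 9083 is assumed to match your metastore Thrift service.

```properties
# Sketch for CDH 5.13+ or CDH 6.x (host name and port are assumptions)
connector.name=hive
# Use the standard Thrift metastore for versions older than CDP 7
hive.metastore=thrift
hive.metastore.uri=thrift://cdh-master:9083
```

Only the hive.metastore value differs from the CDP 7 configuration; the connector name and the URI format stay the same.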

SQL support#

Reading data#

CDP support includes read operations on the following tables:

  • compacted tables

  • bucketed tables

  • partitioned tables

  • unpartitioned tables

The following file formats can be read:

  • Avro

  • CSV

  • ORC ACID

  • Parquet

  • RCFile

Writing data#

Write operations, such as CREATE TABLE AS and CREATE VIEW, are generally supported.

The exception is ORC ACID tables: INSERT, DELETE, and UPDATE operations on these tables are not supported.

Performance#

Hive metastore and statistics#

CDP support includes the improved thrift-cdp7 Hive metastore implementation. It supports the metastore Thrift communication protocol that CDP implements for managing table statistics.

This allows SEP to handle the following kinds of statistics separately:

  • Column statistics

  • Partition statistics

  • Table statistics

When using CDP, all statistics handling is performed by the Hive connector and the thrift-cdp7 Hive metastore, and is therefore identical to standard Hive connector usage.