Starburst Hive connector#
The Starburst Hive connector is an extended version of the Hive connector, with identical configuration and usage. Fulfill the Hive connector requirements before using it.
Additional features of the connector require a valid Starburst Enterprise license, unless otherwise noted.
The Starburst Hive connector supports the following extensions:
Amazon Glue support#
Statistics collection is supported for Hive Metastore and Amazon Glue.
Configuring and using SEP with AWS Glue is described in the AWS Glue documentation section.
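As a sketch, a catalog properties file that points the connector at AWS Glue instead of a Hive Metastore service could look like the following. The catalog file name and the region value are placeholders; the property names follow the standard Hive connector Glue support:

```properties
# etc/catalog/hive.properties -- example catalog name
connector.name=hive
# Use AWS Glue as the metastore instead of a Hive Metastore service
hive.metastore=glue
# Placeholder region; replace with the region of your Glue Data Catalog
hive.metastore.glue.region=us-east-1
```

See the AWS Glue documentation section for the full set of supported properties.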
IBM Cloud Object Storage support#
The connector supports querying IBM Cloud Object Storage.
MapR Hive support#
The connector includes support for a MapR-based Hive metastore as well as the MapR filesystem.
OpenX JSON format support#
The connector supports reading and writing data in tables stored as JSON files, using the OpenX JSON serialization and deserialization (serde) Java class. Existing tables that use this serde, and all the associated serde properties, are handled automatically.
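For reference, a Hive-side table using the OpenX serde is typically declared as follows. The table name and columns are illustrative only; `org.openx.data.jsonserde.JsonSerDe` is the class name used by the upstream OpenX project:

```sql
-- Illustrative Hive DDL; run in Hive, not in SEP
CREATE TABLE events (
  id BIGINT,
  payload STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;
```

Tables declared this way are picked up by the connector without further configuration.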
The serde implementation is a fork of the original OpenX serde, updated to be compatible with the Hive 3 APIs used in SEP. The binary package of the forked serde is available from Starburst Support. For optimal compatibility, install the package on the systems that read and write your Hive-managed storage with Hive 3.
The connector configuration is similar to the configuration for the base Hive connector, with additional catalog properties for these extended features.
The connector includes a number of performance improvements, detailed in the following sections.
The connector supports the default storage caching. In addition, if HDFS Kerberos authentication is enabled in your catalog properties file, caching takes the relevant permissions into account and operates accordingly.
Additional configuration for Kerberos is required.
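A minimal sketch of the relevant catalog properties, assuming the standard Hive connector property names for HDFS Kerberos authentication; the principal and keytab values are placeholders:

```properties
# Enable Kerberos authentication to HDFS
hive.hdfs.authentication.type=KERBEROS
# Placeholder principal and keytab for the SEP HDFS user
hive.hdfs.presto.principal=presto@EXAMPLE.COM
hive.hdfs.presto.keytab=/etc/presto/hdfs.keytab
```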
If HDFS Kerberos authentication is enabled, you can also enable user impersonation.
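A sketch of enabling impersonation alongside Kerberos, assuming the standard Hive connector property name:

```properties
# Check HDFS permissions as the impersonated end user
hive.hdfs.impersonation.enabled=true
```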
The service user assigned to SEP needs to be able to access data files in the underlying storage. Access permissions are checked against the impersonated user, but with caching in place, some read operations happen in the context of the system user.
Any access control defined with the integration of Apache Ranger or the Privacera platform is also enforced by the storage caching.
Table scan redirection#
The connector supports table scan redirection to improve performance and reduce load on the data source.
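As an illustrative sketch only: table scan redirection in SEP is typically configured with the `redirection.config-source` and `redirection.config-file` catalog properties. Treat both property names and the file path below as assumptions to verify against your SEP version:

```properties
# Read redirection rules from a local file (placeholder path)
redirection.config-source=FILE
redirection.config-file=/etc/starburst/redirection-rules.json
```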
The connector includes a number of security-related features, detailed in the following sections.
SEP provides several authorization options for use with the Hive connector:
- Apache Ranger is the recommended choice to provide global, system-level security, and can optionally be used with other connectors.
- Apache Sentry is supported, with known limitations.
Before running any CREATE TABLE or CREATE TABLE ... AS statements for Hive tables in SEP, you need to check that the operating system user running the SEP server has access to the Hive warehouse directory on HDFS. The Hive warehouse directory is specified by the hive.metastore.warehouse.dir configuration variable in hive-site.xml, and the default value is /user/hive/warehouse. If that is not the case, either add -DHADOOP_USER_NAME=USER to jvm.config on all of the nodes, where USER is an operating system user that has proper permissions for the Hive warehouse directory, or start the SEP server as a user with similar permissions. The hive user generally works as USER, since Hive is often started with the hive user. If you run into HDFS permissions problems on CREATE TABLE ... AS, remove /tmp/presto-* on HDFS, fix the user as described above, then restart all of the SEP servers.
The following limitations apply in addition to the limitations of the Hive connector.
Reading ORC ACID tables created with Hive Streaming ingest is not supported.
Redirections are supported for Hive tables but not Hive views.
Hive 3 related limitations#
For security reasons, the sys system catalog is not accessible in SEP.
The timestamp with local zone data type is not supported in SEP. It is possible to read from a table with a column of this type, but the column itself is not accessible. Writing to such a table is not supported.
SEP does not correctly read timestamp values from Parquet, RCFile with binary serde, and Avro file formats created by Hive 3.1 or later, due to Hive issues HIVE-21002 and HIVE-22167. When reading from these file formats, SEP returns results that differ from Hive.