Starburst Hive connector#
The Starburst Hive connector is an extended version of the Hive connector, with identical configuration and usage.
Note
The additional features of the connector require a valid Starburst Enterprise license, unless otherwise noted.
The following improvements are included:
Authorization options#
SEP provides several authorization options for use with the Hive connector:
Apache Ranger is the recommended choice for providing global, system-level security, and can optionally be used with other connectors
Apache Sentry is supported with known limitations
HDFS permissions#
Before running any CREATE TABLE or CREATE TABLE ... AS statements for Hive tables in SEP, you need to check that the operating system user running the SEP server has access to the Hive warehouse directory on HDFS.
The Hive warehouse directory is specified by the configuration variable hive.metastore.warehouse.dir in hive-site.xml, and the default value is /user/hive/warehouse.
If that is not the case, either add the following to jvm.config on all of the nodes: -DHADOOP_USER_NAME=USER, where USER is an operating system user that has proper permissions for the Hive warehouse directory, or start the SEP server as a user with similar permissions. The hive user generally works as USER, since Hive is often started with the hive user.
If you run into HDFS permissions problems on CREATE TABLE ... AS, remove /tmp/presto-* on HDFS, fix the user as described above, then restart all of the SEP servers.
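For example, assuming the hive operating system user has the required permissions on the warehouse directory, the addition to jvm.config is:
-DHADOOP_USER_NAME=hive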
Storage caching#
The connector supports the default storage caching. In addition, if HDFS Kerberos authentication is enabled in your catalog properties file with the following setting, caching takes the relevant permissions into account and operates accordingly:
hive.hdfs.authentication.type=KERBEROS
Additional configuration for Kerberos is required.
If HDFS Kerberos authentication is enabled, you can also enable user impersonation using:
hive.hdfs.impersonation.enabled=true
The service user assigned to SEP needs to be able to access data files in the underlying storage. Access permissions are checked against the impersonated user, but with caching in place, some read operations happen in the context of the system user.
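As a minimal sketch, a catalog properties file combining these caching-relevant settings might look like the following. The metastore URI is a placeholder, and the additional Kerberos settings required in your environment, such as principals and keytabs, are omitted:
# hypothetical catalog properties file, for example etc/catalog/hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://metastore.example.net:9083
hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=true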
Any access control defined with the integration of Apache Ranger or the Privacera platform is also enforced by the storage caching.
Amazon Glue support#
Statistics collection is supported for Hive Metastore and Amazon Glue.
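For example, you can trigger statistics collection for an existing table with the ANALYZE statement; the catalog, schema, and table names below are hypothetical:
ANALYZE hive.sales.orders;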
Configuring and using SEP with AWS Glue is described in the AWS Glue support documentation section.
Cloudera compatibility matrix#
The Starburst Hive connector can query the Cloudera Data Platform (CDP), available as version 7.x. It also supports the predecessor Cloudera Distributed Hadoop (CDH) platform, available in versions 5.x and 6.x. Support and compatibility vary based on the version you use, and are detailed in the following table:
Cloudera version | 350-e | 345-e | 338-e | 332-e
---|---|---|---|---
CDP 7 | Yes | | |
CDH 6.x | Yes | Yes | Yes | Yes
CDH 5.14+ | Yes | Yes | Yes | Yes
CDH 5.13 | Yes | Yes | Yes | Yes
CDH 5.12 and lower | No | No | No | No
The following details apply for CDH 6.x users:
reading tables and data files created by CDH 6.x is supported
transactional table usage is not supported
CDH 6.x Hive cannot read ORC files created by SEP, due to the behavior of the included Hive version
using the included Apache Sentry is not supported
The following details apply for CDH 5.x users:
reading tables and data files created by CDH 5.x is supported
transactional table usage is not supported
Working with Hive views#
Hive views are defined in Hive QL and stored in the Hive Metastore Service. They are analyzed to allow read access to the data.
The Hive connector includes support for reading Hive views in three different modes:
disabled
legacy
experimental
You can configure the behavior in your catalog properties file.
disabled
The default behavior is to ignore Hive views. This means that the business logic and data encoded in the views are not available in PrestoSQL.
legacy
A very simple implementation to execute Hive views, and therefore allow read access to the data in PrestoSQL, can be enabled with hive.translate-hive-views=true and hive.legacy-hive-view-translation=true.
This legacy behavior interprets any Hive QL query that defines a view as if it is written in SQL. It does not do any translation, but instead relies on the fact that Hive QL is very similar to SQL.
This works for very simple Hive views, but can lead to problems for more complex queries. For example, if a Hive QL function has an identical signature but different behavior from the SQL version, the returned results may differ. In more extreme cases, the queries might fail, or cannot be parsed and executed at all.
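For example, a view defined in Hive using only syntax shared by both dialects, such as the following hypothetical definition, typically works in legacy mode, while a view relying on Hive-specific functions or syntax may fail or return different results:
-- hypothetical view definition in Hive; the identical text is also valid SQL
CREATE VIEW orders_by_day AS
SELECT orderdate, count(*) AS order_count
FROM orders
GROUP BY orderdate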
experimental
The new behavior is better engineered, and has the potential to become a lot more powerful than the legacy implementation. It can analyze, process, and rewrite Hive views and contained expressions and statements.
It is considered an experimental feature and continues to change with each release. However, it is already suitable for many use cases, and usage is encouraged.
You can enable the experimental behavior with hive.translate-hive-views=true.
Keep in mind that numerous features are not yet implemented when experimenting with this feature. The following is an incomplete list of missing functionality:
Hive QL current_date, current_timestamp, and others
Hive function calls, including translate(), window functions, and others
Common table expressions and simple case expressions
Honor timestamp precision setting
Support all Hive data types and correct mapping to PrestoSQL types
Ability to process custom UDFs
Limitations#
The following limitations apply in addition to the limitations of the Hive connector.
Writing to and creation of transactional tables is not supported.
Reading ORC ACID tables created with Hive Streaming ingest is not supported.
Hive 3 related limitations#
For security reasons, the sys system catalog is not accessible in SEP.
Hive's timestamp with local zone data type is not supported in SEP. It is possible to read from a table having a column of this type, but the column itself is not accessible. Writing to such a table is not supported.
SEP does not correctly read timestamp values from Parquet, RCFile with binary serde, and Avro file formats created by Hive 3.1 or later, due to Hive issues HIVE-21002 and HIVE-22167. When reading from these file formats, SEP returns different results.