Hive access control with Apache Sentry#

Apache Sentry is a granular, role-based authorization module for Hadoop. Sentry provides the ability to control and enforce precise levels of privileges on data for authenticated users and applications.

Sentry can control access to data that PrestoSQL reads through any catalog using the Hive connector. The integration between Sentry and the Hive connector enforces the existing privileges already granted on Hive objects. PrestoSQL enforces privileges assigned to Hive catalogs, schemas/databases, tables, columns, and views.

Note

Hive level security with Apache Sentry requires a valid Starburst Enterprise license.

Warning

Hive access control with Apache Sentry is limited to usage with the Hive connector only. We suggest replacing it with the more powerful global access control with Apache Ranger, which can secure catalogs using any connector.

Prerequisites#

Before you configure PrestoSQL with Apache Sentry, verify the following prerequisites:

  • CDH 5.12+ with Apache Sentry and Hive installed. CDH 6.x is not supported.

  • PrestoSQL coordinator and workers have the appropriate network access to communicate with the Apache Sentry Service. Typically this is port 8038.

  • If LDAP is used for user-to-group mapping, PrestoSQL coordinator and workers have the appropriate network access to communicate with the LDAP server. Typically this is port 636 or 389.

If you are new to Apache Sentry, Cloudera provides excellent documentation for installing and configuring Apache Sentry.

How it works#

When a query is submitted to PrestoSQL, PrestoSQL parses and analyzes the query to determine which privileges the user needs to access each object. PrestoSQL then communicates with the Apache Sentry Service to determine whether the request is valid. If the request is valid, the query continues to execute. If the request is invalid because the user does not have the necessary privileges to query an object, an error is returned to the user.

Group mapping#

Sentry manages role permissions and the associations between roles and user groups. It does not manage the associations between users and user groups. For this reason, any application using Sentry must be configured to determine a user’s groups. In PrestoSQL, the sentry.group-mapping property specifies how user groups are determined. By default it is set to HADOOP_DEFAULT.

Find more information in the documentation from Cloudera.

Note

You may want to reuse your existing sentry-site.xml configuration instead of setting new configuration properties in the Hive catalog. To have PrestoSQL use an XML configuration file, set sentry.config.resources to the location of a sentry-site.xml configuration file.

With HADOOP_DEFAULT group mapping, if sentry.config.resources is set and the provided files contain a value for hadoop.security.group.mapping, PrestoSQL uses that configured user group mapping. If you do not set sentry.config.resources, PrestoSQL uses Hadoop’s default behavior, which is to retrieve user groups from the local operating system. Similarly, with LDAP group mapping, if you provide Hadoop configuration files through the sentry.config.resources property, you do not need to set the LDAP group mapping properties in the Hive catalog.
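
As a minimal sketch of the first case, the following Hive catalog fragment keeps the default group mapping and points PrestoSQL at an existing Sentry configuration file. The path /etc/sentry/conf/sentry-site.xml is an assumption for illustration only; substitute the location used on your cluster.

```properties
# HADOOP_DEFAULT is already the default value; it is shown here for clarity.
sentry.group-mapping=HADOOP_DEFAULT

# Hypothetical path to an existing Sentry configuration file.
# If this file defines hadoop.security.group.mapping, that mapping is used;
# otherwise user groups come from the local operating system.
sentry.config.resources=/etc/sentry/conf/sentry-site.xml
```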

Caching#

There is some latency associated with making the remote procedure calls to Apache Sentry, as well as syncing LDAP groups. To improve performance and reduce the number of requests to the Sentry service, PrestoSQL includes a caching mechanism so that subsequent calls can look at the cache before making the remote call.

See the properties table in this document for the cache properties along with their default values. Depending on your use case, you may want to increase or decrease the default TTL values.
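
The cache properties could be tuned in the Hive catalog along the following lines. The values are illustrative assumptions rather than recommendations: longer TTLs reduce calls to Sentry and LDAP, but permission and group changes take longer to become visible in PrestoSQL.

```properties
# Cache authorization information returned by Sentry (illustrative value).
sentry.cache-ttl=5m

# Cache user-to-group mapping results (illustrative value).
sentry.group-mapping.cache-ttl=30s

# Cache lookups that returned no groups for a user (illustrative value).
sentry.group-mapping.negative-cache-ttl=10s
```

Setting any of these properties to 0ms disables the corresponding cache.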

ROLES in PrestoSQL#

When using Apache Sentry, setting a role makes that role active, and the user has only the privileges granted to that role. By default all assigned roles are active, and the user has the combined privileges of these roles.

See SET ROLE and SHOW ROLES for additional information.

Configuring PrestoSQL with Apache Sentry#

Apache Sentry configuration#

As with Hive, Impala, Spark, and Hue, you must create an admin group for PrestoSQL named presto. You can do this via Cloudera Manager, or manually by adding the group to the sentry.service.admin.group property in the sentry-site.xml file. The user running the PrestoSQL process should belong to this group. Additionally, you must add the PrestoSQL user (from sentry.client-principal) to sentry.service.allow.connect in sentry-site.xml.

PrestoSQL configuration#

SEP must be configured so that PrestoSQL can communicate with the Apache Sentry service. To enable this, set the following property in the Hive catalog:

hive.security=sentry

When Sentry security is enabled, PrestoSQL enforces the same SQL-standard-based authorization as Hive does when Sentry is enabled for Hive. Once Apache Sentry is enabled, there are additional required and optional properties to configure.

Note

PrestoSQL does not support any modification of authorization policies in Sentry.

The following is a sample of a Hive catalog properties file that is configured to use Apache Sentry for authorization. It utilizes Kerberos for authentication and LDAP for group mapping.

connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metastore-node:9083

hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/hive-metastore-node@EXAMPLE.COM
hive.metastore.client.principal=hive/presto-server-node@EXAMPLE.COM
hive.metastore.client.keytab=/etc/hive/conf/hive.keytab

hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=false
hive.hdfs.presto.principal=hdfs/presto-server-node@EXAMPLE.COM
hive.hdfs.presto.keytab=/etc/hadoop/conf/hdfs.keytab

hive.security=sentry

sentry.server=sentryserver
sentry.admin-user=hive
sentry.rpc-addresses=sentry-host-address
sentry.rpc-port=8038

sentry.authentication-type=KERBEROS

sentry.service-principal=sentry/sentry-node@EXAMPLE.COM
sentry.client-principal=presto-server/presto-server-node@EXAMPLE.COM
sentry.client-key-tab=/etc/presto/conf/presto-server.keytab

sentry.group-mapping=LDAP
sentry.ldap.url=ldaps://ldapserver/
sentry.ldap.user=cn=admin,dc=presto,dc=example,dc=com
sentry.ldap.password=secret1234
sentry.ldap.search-base=dc=presto,dc=example,dc=com
sentry.ldap.user-search-filter=(&(objectClass=inetOrgPerson)(uid={0}))
sentry.ldap.group-search-filter=(objectClass=groupOfNames)
sentry.ldap.group-member-attribute=member
sentry.ldap.group-name-attribute=cn

sentry.group-mapping.cache-ttl=10s
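
For comparison, a much smaller catalog file may be enough on a test cluster without Kerberos. The following is a sketch under stated assumptions: the metastore and the Sentry service are unsecured, user groups come from the operating system, and the host names and the sentry.server value are placeholders reused from the sample above. Whether every property shown is strictly required is not confirmed here.

```properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metastore-node:9083

hive.security=sentry

# Name of the server object in Sentry, not a host name (placeholder value).
sentry.server=sentryserver
sentry.admin-user=hive
sentry.rpc-addresses=sentry-host-address
sentry.rpc-port=8038

# No Kerberos, so the principal and keytab properties are not used.
sentry.authentication-type=NONE

# Resolve user groups from the operating system that PrestoSQL runs on.
sentry.group-mapping=SYSTEM
```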

Accessing authorization information#

Sentry authorization information can be accessed by querying the following tables:

  • information_schema.roles - returns information about all existing roles (equivalent of SHOW ROLES)

  • information_schema.applicable_roles - returns the roles that are granted to the current user

  • information_schema.enabled_roles - returns the roles that are currently enabled for the current user (equivalent of SHOW CURRENT ROLES)

  • information_schema.table_privileges - returns all table privileges granted to the user according to the currently enabled roles

Troubleshooting#

  • If you get the exception GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed), make sure that sentry.service-principal is set to the correct principal.

  • If you get a SentryAccessDeniedException, make sure the user that you set for sentry.admin-user belongs to a group listed in sentry.service.admin.group in sentry-site.xml.

  • If PrestoSQL cannot connect to a Kerberized Sentry service and you get the exception Peer indicated failure: Problem with callback handler, make sure that you added the PrestoSQL user (from sentry.client-principal) to sentry.service.allow.connect in sentry-site.xml. Additionally, make sure the letter casing matches.

  • Make sure that your sentry.server value is correct. It is not an IP address or host name. It is the name of the server object in Sentry.

Configuration properties#

Sentry configuration properties#

Property

Description

Default

sentry.server

The name of the server object in Sentry that PrestoSQL uses to find authorization rules. Set this to the value of hive.sentry.server from Hive’s configuration XML files.

sentry.admin-user

Admin user of Apache Sentry that has ALL access to the server object. This user must belong to a group listed in the sentry.service.admin.group property in the sentry-site.xml Sentry service configuration file.

sentry.rpc-addresses

Address on which the Sentry RPC service is available.

sentry.rpc-port

Port at which Sentry is listening.

sentry.authentication-type

Authentication method used when connecting to the Sentry service. Possible values are NONE or KERBEROS.

sentry.service-principal

Sentry service Kerberos principal that will be used to authenticate the Sentry service. This property is only used when sentry.authentication-type=KERBEROS.

sentry.client-principal

Sentry client Kerberos principal used to authenticate the client when connecting to the Sentry service. The primary part of this principal (the user) should be included in the sentry.service.allow.connect property in the sentry-site.xml Sentry service configuration file. This property is only used when sentry.authentication-type=KERBEROS.

sentry.client-key-tab

Location of the Sentry client Kerberos keytab file used to authenticate the client when connecting to the Sentry service. This property is only used when sentry.authentication-type=KERBEROS.

sentry.cache-ttl

Period for which information returned by Sentry is cached in PrestoSQL. 0ms disables the cache.

1m

sentry.group-mapping

Defines how user groups are determined. Possible values are:

  • HADOOP_DEFAULT - user groups are retrieved from the Hadoop client library. You may want to use sentry.config.resources to customize this behavior.

  • SYSTEM - user groups are retrieved from the operating system that PrestoSQL is running on.

  • LDAP - user groups are retrieved from LDAP.

sentry.ldap.url

Address of LDAP service when sentry.group-mapping==LDAP.

sentry.ldap.user

LDAP user name when sentry.group-mapping==LDAP.

sentry.ldap.password

LDAP user password when sentry.group-mapping==LDAP.

sentry.ldap.search-base

Configures the search base for the LDAP connection when sentry.group-mapping==LDAP.

sentry.ldap.user-search-filter

Additional filters to apply when searching for users when sentry.group-mapping==LDAP.

sentry.ldap.group-search-filter

Additional filters to apply when finding relevant groups when sentry.group-mapping==LDAP.

sentry.ldap.group-member-attribute

LDAP attribute to use for determining group membership when sentry.group-mapping==LDAP.

sentry.ldap.group-name-attribute

LDAP attribute to use for identifying a group’s name when sentry.group-mapping==LDAP.

sentry.group-mapping.cache-ttl

Period for which group mapping information is cached in PrestoSQL. 0ms disables the cache.

1m

sentry.group-mapping.negative-cache-ttl

Period for which information about an empty group lookup result is cached in PrestoSQL. 0ms disables the cache.

sentry.config.resources

Additional XML configuration files that are read before applying the PrestoSQL Sentry configuration. Useful for reusing existing sentry-site.xml configuration files.

Limitations#

PrestoSQL only enforces the Apache Sentry policies. PrestoSQL does not support any modification of authorization policies in Sentry. This includes commands like CREATE ROLE, GRANT, or REVOKE. If you need to modify roles and privileges, you must do so with another tool such as Apache Hive or Hue.

Sentry Policy Files are also not supported.