Hive access control with Apache Sentry#

Apache Sentry is a granular, role-based authorization module for Hadoop. Sentry provides the ability to control and enforce precise levels of privileges on data for authenticated users and applications.

Sentry can be used to control access to data accessible in Trino through any catalog that uses the Hive connector. The Sentry integration with the Hive connector enforces the existing privileges granted on Hive objects. Trino enforces privileges assigned to Hive catalogs, schemas/databases, tables, columns, and views.

Note

Hive level security with Apache Sentry requires a valid Starburst Enterprise license.

Warning

Hive access control with Apache Sentry is limited to use with the Hive connector. We suggest replacing it with the more powerful global access control with Apache Ranger, which can secure catalogs that use any connector.

Prerequisites#

Before you configure Trino with Apache Sentry, verify the following prerequisites:

  • CDH 5.12+ with Apache Sentry and Hive installed.

  • Trino coordinator and workers have the appropriate network access to communicate with the Apache Sentry Service. Typically this is port 8038.

  • If LDAP is used for user-to-group mapping, Trino coordinator and workers have the appropriate network access to communicate with the LDAP server. Typically this is port 636 or 389.

If you are new to Apache Sentry, Cloudera provides excellent documentation for installing and configuring Apache Sentry.

How it works#

When a query is submitted to Trino, Trino parses and analyzes the query to understand the privileges required by the user to access a particular object. Trino communicates with the Apache Sentry Service to determine if the request is valid. If the request is valid, the query continues to execute. If the request is invalid because the user does not have the necessary privileges to query an object, an error is returned to the user.

Group mapping#

Sentry manages role permissions and the associations between roles and user groups. Sentry does not manage the associations between users and user groups. For this reason, any application using Sentry needs to be configured to determine a user’s groups. In Trino, the sentry.group-mapping property specifies how the user groups are determined. By default it is set to HADOOP_DEFAULT.

Find more information in the documentation from Cloudera.

Note

You may want to reuse your existing sentry-site.xml configuration instead of setting new configuration properties in the Hive catalog. To have Trino use an XML configuration file, set sentry.config.resources to the file location of a sentry-site.xml configuration file.

When you use HADOOP_DEFAULT group mapping with sentry.config.resources set, and the provided files contain a value for hadoop.security.group.mapping, the configured user group mapping is used. If you do not set sentry.config.resources, Trino uses Hadoop’s default behavior, which is to retrieve user groups from the operating system that the Trino coordinator is running on. Similarly, when you use LDAP group mapping and provide Hadoop configuration files with the sentry.config.resources property, you can omit the LDAP group mapping properties from the Hive catalog.
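
As a minimal sketch of the reuse described in this note, the Hive catalog properties file could point Trino at an existing Sentry configuration file. The file path below is an assumption for illustration only; use the actual location of your sentry-site.xml.

```properties
# Keep the default group mapping. With sentry.config.resources set, a
# hadoop.security.group.mapping value found in the file takes effect.
sentry.group-mapping=HADOOP_DEFAULT
# Hypothetical path: replace with the location of your sentry-site.xml
sentry.config.resources=/etc/sentry/conf/sentry-site.xml
```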

Caching#

There is some latency associated with making the remote procedure calls to Apache Sentry, as well as syncing LDAP groups. To improve performance and reduce the number of requests to the Sentry service, Trino includes a caching mechanism so that subsequent calls can look at the cache before making the remote call.

See the properties table in this document for the cache properties along with their default values. Depending on your use case, you may want to increase or decrease the default TTL values.
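
For illustration, a catalog that accepts slower propagation of policy and group changes in exchange for fewer Sentry and LDAP requests might raise the TTLs. The property names come from the properties table in this document; the values are assumptions, not recommendations.

```properties
# Cache Sentry responses for five minutes (assumed value; 0ms disables the cache)
sentry.cache-ttl=5m
# Cache user group lookups for five minutes (assumed value)
sentry.group-mapping.cache-ttl=5m
# Cache empty group lookups for a shorter period (assumed value)
sentry.group-mapping.negative-cache-ttl=30s
```

With longer TTLs, a change made in Sentry or LDAP may not be visible in Trino until the cached entry expires.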

Roles in Trino#

When using Apache Sentry, setting a role makes that role active, and the user has only the privileges granted to that role. By default, all assigned roles are active, and the user has the combined privileges of these roles.

See SET ROLE and SHOW ROLES for additional information.

The SHOW ROLES command requires the session user to be a Sentry admin user. This typically means that the user belongs to a group defined in sentry.service.admin.group in sentry-site.xml on the Sentry/CDH configuration. Alternatively, the user defined in sentry.admin-user is used, if configured.

Configuring Trino with Apache Sentry#

Apache Sentry configuration#

As with Hive, Impala, Spark, and Hue, you must create an admin group for Trino named starburst. You can do this in Cloudera Manager, or manually by adding the group to the sentry.service.admin.group property in the sentry-site.xml file. The user of the Trino process should belong to this group. Additionally, you must add the Trino user (from sentry.client-principal) to sentry.service.allow.connect in sentry-site.xml.

Trino configuration#

SEP must be configured to enable Trino to communicate with the Apache Sentry service. To enable this, set the following property in the Hive catalog properties file:

hive.security=sentry

When Sentry security is enabled, Trino enforces the same SQL-standard-based authorization as Hive does when Sentry is enabled for Hive. Once Apache Sentry is enabled, there are additional required and optional properties to configure.

Note

Trino does not support any modification of authorization policies in Sentry.

The following is a sample of a Hive catalog properties file that is configured to use Apache Sentry for authorization. It utilizes Kerberos for authentication and LDAP for group mapping.

connector.name=hive
hive.metastore.uri=thrift://hive-metastore-node:9083

hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/hive-metastore-node@EXAMPLE.COM
hive.metastore.client.principal=hive/starburst-server-node@EXAMPLE.COM
hive.metastore.client.keytab=/etc/hive/conf/hive.keytab

hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=false
hive.hdfs.trino.principal=hdfs/starburst-server-node@EXAMPLE.COM
hive.hdfs.trino.keytab=/etc/hadoop/conf/hdfs.keytab

hive.security=sentry

sentry.server=sentryserver
sentry.admin-user=hive
sentry.rpc-addresses=sentry-host-address
sentry.rpc-port=8038

sentry.authentication-type=KERBEROS

sentry.service-principal=sentry/sentry-node@EXAMPLE.COM
sentry.client-principal=starburst-server/starburst-server-node@EXAMPLE.COM
sentry.client-key-tab=/etc/starburst/conf/starburst-server.keytab

sentry.group-mapping=LDAP
sentry.ldap.url=ldaps://ldapserver/
sentry.ldap.user=cn=admin,dc=starburst,dc=example,dc=com
sentry.ldap.password=secret1234
sentry.ldap.search-base=dc=starburst,dc=example,dc=com
sentry.ldap.user-search-filter=(&(objectClass=inetOrgPerson)(uid={0}))
sentry.ldap.group-search-filter=(objectClass=groupOfNames)
sentry.ldap.group-member-attribute=member
sentry.ldap.group-name-attribute=cn

sentry.group-mapping.cache-ttl=10s
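
For comparison, the following is a smaller sketch for a test environment. It assumes a Sentry service that does not require Kerberos and user groups taken from the operating system. Host names and the server object name are placeholders carried over from the sample above, and a real deployment may need further metastore or HDFS properties.

```properties
connector.name=hive
hive.metastore.uri=thrift://hive-metastore-node:9083

hive.security=sentry

# Server object name in Sentry, not a host name
sentry.server=sentryserver
sentry.admin-user=hive
sentry.rpc-addresses=sentry-host-address
sentry.rpc-port=8038

# No Kerberos, so the principal and keytab properties are omitted
sentry.authentication-type=NONE

# Resolve user groups from the operating system that Trino runs on
sentry.group-mapping=SYSTEM
```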

Accessing authorization information#

Sentry authorization information can be accessed by querying the following tables:

  • information_schema.roles - returns information about all existing roles (equivalent of SHOW ROLES)

  • information_schema.applicable_roles - returns the roles that are granted to the current user

  • information_schema.enabled_roles - returns the roles that are currently enabled for the user (equivalent of SHOW CURRENT ROLES)

  • information_schema.table_privileges - returns all table privileges granted to the user according to the currently enabled roles

Troubleshooting#

  • If you get the exception GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed), make sure that sentry.service-principal is set to the correct principal.

  • If you get a SentryAccessDeniedException, make sure that the user you set for sentry.admin-user belongs to a group listed in sentry.service.admin.group in sentry-site.xml.

  • If Trino cannot connect to a Kerberized Sentry service and you get the exception Peer indicated failure: Problem with callback handler, make sure that you added the Trino user (from sentry.client-principal) to sentry.service.allow.connect in sentry-site.xml. Additionally, make sure the letter casing matches.

  • Make sure that your sentry.server value is correct. It is not an IP address or host name. It is the name of the server object in Sentry.

Configuration properties#

Sentry configuration properties#

Property

Description

Default

sentry.server

The name of the server object in Sentry that Trino uses to find authorization rules. This should be set to the value of hive.sentry.server from Hive’s configuration XML files.

sentry.admin-user

Admin user of Apache Sentry that has ALL access to the server object. This user must belong to one of the groups listed in the sentry.service.admin.group property in the sentry-site.xml Sentry service configuration file.

sentry.clean-privileges-on-drop.enabled

Flag to enable removing Sentry privileges on object drops from SEP. When set to false, no user is required for sentry.admin-user, and it is therefore possible to configure the Hive Sentry integration without an admin user. As a result, SEP does not clean up Sentry privileges when schemas, tables, views, or columns are dropped or renamed. In this case, make sure that the Hive Metastore configuration includes the necessary privilege updates and removals and performs them automatically, or update the metadata manually.

true

sentry.rpc-addresses

Address on which Sentry RPC is available.

sentry.rpc-port

Port at which Sentry is listening.

sentry.authentication-type

Authentication method that will be used when connecting to Sentry service. Possible values are NONE or KERBEROS.

sentry.service-principal

Sentry service Kerberos principal that will be used to authenticate the Sentry service. This property is only used when sentry.authentication-type=KERBEROS.

sentry.client-principal

Sentry client Kerberos principal that will be used to authenticate the client when connecting to Sentry service. The primary part of this principal (user) should be included in the sentry.service.allow.connect property in the sentry-site.xml Sentry service configuration file. This property is only used when sentry.authentication-type=KERBEROS.

sentry.client-key-tab

Sentry client Kerberos keytab file location that will be used to authenticate the client when connecting to Sentry service. This property is only used when sentry.authentication-type=KERBEROS.

sentry.cache-ttl

Period for which information returned by Sentry is cached in Trino. 0ms disables the cache.

1m

sentry.group-mapping

Defines how user groups are determined. Possible values are:

  • HADOOP_DEFAULT: user groups are retrieved from the operating system that Trino is running on. You may want to use sentry.config.resources to customize this behavior.

  • SYSTEM: user groups are retrieved from the operating system that Trino is running on.

  • LDAP: user groups are retrieved from LDAP.

sentry.ldap.url

Address of the LDAP service when sentry.group-mapping=LDAP.

sentry.ldap.user

LDAP user name when sentry.group-mapping=LDAP.

sentry.ldap.password

LDAP user password when sentry.group-mapping=LDAP.

sentry.ldap.search-base

Configures the search base for the LDAP connection when sentry.group-mapping=LDAP.

sentry.ldap.user-search-filter

Additional filters to apply when searching for users when sentry.group-mapping=LDAP.

sentry.ldap.group-search-filter

Additional filters to apply when finding relevant groups when sentry.group-mapping=LDAP.

sentry.ldap.group-member-attribute

LDAP attribute to use for determining group membership when sentry.group-mapping=LDAP.

sentry.ldap.group-name-attribute

LDAP attribute to use for identifying a group’s name when sentry.group-mapping=LDAP.

sentry.group-mapping.cache-ttl

Period for which group mapping information is cached in Trino. 0ms disables the cache.

1m

sentry.group-mapping.negative-cache-ttl

Period for which information about empty groups is cached in Trino. 0ms disables the cache.

sentry.config.resources

Additional XML configuration files which will be read before applying Trino Sentry configuration. Useful for reusing existing sentry-site.xml configuration files.

sentry.catalog-session-properties.skip-authorization-check

Skip authorization check when setting catalog session properties.

false
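
As a sketch of the sentry.clean-privileges-on-drop.enabled behavior described in the table, a catalog can leave sentry.admin-user unset when privilege cleanup is disabled. This assumes that the Hive Metastore or a manual process keeps Sentry privileges in sync after objects are dropped or renamed.

```properties
hive.security=sentry
# Do not remove Sentry privileges when objects are dropped from SEP.
# sentry.admin-user is then not required and is left unset here.
sentry.clean-privileges-on-drop.enabled=false
```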

Limitations#

Trino only enforces the Apache Sentry policies. Trino does not support any modification of authorization policies in Sentry. This includes commands like CREATE ROLE, GRANT, or REVOKE. If you need to modify the roles and privileges, that must be done via another tool such as Apache Hive or Hue.

Sentry Policy Files are also not supported.