Hive access control with Apache Sentry#
Apache Sentry is a granular, role-based authorization module for Hadoop. Sentry provides the ability to control and enforce precise levels of privileges on data for authenticated users and applications.
Sentry can control access to data that Trino reaches through any catalog that uses the Hive connector. The integration of Sentry with the Hive connector enforces the existing privileges granted on Hive objects. Trino enforces privileges assigned to Hive catalogs, schemas/databases, tables, columns, and views.
Note
Hive level security with Apache Sentry requires a valid Starburst Enterprise license.
Warning
Hive access control with Apache Sentry is limited to usage with the Hive connector only. We suggest replacing it with the more powerful global access control with Apache Ranger, which is capable of securing catalogs using any connector.
Prerequisites#
Before you configure Trino with Apache Sentry, verify the following prerequisites:
CDH 5.12+ with Apache Sentry and Hive installed.
Trino coordinator and workers have the appropriate network access to communicate with the Apache Sentry Service. Typically this is port 8038.
If LDAP is used for user-to-group mapping, Trino coordinator and workers have the appropriate network access to communicate with the LDAP server. Typically this is port 636 or 389.
If you are new to Apache Sentry, Cloudera provides excellent documentation for installing and configuring Apache Sentry.
How it works#
When a query is submitted to Trino, Trino parses and analyzes the query to understand the privileges required by the user to access a particular object. Trino communicates with the Apache Sentry Service to determine whether the request is valid. If the request is valid, the query continues to execute. If the request is invalid because the user does not have the necessary privileges to query an object, an error is returned to the user.
Group mapping#
Sentry manages role permissions and the associations between roles and user groups. Sentry does not manage the associations between users and user groups. For this reason, any application using Sentry needs to be configured so that it can determine a user’s groups. In Trino, the sentry.group-mapping property specifies how the user groups are determined. By default it is set to HADOOP_DEFAULT.
Find more information in the documentation from Cloudera.
Note
You may want to reuse your existing sentry-site.xml configuration instead of setting new configurations in the Hive catalog. To have Trino use an XML configuration file, set sentry.config.resources to the file location of a sentry-site.xml configuration file.
When you use HADOOP_DEFAULT group mapping, sentry.config.resources is set, and the provided files contain a value for hadoop.security.group.mapping, the configured user group mapping is used. If you do not set sentry.config.resources, Trino uses Hadoop’s default behavior, which is to retrieve user groups from the local operating system, that is, the operating system the Trino coordinator is running on. Similarly, when you use LDAP group mapping and provide Hadoop configuration files using the sentry.config.resources property, you can omit the LDAP group mapping properties in the Hive catalog.
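As a rough sketch of the setup described in the note above, the following Hive catalog fragment reuses an existing Sentry configuration file for group mapping. The property names come from this document; the file path is a hypothetical placeholder, so substitute the location of your own sentry-site.xml.

```properties
# Illustrative fragment; the path below is an assumed example location.
hive.security=sentry
sentry.group-mapping=HADOOP_DEFAULT
# If this file defines hadoop.security.group.mapping, that mapping is used.
sentry.config.resources=/etc/sentry/conf/sentry-site.xml
```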
Caching#
There is some latency associated with making the remote procedure calls to Apache Sentry, as well as syncing LDAP groups. To improve performance and reduce the number of requests to the Sentry service, Trino includes a caching mechanism so that subsequent calls can look at the cache before making the remote call.
See the properties table in this document for the cache properties along with their default values. Depending on your use case, you may want to increase or decrease the default TTL values.
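For example, the sample catalog file later in this document sets sentry.group-mapping.cache-ttl to 10s. A fragment that keeps group mappings cached for longer might look like the following; the 5m value is an illustrative assumption, not a recommendation.

```properties
# Cache user-to-group mappings for five minutes (illustrative value).
sentry.group-mapping.cache-ttl=5m
```

A longer TTL reduces calls to the group mapping source, at the cost of group membership changes taking longer to become visible in Trino.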
ROLES in Trino#
When using Apache Sentry, setting a role makes that role active, and the user has only the privileges granted to that role. By default, all assigned roles are active, and the user has the combined privileges of these roles.
See SET ROLE and SHOW ROLES for additional information.
The SHOW ROLES command requires the session user to be a Sentry admin user. This typically means that the user belongs to a group defined in sentry.service.admin.group in the sentry-site.xml file of the Sentry/CDH configuration. Alternatively, the user defined in sentry.admin-user is used, if configured.
Configuring Trino with Apache Sentry#
Apache Sentry configuration#
As with Hive, Impala, Spark, and Hue, you must create an admin group for Trino named starburst. You can do this via the Cloudera Manager, or manually by adding it to the sentry.service.admin.group property in the sentry-site.xml file. The user of the Trino process should belong to this group. Additionally, you must add the Trino user (from sentry.client-principal) to sentry.service.allow.connect in sentry-site.xml.
Trino configuration#
SEP must be configured to enable Trino to communicate with the Apache Sentry service. To enable it, set the following property in the Hive catalog:
hive.security=sentry
When sentry security is enabled, Trino enforces the same SQL-standard-based authorization as Hive does when Sentry is enabled for Hive. Once Apache Sentry is enabled, there are additional required and optional properties to configure.
Note
Trino does not support any modification of authorization policies in Sentry.
The following is a sample of a Hive catalog properties file that is configured to use Apache Sentry for authorization. It utilizes Kerberos for authentication and LDAP for group mapping.
connector.name=hive
hive.metastore.uri=thrift://hive-metastore-node:9083
hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/hive-metastore-node@EXAMPLE.COM
hive.metastore.client.principal=hive/starburst-server-node@EXAMPLE.COM
hive.metastore.client.keytab=/etc/hive/conf/hive.keytab
hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=false
hive.hdfs.trino.principal=hdfs/starburst-server-node@EXAMPLE.COM
hive.hdfs.trino.keytab=/etc/hadoop/conf/hdfs.keytab
hive.security=sentry
sentry.server=sentryserver
sentry.admin-user=hive
sentry.rpc-addresses=sentry-host-address
sentry.rpc-port=8038
sentry.authentication-type=KERBEROS
sentry.service-principal=sentry/sentry-node@EXAMPLE.COM
sentry.client-principal=starburst-server/starburst-server-node@EXAMPLE.COM
sentry.client-key-tab=/etc/starburst/conf/starburst-server.keytab
sentry.group-mapping=LDAP
sentry.ldap.url=ldaps://ldapserver/
sentry.ldap.user=cn=admin,dc=starburst,dc=example,dc=com
sentry.ldap.password=secret1234
sentry.ldap.search-base=dc=starburst,dc=example,dc=com
sentry.ldap.user-search-filter=(&(objectClass=inetOrgPerson)(uid={0}))
sentry.ldap.group-search-filter=(objectClass=groupOfNames)
sentry.ldap.group-member-attribute=member
sentry.ldap.group-name-attribute=cn
sentry.group-mapping.cache-ttl=10s
Accessing authorization information#
Sentry authorization information can be accessed by querying the following tables:
information_schema.roles - returns information about all existing roles (equivalent of SHOW ROLES)
information_schema.applicable_roles - returns the roles that are granted to the current user
information_schema.enabled_roles - returns the list of roles that the current user has enabled at the moment (equivalent of SHOW CURRENT ROLES)
information_schema.table_privileges - returns all table privileges granted to the user according to the currently enabled roles
Troubleshooting#
If you get the exception GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed), make sure you are using the proper sentry.service-principal.
If you get a SentryAccessDeniedException exception, make sure the user that you set for sentry.admin-user belongs to a group listed in sentry.service.admin.group in sentry-site.xml.
If Trino cannot connect to a Kerberized Sentry and you get the exception Peer indicated failure: Problem with callback handler, make sure that you added the Trino user (from sentry.client-principal) to sentry.service.allow.connect in sentry-site.xml. Additionally, make sure the letter casing matches.
Make sure that your sentry.server value is correct. It is not an IP address or hostname. It is the name of the server object in Sentry.
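The checks above map onto a handful of Hive catalog properties. The fragment below annotates them using the values from the sample catalog file in this document; treat those values as placeholders for your own environment.

```properties
# Name of the server object in Sentry, not an IP address or hostname.
sentry.server=sentryserver
# Must be the correct Sentry service principal (otherwise: GSSException, Checksum failed).
sentry.service-principal=sentry/sentry-node@EXAMPLE.COM
# The Trino user from this principal must be listed in sentry.service.allow.connect,
# with matching letter casing (otherwise: Problem with callback handler).
sentry.client-principal=starburst-server/starburst-server-node@EXAMPLE.COM
# Must belong to a group listed in sentry.service.admin.group
# (otherwise: SentryAccessDeniedException).
sentry.admin-user=hive
```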
Configuration properties#
| Property | Description | Default |
|---|---|---|
| | The name of the server object in Sentry that Trino uses to find authorization rules. This should be set to the value of | |
| | Admin user of Apache Sentry that has | |
| | Flag to enable removing Sentry privileges on object drops from SEP. When set to | |
| | Address on which Sentry RPC is available. | |
| | Port at which Sentry is listening. | |
| | Authentication method used when connecting to the Sentry service. Possible values are | |
| | Sentry service Kerberos principal used to authenticate the Sentry service. This property is only used when | |
| | Sentry client Kerberos principal used to authenticate the client when connecting to the Sentry service. The primary part of this principal (user) should be included in | |
| | Sentry client Kerberos keytab file location used to authenticate the client when connecting to the Sentry service. This property is only used when | |
| | Period for which information returned by Sentry is cached in Trino. | |
| | Defines how user groups are determined. Possible values are: | |
| | Address of LDAP service when | |
| | LDAP user name when | |
| | LDAP user password when | |
| | Configures the search base for the LDAP connection when | |
| | Additional filters to apply when searching for users when | |
| | Additional filters to apply when finding relevant groups when | |
| | LDAP attribute to use for determining group membership when | |
| | LDAP attribute to use for identifying a group’s name when | |
| | Period for which group mapping information is cached in Trino. | |
| | Period for which information about empty groups is cached in Trino. | |
| | Additional XML configuration files that are read before applying the Trino Sentry configuration. Useful for reusing existing | |
| | Skip authorization check when setting catalog session properties. | |
Limitations#
Trino only enforces the Apache Sentry policies. Trino does not support any modification of authorization policies in Sentry. This includes commands like CREATE ROLE, GRANT, or REVOKE. If you need to modify the roles and privileges, you must do so with another tool such as Apache Hive or Hue.
Sentry Policy Files are also not supported.