Hive access control with Apache Sentry
Apache Sentry is a granular, role-based authorization module for Hadoop. Sentry provides the ability to control and enforce precise levels of privileges on data for authenticated users and applications.
Sentry can be used to control access to data accessible in PrestoSQL by any catalog using the Hive connector. The integration of Sentry with the Hive connector enforces the existing privileges granted on Hive objects. PrestoSQL enforces privileges assigned to Hive catalogs, schemas/databases, tables, columns, and views.
Note
Hive level security with Apache Sentry requires a valid Starburst Enterprise license.
Warning
Hive access control with Apache Sentry is limited to usage with the Hive connector only. We recommend replacing it with the more powerful global access control with Apache Ranger, which is capable of securing catalogs using any connector.
Prerequisites
Before you configure PrestoSQL with Apache Sentry, verify the following prerequisites:
CDH 5.12+ with Apache Sentry and Hive installed. CDH 6.x is not supported.
PrestoSQL coordinator and workers have the appropriate network access to communicate with the Apache Sentry Service. Typically this is port 8038.
If LDAP is used for user-to-group mapping, PrestoSQL coordinator and workers have the appropriate network access to communicate with the LDAP server. Typically this is port 636 or 389.
If you are new to Apache Sentry, Cloudera provides excellent documentation for installing and configuring Apache Sentry.
How it works
When a query is submitted to PrestoSQL, PrestoSQL parses and analyzes the query to understand the privileges required by the user to access a particular object. PrestoSQL communicates with the Apache Sentry Service to determine if the request is valid. If the request is valid, the query continues to execute. If the request is invalid because the user does not have the necessary privileges to query an object, an error is returned to the user.
Group mapping
Sentry manages role permissions and the associations between roles and user groups. Sentry does not manage the associations between users and user groups. For this reason, any application using Sentry needs to be configured to be able to determine a user’s groups. In PrestoSQL, the sentry.group-mapping property specifies how the user groups are determined. By default it is set to HADOOP_DEFAULT.
Find more information in the documentation from Cloudera.
Note
You may want to reuse your existing sentry-site.xml configuration instead of setting new configurations in the Hive catalog. To have PrestoSQL use an XML configuration file, set sentry.config.resources to the file location of a sentry-site.xml configuration file.
When using HADOOP_DEFAULT group mapping with sentry.config.resources set, and the provided file(s) contain a value for hadoop.security.group.mapping, the configured user group mapping is used. If you do not set sentry.config.resources, PrestoSQL uses Hadoop’s default behavior, which is to retrieve user groups from the local operating system. Similarly, when using LDAP group mapping, if you provide Hadoop configuration files using the sentry.config.resources property, you can omit the LDAP group mapping properties from the Hive catalog.
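As a sketch of the reuse scenario described in the note, the following Hive catalog fragment keeps the default group mapping and points PrestoSQL at an existing Sentry configuration file. The file path is an assumption chosen for illustration; use the location of your own sentry-site.xml.

```properties
# Hive catalog properties file (fragment)
hive.security=sentry
# Default group mapping, shown explicitly for clarity
sentry.group-mapping=HADOOP_DEFAULT
# Assumed path; replace with the location of your sentry-site.xml
sentry.config.resources=/etc/sentry/conf/sentry-site.xml
```

If the referenced file contains a value for hadoop.security.group.mapping, that mapping is used to determine the user groups.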
Caching
There is some latency associated with making the remote procedure calls to Apache Sentry, as well as syncing LDAP groups. To improve performance and reduce the number of requests to the Sentry service, PrestoSQL includes a caching mechanism so that subsequent calls can look at the cache before making the remote call.
See the properties table in this document for the cache properties along with their default values. Depending on your use case, you may want to increase or decrease the default TTL values.
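For example, the group mapping cache can be tuned in the Hive catalog with the sentry.group-mapping.cache-ttl property. The value below is an arbitrary illustration, not a recommendation.

```properties
# Hive catalog properties file (fragment)
# Cache group mapping information for 60 seconds (illustrative value)
sentry.group-mapping.cache-ttl=60s
```

A longer TTL reduces the number of remote lookups, but it also delays the point at which changes to group membership become visible to PrestoSQL.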
ROLES in PrestoSQL
When using Apache Sentry, setting a role makes that role active, and the user has only the privileges granted to that role. By default all assigned roles are active, and the user has the combined privileges of these roles.
See SET ROLE and SHOW ROLES for additional information.
Configuring PrestoSQL with Apache Sentry
Apache Sentry configuration
As with Hive, Impala, Spark, and Hue, you must create an admin group for PrestoSQL named presto. You can do this via Cloudera Manager, or manually by adding the group to the sentry.service.admin.group property in the sentry-site.xml file. The user running the PrestoSQL process should belong to this group. Additionally, you must add the PrestoSQL user (from sentry.client-principal) to sentry.service.allow.connect in sentry-site.xml.
PrestoSQL configuration
SEP must be configured to enable PrestoSQL to communicate with the Apache Sentry service. To enable it, set the following property in the Hive catalog:
hive.security=sentry
When sentry security is enabled, PrestoSQL enforces the same SQL-standard-based authorization as Hive does when Sentry is enabled for Hive. Once Apache Sentry is enabled, there are additional required and optional properties to configure.
Note
PrestoSQL does not support any modification of authorization policies in Sentry.
The following is a sample of a Hive catalog properties file that is configured to use Apache Sentry for authorization. It utilizes Kerberos for authentication and LDAP for group mapping.
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metastore-node:9083
hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/hive-metastore-node@EXAMPLE.COM
hive.metastore.client.principal=hive/presto-server-node@EXAMPLE.COM
hive.metastore.client.keytab=/etc/hive/conf/hive.keytab
hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=false
hive.hdfs.presto.principal=hdfs/presto-server-node@EXAMPLE.COM
hive.hdfs.presto.keytab=/etc/hadoop/conf/hdfs.keytab
hive.security=sentry
sentry.server=sentryserver
sentry.admin-user=hive
sentry.rpc-addresses=sentry-host-address
sentry.rpc-port=8038
sentry.authentication-type=KERBEROS
sentry.service-principal=sentry/sentry-node@EXAMPLE.COM
sentry.client-principal=presto-server/presto-server-node@EXAMPLE.COM
sentry.client-key-tab=/etc/presto/conf/presto-server.keytab
sentry.group-mapping=LDAP
sentry.ldap.url=ldaps://ldapserver/
sentry.ldap.user=cn=admin,dc=presto,dc=example,dc=com
sentry.ldap.password=secret1234
sentry.ldap.search-base=dc=presto,dc=example,dc=com
sentry.ldap.user-search-filter=(&(objectClass=inetOrgPerson)(uid={0}))
sentry.ldap.group-search-filter=(objectClass=groupOfNames)
sentry.ldap.group-member-attribute=member
sentry.ldap.group-name-attribute=cn
sentry.group-mapping.cache-ttl=10s
Accessing authorization information
Sentry authorization information can be accessed by querying the following tables:
information_schema.roles - returns information about all existing roles (equivalent of SHOW ROLES)
information_schema.applicable_roles - returns the roles that are granted to the current user
information_schema.enabled_roles - returns the list of roles that the current user is currently using (equivalent of SHOW CURRENT ROLES)
information_schema.table_privileges - returns all table privileges granted to the user according to the currently enabled roles
Troubleshooting
If you get the exception GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed), make sure you are using the proper sentry.service-principal.
If you get a SentryAccessDeniedException, make sure the user that you set for sentry.admin-user belongs to a group listed in sentry.service.admin.group in sentry-site.xml.
If PrestoSQL cannot connect to a Kerberized Sentry and you get the exception Peer indicated failure: Problem with callback handler, make sure that you added the PrestoSQL user (from sentry.client-principal) to sentry.service.allow.connect in sentry-site.xml. Additionally, make sure the letter casing matches.
Make sure that your sentry.server value is correct. It is not an IP address or hostname. It is the name of the server object in Sentry.
Configuration properties
| Property | Description | Default |
|---|---|---|
| | The name of the server object in Sentry that PrestoSQL uses to find authorization rules. This should be set to value of | |
| | Admin user of Apache Sentry that has | |
| | Address on which Sentry RPC is available. | |
| | Port at which Sentry is listening. | |
| | Authentication method that will be used when connecting to Sentry service. Possible values are | |
| | Sentry service Kerberos principal that will be used to authenticate the Sentry service. This property is only used when | |
| | Sentry client Kerberos principal that will be used to authenticate the client when connecting to Sentry service. The primary part of this principal (user) should be included in | |
| | Sentry client Kerberos keytab file location that will be used to authenticate the client when connecting to Sentry service. This property is only used when | |
| | Period for which information returned by Sentry is cached in PrestoSQL. | |
| | Defines how user groups are determined. Possible values are: | |
| | Address of LDAP service when | |
| | LDAP user name when | |
| | LDAP user password when | |
| | Configures the search base for the LDAP connection when | |
| | Additional filters to apply when searching for users when | |
| | Additional filters to apply when finding relevant groups when | |
| | LDAP attribute to use for determining group membership when | |
| | LDAP attribute to use for identifying a group’s name when | |
| | Period for which group mapping information is cached in PrestoSQL. | |
| | Period for which information about empty groups is cached in PrestoSQL. | |
| | Additional XML configuration files which will be read before applying PrestoSQL Sentry configuration. Useful for reusing existing | |
Limitations
PrestoSQL only enforces the Apache Sentry policies. PrestoSQL does not support any modification of authorization policies in Sentry. This includes commands like CREATE ROLE, GRANT, or REVOKE. If you need to modify the roles and privileges, that must be done via another tool such as Apache Hive or Hue.
Sentry Policy Files are also not supported.