AWS Lake Formation access control support#
Starburst Enterprise platform (SEP) provides support for using an existing AWS Lake Formation access control system.
Requirements#
To use AWS Lake Formation integration with Starburst Enterprise, you need:
An existing AWS Lake Formation configuration and AWS credentials that allow interacting with its API.
A valid Starburst Enterprise license.
Overview#
AWS Lake Formation provides a single place to manage access control policies. You can define security policies that restrict access to data at the database, table, column, row, and cell levels.
AWS Lake Formation access control support is only available for catalogs that use the Iceberg connector or the Hive connector, which utilizes HDFS file system support.
AWS Lake Formation can be enabled alongside built-in access control as long as the two security systems are securing mutually exclusive entities. Using both AWS Lake Formation and built-in access control for authorization on the same catalog is not supported.
Configure AWS Lake Formation#
Each catalog that needs to be controlled with AWS Lake Formation must have the catalog properties file configured to use lake-formation security.
For Hive:
hive.security=lake-formation
For Iceberg:
iceberg.security=lake-formation
The following is a more complex example of a catalog properties file that is configured to use AWS Lake Formation for authorization.
For the Hive connector:
connector.name=hive
hive.security=lake-formation
hive.metastore=glue
hive.metastore.glue.region=us-east-2
hive.metastore.glue.default-warehouse-dir=s3://data-lake-bucket
hive.metastore.glue.iam-role=arn:aws:iam::<account_id>:role/<admin-role>
lake-formation.authorized-caller-tag=starburst-enterprise
lake-formation.security-mapping.config-file=etc/lakeformation-security-mapping.json
For the Iceberg connector:
connector.name=iceberg
iceberg.security=lake-formation
iceberg.catalog.type=glue
hive.metastore.glue.region=us-east-2
hive.metastore.glue.sts.region=us-east-2
hive.metastore.glue.default-warehouse-dir=s3://data-lake-bucket
hive.metastore.glue.catalogid=<account_id>
hive.metastore.glue.iam-role=arn:aws:iam::<account_id>:role/<admin-role>
lake-formation.authorized-caller-tag=starburst-enterprise
lake-formation.security-mapping.config-file=etc/lakeformation-security-mapping.json
For Lake Formation access control to work with the Iceberg connector, iceberg.glue.cache-table-metadata must be set to true.
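The catalog requirements above can be checked before deployment with a short script. The following is a minimal sketch, not part of SEP: the function names `parse_properties` and `check_lake_formation_catalog` are hypothetical, and the checks cover only the rules stated in this topic (the security property, the authorized caller tag, and the Iceberg metadata cache setting).

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_lake_formation_catalog(props):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    connector = props.get("connector.name")
    if connector not in ("hive", "iceberg"):
        return ["connector.name must be hive or iceberg"]
    if props.get(f"{connector}.security") != "lake-formation":
        problems.append(f"{connector}.security must be lake-formation")
    if not props.get("lake-formation.authorized-caller-tag"):
        problems.append("lake-formation.authorized-caller-tag is not set")
    if connector == "iceberg" and props.get("iceberg.glue.cache-table-metadata") != "true":
        problems.append("iceberg.glue.cache-table-metadata must be true")
    return problems


# Illustrative Iceberg catalog that omits the metadata cache property.
example = """
connector.name=iceberg
iceberg.security=lake-formation
iceberg.catalog.type=glue
lake-formation.authorized-caller-tag=starburst-enterprise
"""

for problem in check_lake_formation_catalog(parse_properties(example)):
    print(problem)
```

The parser handles only plain key=value lines, which is enough for the examples in this topic but not for every Java properties file feature.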
More information on Lake Formation security mapping can be found later in this topic.
Lake Formation data lake locations#
Depending on how you use AWS Lake Formation, you may need to register one or more S3 locations:
To run INSERT statements on a table, its S3 location must be registered in Lake Formation.
When credential vending is enabled, all S3 locations used must be registered in Lake Formation.
You can then use AWS Lake Formation permissions for access control to objects that point to your data lake location, and to the underlying data in the location.
Warning
Do not use the AWSServiceRoleForLakeFormationDataAccess service-linked role for registering data locations.
The trust relationship of the IAM role registered for a location in AWS Lake Formation must allow the Lake Formation AWS service to assume the role. The following is an example trust relationship statement for the data access role:
{
"Effect": "Allow",
"Principal": {
"Service": [
"lakeformation.amazonaws.com",
"glue.amazonaws.com"
]
},
"Action": "sts:AssumeRole"
}
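The statement above is a fragment. IAM expects it inside a policy document with a `Version` and a `Statement` list. The following sketch wraps the statement into a complete trust policy and prints it as JSON; the helper name `build_trust_policy` is hypothetical and used only for illustration.

```python
import json


def build_trust_policy(services):
    """Wrap an sts:AssumeRole statement for the given AWS services
    into a complete IAM trust policy document."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": list(services)},
                "Action": "sts:AssumeRole",
            }
        ],
    }


policy = build_trust_policy(["lakeformation.amazonaws.com", "glue.amazonaws.com"])
print(json.dumps(policy, indent=2))
```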
The IAM role registered for a location in AWS Lake Formation should have at least the following S3 API actions granted for all S3 paths that it is registered for: s3:ListBucket, s3:GetObject, s3:PutObject, s3:DeleteObject.
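As an illustration, the following sketch generates a permissions policy that grants these four actions for one registered location. In IAM, s3:ListBucket applies to the bucket ARN, while the object actions apply to object ARNs under it. The function name `build_data_location_policy` is hypothetical, and the bucket name is taken from the catalog examples earlier in this topic.

```python
import json


def build_data_location_policy(bucket, prefix=""):
    """Build an IAM policy granting the minimum S3 actions for a
    registered data lake location (a bucket plus an optional prefix)."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    clean_prefix = prefix.strip("/")
    object_arn = f"{bucket_arn}/{clean_prefix}/*" if clean_prefix else f"{bucket_arn}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Read, write, and delete are object-level actions.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [object_arn],
            },
        ],
    }


policy = build_data_location_policy("data-lake-bucket")
print(json.dumps(policy, indent=2))
```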
For more information on registering data locations in AWS Lake Formation, refer to Register data lake AWS documentation.
Configuration properties#
Property |
Description |
---|---|
|
The value of |
|
The maximum number of concurrent connections to the AWS Lake Formation
client. Defaults to |
|
Maximum number of error retries for the AWS Lake Formation client. Defaults
to |
|
AWS region of the data catalog that AWS Lake Formation is securing. Must be configured when not running in EC2, or when the catalog is in a different region. |
|
Set AWS Lake Formation API requests to the same region as the EC2 instance
where SEP is running. Defaults to |
|
(Optional) URL for an AWS Lake Formation API endpoint URL, such as
|
For SEP to integrate with AWS Lake Formation, you must add a session tag for SEP to AWS Lake Formation external data filtering. The session tag value added to AWS Lake Formation is set in the lake-formation.authorized-caller-tag property. Permission checking for SEP fails for AWS Lake Formation if this property is not set. Read the AWS documentation for details on how to complete this configuration.
To use Lake Formation access control when accessing resources shared between different AWS accounts, you need the following prerequisites:
For AWS accounts that are sharing resources:
Lake Formation external data filtering must be enabled.
One of the allowed session tag values must match what is configured in SEP.
IDs of the accounts that are accessing resources must be listed in Lake Formation external data filtering settings, under AWS account IDs.
For AWS accounts that are accessing shared resources:
Lake Formation external data filtering must be enabled.
One of the allowed session tag values must match what is configured in SEP.
Access to S3 in the source accounts must be configured separately using S3 security mapping.
Write operations on Lake Formation resource links are not supported.
Lake Formation credential vending integration#
SEP supports AWS credential vending with Lake Formation. SEP calls Lake Formation credential vending API operations to generate temporary credentials to determine read access to the table. S3 locations that are registered with Lake Formation can only be accessed using the role specified at the time of registering the location.
To enable AWS credential vending, add the lake-formation.credential-vending.enabled=true catalog configuration property to your hive.properties configuration file.
Note
Enabling credential vending makes the catalog read-only.
AWS Lake Formation works only with the native filesystem, so the fs.native-s3.enabled property must be set to true. When a catalog uses credential vending, Hive S3 configuration properties are made invalid. The following properties must be used instead:
Property |
Description |
---|---|
|
Optional property to force the S3 client to connect to the specified region only. |
|
The S3 storage endpoint server. This can be used to connect to an
S3-compatible storage system instead of AWS. When using v4 signatures, it
is recommended to set this to the AWS region-specific endpoint, such as
|
|
Use path-style access for all requests to the S3-compatible storage. This
is for S3-compatible storage that doesn’t support virtual-hosted-style
access. Defaults to |
|
Maximum number of simultaneous open connections to S3. |
|
Proxy protocol. HTTPS. |
|
Proxy protocol. HTTP. |
|
The part size for S3 streaming upload. Defaults to |
|
Enables Requester Pays. |
|
The type of key management for S3 server-side encryption. Use |
|
If set, use S3 client-side encryption and use the AWS KMS to store encryption keys and use the value of this property as the KMS Key ID for newly created objects. |
Credential vending configuration properties#
The following additional catalog configuration properties are available for credential vending:
Property |
Description |
---|---|
|
Specifies the length of time in which the generated temporary credentials
are valid. Duration can be set between |
|
Specifies the length of time until temporary credentials are marked stale
and new credentials are fetched. There is no minimum value, however it
should be less than the value set for
|
Lake Formation security mapping#
SEP supports flexible security mapping for Lake Formation, which associates SEP users or groups with AWS security entities like IAM roles according to a JSON mappings file. The IAM role for a specific query can be selected from a list of allowed roles using the SHOW ROLE GRANTS FROM <catalog> and SET ROLE <user-role> IN <catalog> SQL statements.
Each security mapping entry may specify one or more match criteria. If multiple criteria are specified, all criteria must match. The following match criteria are available:
"user": Regular expression to match against the username. For example: alice|bob to match either of the SEP users “alice” or “bob”.
"group": Regular expression to match against any of the groups that the user belongs to. For example: finance|sales to match either the finance or sales groups in SEP.
Each SEP match criteria can be mapped to one or more of the following AWS security entities:
"iamRole": IAM role to use if no user provided role is specified. This overrides any globally configured IAM role.
"roleSessionName": (Optional) Only valid when iamRole is specified. If roleSessionName includes the string ${USER}, then the ${USER} portion of the string is replaced with the current session’s username. If roleSessionName is not specified, it defaults to trino-session.
"allowedIamRoles": Comma-separated list of IAM roles that specified AWS account users are limited to.
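The roleSessionName rules can be illustrated with a few lines of code. This is a sketch of the documented behavior only, not SEP's implementation; the function name `resolve_role_session_name` and the `sep-${USER}` template are hypothetical.

```python
DEFAULT_ROLE_SESSION_NAME = "trino-session"


def resolve_role_session_name(entry, username):
    """Return the session name for a mapping entry: substitute ${USER}
    when a template is given, otherwise fall back to the default."""
    template = entry.get("roleSessionName")
    if template is None:
        return DEFAULT_ROLE_SESSION_NAME
    return template.replace("${USER}", username)


with_template = {
    "iamRole": "arn:aws:iam::123456789101:role/test_default",
    "roleSessionName": "sep-${USER}",  # hypothetical template
}
without_template = {"iamRole": "arn:aws:iam::123456789101:role/test_default"}

print(resolve_role_session_name(with_template, "alice"))
print(resolve_role_session_name(without_template, "alice"))
```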
SEP finds all the matching entries for a user in the security mapping entries. When multiple entries match, SEP adds all the allowedIamRoles values to a single comma-separated list.
For example, if user alice belongs to both the hq and admin groups, the user can access all the roles across the matching entries: alice_role, hq_role, and admin_role.
{
  "mappings": [
    {
      "user": "alice",
      "iamRole": "arn:aws:iam::123456789101:role/alice_role_1",
      "allowedIamRoles": [
        "arn:aws:iam::123456789101:role/alice_role"
      ]
    },
    {
      "group": "hq",
      "allowedIamRoles": [
        "arn:aws:iam::123456789101:role/hq_role"
      ]
    },
    {
      "group": "admin",
      "allowedIamRoles": [
        "arn:aws:iam::123456789101:role/admin_role"
      ]
    }
  ]
}
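The aggregation of allowedIamRoles across matching entries can be sketched as follows. This is an illustration of the rules described in this topic rather than SEP's actual code. It assumes that each regular expression must match the whole user or group name and that duplicate roles are listed once; neither detail is confirmed here. The names `entry_matches` and `allowed_roles` are hypothetical.

```python
import re

MAPPINGS = [
    {"user": "alice",
     "iamRole": "arn:aws:iam::123456789101:role/alice_role_1",
     "allowedIamRoles": ["arn:aws:iam::123456789101:role/alice_role"]},
    {"group": "hq",
     "allowedIamRoles": ["arn:aws:iam::123456789101:role/hq_role"]},
    {"group": "admin",
     "allowedIamRoles": ["arn:aws:iam::123456789101:role/admin_role"]},
]


def entry_matches(entry, user, groups):
    """All criteria present in the entry must match (assumed: full match)."""
    user_pattern = entry.get("user")
    if user_pattern is not None and not re.fullmatch(user_pattern, user):
        return False
    group_pattern = entry.get("group")
    if group_pattern is not None and not any(
        re.fullmatch(group_pattern, group) for group in groups
    ):
        return False
    return True


def allowed_roles(mappings, user, groups):
    """Collect allowedIamRoles from every matching entry, in file order."""
    roles = []
    for entry in mappings:
        if entry_matches(entry, user, groups):
            for role in entry.get("allowedIamRoles", []):
                if role not in roles:  # assumed: duplicates are dropped
                    roles.append(role)
    return roles


for role in allowed_roles(MAPPINGS, "alice", ["hq", "admin"]):
    print(role)
```

With this mapping, alice in the hq and admin groups receives all three roles, while a different user who belongs only to hq receives just hq_role.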
You can set a default mapping by adding an entry to the end of the file that does not specify an SEP match criteria. If no mapping entry matches and no default is configured, access is denied with a “Cannot set role NONE” error.
The JSON mapping can be retrieved either from a file or from a REST endpoint specified with the lake-formation.security-mapping.config-file configuration property.
The following example JSON mapping applies SEP user and group mappings to security entities in AWS Lake Formation:
{
  "mappings": [
    {
      "user": "bob|charlie",
      "iamRole": "arn:aws:iam::123456789101:role/test_default",
      "allowedIamRoles": [
        "arn:aws:iam::123456789101:role/test_default",
        "arn:aws:iam::123456789101:role/test1",
        "arn:aws:iam::123456789101:role/test2",
        "arn:aws:iam::123456789101:role/test3"
      ]
    },
    {
      "user": "salesnorth",
      "iamRole": "arn:aws:iam::123456789101:role/sales_north_users"
    },
    {
      "group": "sales*",
      "iamRole": "arn:aws:iam::123456789101:role/sales_all_users"
    },
    {
      "iamRole": "arn:aws:iam::123456789101:role/default"
    }
  ]
}
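Because a missing or trailing comma makes the whole mapping file invalid JSON, it can help to validate the file before pointing SEP at it. The following is a minimal sketch of such a check. The function name `validate_mapping` is hypothetical, and the set of accepted keys is limited to the ones described in this topic, so SEP may accept keys that this check reports.

```python
import json

KNOWN_KEYS = {"user", "group", "iamRole", "roleSessionName", "allowedIamRoles"}


def validate_mapping(text):
    """Return a list of problems found in a security mapping document."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(doc, dict) or not isinstance(doc.get("mappings"), list):
        return ['top-level "mappings" must be a list']
    problems = []
    for index, entry in enumerate(doc["mappings"]):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: must be an object")
            continue
        for key in sorted(set(entry) - KNOWN_KEYS):
            problems.append(f"entry {index}: unknown key {key}")
        # roleSessionName is only valid together with iamRole.
        if "roleSessionName" in entry and "iamRole" not in entry:
            problems.append(f"entry {index}: roleSessionName requires iamRole")
    return problems


GOOD = """
{
  "mappings": [
    {"user": "salesnorth",
     "iamRole": "arn:aws:iam::123456789101:role/sales_north_users"},
    {"iamRole": "arn:aws:iam::123456789101:role/default"}
  ]
}
"""

print(validate_mapping(GOOD))
print(validate_mapping('{"mappings": [{"user": "alice",}]}'))
```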
Security mapping configuration properties#
For AWS Lake Formation to work as intended, you must add the s3.security-mapping.enabled=true configuration property.
Property | Description |
---|---|
lake-formation.security-mapping.config-file | Path and filename of the JSON mapping file, or REST-endpoint URI containing security mappings. |
lake-formation.security-mapping.refresh-period | How often to refresh the security mapping configuration. For example, use |
The following example shows the Lake Formation security mapping configuration properties:
lake-formation.authorized-caller-tag=starburst-enterprise
lake-formation.security-mapping.config-file=etc/example-lake-formation-security-mapping.json
lake-formation.security-mapping.refresh-period=5m
s3.security-mapping.enabled=true
Security mapping role requirements#
AWS Lake Formation permissions are read from AWS using two different sets of assumed role credentials when executing queries against catalogs protected by AWS Lake Formation security policies:
admin - Identified by the hive.metastore.glue.iam-role configuration property.
user - Selected according to Security Mapping rules.
You must configure the following in AWS IAM:
The admin role must be configured to assume the user role in AWS trust relationships. Read the AWS trust relationship documentation for more information.
The sts:TagSession and sts:AssumeRole actions must both be allowed.
For read access: the glue:GetDatabases, glue:GetDatabase, glue:GetTables, glue:GetTable, glue:GetPartitions, glue:GetPartition, and glue:BatchGetPartition AWS Glue API actions must be granted to the user role.
For write access: the glue:GetDatabases, glue:GetDatabase, glue:GetTables, glue:GetTable, glue:GetPartitions, glue:GetPartition, glue:BatchGetPartition, glue:CreateTable, glue:DeleteTable, glue:UpdateTable, glue:BatchCreatePartition, glue:UpdatePartition, glue:DeletePartition, and lakeformation:GetDataAccess AWS Glue and Lake Formation API actions must be granted to the user role.
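The IAM requirements above can be expressed as two policy documents for the user role. The following sketch generates both under stated assumptions: the trust policy is attached to the user role and names the admin role as principal, which is one common way to let one role assume another, and the permissions policy uses `"Resource": "*"` purely for brevity. The function names and the example ARN are hypothetical. Narrow the resources to match your environment.

```python
import json

READ_ACTIONS = [
    "glue:GetDatabases", "glue:GetDatabase", "glue:GetTables", "glue:GetTable",
    "glue:GetPartitions", "glue:GetPartition", "glue:BatchGetPartition",
]
# Additional actions required for write access.
WRITE_EXTRA_ACTIONS = [
    "glue:CreateTable", "glue:DeleteTable", "glue:UpdateTable",
    "glue:BatchCreatePartition", "glue:UpdatePartition", "glue:DeletePartition",
    "lakeformation:GetDataAccess",
]


def user_role_trust_policy(admin_role_arn):
    """Trust policy for the user role (assumed placement): allows the
    admin role to assume it and to tag the session."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": admin_role_arn},
            "Action": ["sts:AssumeRole", "sts:TagSession"],
        }],
    }


def user_role_permissions(write=False):
    """Permissions policy for the user role, for read or write access."""
    actions = READ_ACTIONS + (WRITE_EXTRA_ACTIONS if write else [])
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": actions,
            "Resource": "*",  # assumption for brevity; restrict in practice
        }],
    }


admin_arn = "arn:aws:iam::123456789101:role/admin-role"  # hypothetical ARN
print(json.dumps(user_role_trust_policy(admin_arn), indent=2))
print(json.dumps(user_role_permissions(write=True), indent=2))
```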
Listing and selecting available user roles#
The following SQL statements are available to list and set roles:
SHOW ROLE GRANTS FROM <catalog_name> - Lists AWS roles available to the user in a catalog protected by Lake Formation.
SHOW CURRENT ROLES IN <catalog_name> - Returns currently enabled roles.
SET ROLE "arn:iam::..." IN <catalog_name> - Selects a specific role.
SET ROLE NONE IN <catalog_name> - Causes the role to default to the role defined by iamRole in the security mapping configuration, if it is set.
Use the roles=<catalog_name>:arn:iam::... connection property to select a specific role for a JDBC connection.
Caching#
To make permission checks run as fast as possible, Lake Formation access control caches permission data for each user. By default, permissions are stored for a maximum of 1000 users and expire after 10 minutes. In the following example, the permissions cache size is reduced to 100 users, and the permissions are set to expire after two hours:
lake-formation.cache-ttl=2h
lake-formation.cache-size=100
Property | Description |
---|---|
lake-formation.cache-ttl | Time duration for which to store Lake Formation permission data for each user. |
lake-formation.cache-size | Maximum number of users for which to store Lake Formation permission data. |
If needed, the cache can be manually cleared using a SQL procedure call in the catalog:
CALL system.flush_access_control_cache()
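To make the effect of the two cache settings and the flush procedure concrete, the following sketch models a per-user permission cache with a time-to-live and a maximum size. It is a conceptual illustration only, not SEP's cache: the class name `PermissionCache` is hypothetical, and evicting the oldest entry when the cache is full is an assumption, since the eviction policy is not documented here.

```python
import time
from collections import OrderedDict


class PermissionCache:
    """Per-user cache with a TTL and a maximum number of users."""

    def __init__(self, ttl_seconds, max_users, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._max_users = max_users
        self._clock = clock
        self._entries = OrderedDict()  # user -> (expires_at, permissions)

    def get(self, user, loader):
        """Return cached permissions, calling loader(user) on a miss or expiry."""
        now = self._clock()
        hit = self._entries.get(user)
        if hit is not None and hit[0] > now:
            return hit[1]
        permissions = loader(user)
        self._entries[user] = (now + self._ttl, permissions)
        self._entries.move_to_end(user)
        while len(self._entries) > self._max_users:
            self._entries.popitem(last=False)  # assumed: evict oldest entry
        return permissions

    def flush(self):
        """Drop everything, similar in spirit to flushing the cache manually."""
        self._entries.clear()


lookups = []


def load_permissions(user):
    # Stand-in for a call to the Lake Formation API.
    lookups.append(user)
    return {"SELECT"}


# Mirrors cache-ttl=2h and cache-size=100 from the example above.
cache = PermissionCache(ttl_seconds=2 * 60 * 60, max_users=100)
cache.get("alice", load_permissions)
cache.get("alice", load_permissions)  # served from the cache
cache.flush()
cache.get("alice", load_permissions)  # loaded again after the flush
print(len(lookups))
```

A larger time-to-live reduces calls for permission data but delays the point at which permission changes made in AWS become visible, which is when a manual flush is useful.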
Views#
Views that were created before enabling AWS Lake Formation access control or views created in older SEP versions must be manually migrated before they can be queried using an AWS Lake Formation-secured catalog:
CREATE OR REPLACE VIEW <view_name> AS <query>
If a view is using DEFINER security mode, it must be dropped and created again in INVOKER security mode. Views with DEFINER security mode are not supported in AWS Lake Formation access control.
After migration, you can manage permissions for a view in AWS Lake Formation.
Limitations#
Migration of roles from Hive to Iceberg is not recommended.
Creating views in DEFINER security mode is not supported in AWS Lake Formation.
If AWS Lake Formation controls access to a database and is configured to use data filters, another access control system should not be configured to control access to that same database. Overlapping policies may interact in unexpected ways.