AWS Lake Formation access control support#

Note

AWS Lake Formation access control and security mapping support is a public preview feature. Contact Starburst support with questions or feedback.

Starburst Enterprise platform (SEP) provides support for using an existing AWS Lake Formation access control system.

Requirements#

To use AWS Lake Formation integration with Starburst Enterprise, you need:

  • An existing AWS Lake Formation configuration and AWS credentials that allow interacting with its API.

  • A valid Starburst Enterprise license.

Overview#

AWS Lake Formation provides a single place to manage access control policies. You can define security policies that restrict access to data at the database, table, column, row, and cell levels. These policies apply to AWS Identity and Access Management (IAM) users and roles, and to users and groups when federating through an external identity provider.

Starburst Enterprise platform (SEP) integration with AWS Lake Formation enforces AWS Lake Formation access control policies when accessing registered Amazon S3 data lake locations.

AWS Lake Formation access control support is only available for catalogs that use the Hive connector, because it relies on the security system of the Hive connector.

Configure AWS Lake Formation#

Each catalog that needs to be controlled with AWS Lake Formation must have the catalog properties file configured to use the lake-formation Hive security:

hive.security=lake-formation

The following is a more complex example of a catalog properties file that is configured to use AWS Lake Formation for authorization with the Hive connector.

connector.name=hive
hive.security=lake-formation
hive.metastore=glue
hive.metastore.glue.region=us-east-2
hive.metastore.glue.default-warehouse-dir=s3://data-lake-bucket
hive.metastore.glue.iam-role=arn:aws:iam::<account_id>:role/role_for_glue
hive.s3.iam-role=arn:aws:iam::<account_id>:role/role_for_s3
lake-formation.authorized-caller-tag=starburst-enterprise
lake-formation.security-mapping.config-file=etc/lakeformation-security-mapping.json

More information on lake formation security mapping can be found later in this topic.
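As a quick sanity check before deploying a catalog file, the two settings this section requires can be verified with a few lines of Python. This is a minimal sketch, not a Starburst tool: the helper names `parse_properties` and `lake_formation_problems` are hypothetical, and the parser only handles plain `key=value` lines.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def lake_formation_problems(props):
    """Return a list of problems with the two settings named in this topic."""
    problems = []
    # Lake Formation access control is only available with the Hive connector.
    if props.get("connector.name") != "hive":
        problems.append("connector.name must be hive")
    # The catalog must select the lake-formation Hive security.
    if props.get("hive.security") != "lake-formation":
        problems.append("hive.security must be lake-formation")
    return problems


catalog_text = """
connector.name=hive
hive.security=lake-formation
hive.metastore=glue
"""
print(lake_formation_problems(parse_properties(catalog_text)))
```

An empty list means both settings are present. The check says nothing about whether the AWS side is configured correctly.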

Configuration properties#

AWS Lake Formation configuration properties#

| Property | Description |
| --- | --- |
| lake-formation.authorized-caller-tag | The value of LakeFormationAuthorizedCaller registered for SEP in third-party query engine integration. |

Lake formation security mapping#

SEP supports flexible security mapping for lake formation, which associates SEP users or groups with AWS security entities such as IAM roles according to a JSON mappings file. The IAM role for a specific query can be selected from a list of allowed roles using the SHOW ROLE GRANTS FROM <catalog> and SET ROLE "..." IN <catalog> SQL statements.

Each security mapping entry may specify one or more match criteria. If multiple criteria are specified, all criteria must match. The following match criteria are available:

  • "user": - Regular expression to match against the username. For example: alice|bob to match either of the SEP users “alice” or “bob”.

  • "group": - Regular expression to match against any of the groups that the user belongs to. For example: finance|sales to match either the finance or sales groups in SEP.

Each set of SEP match criteria can be mapped to one or more of the following AWS security entities:

  • "iamRole": IAM role to use if no user-provided role is specified. This overrides any globally configured IAM role.

  • "roleSessionName": (Optional) Only valid when iamRole is specified. If roleSessionName includes the string ${USER}, then the ${USER} portion of the string will be replaced with the current session’s username. If roleSessionName is not specified, it defaults to trino-session.

  • "allowedIamRoles": Comma-separated list of IAM roles that specified AWS account users are limited to.
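The roleSessionName rule above can be illustrated with a short Python sketch. It is only a model of the documented behavior, not SEP code; the function name `resolve_role_session_name` is hypothetical, and the entry is assumed to be one already-parsed mapping object.

```python
DEFAULT_SESSION_NAME = "trino-session"  # documented default


def resolve_role_session_name(entry, username):
    """Return the session name for a mapping entry and the current user."""
    template = entry.get("roleSessionName")
    if template is None:
        # No roleSessionName in the entry: fall back to the default.
        return DEFAULT_SESSION_NAME
    # Replace the literal ${USER} placeholder with the session's username.
    return template.replace("${USER}", username)


entry = {
    "iamRole": "arn:aws:iam::123456789101:role/test_default",
    "roleSessionName": "sep-${USER}",  # hypothetical value for illustration
}
print(resolve_role_session_name(entry, "alice"))
```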

The security mapping entries are processed in the order listed in the JSON mapping, so specify more specific mapping entries before less specific ones. For example, the mapping list might have a "group": entry for “salesnorth” followed by an entry for “sales”. This applies a more specific lake formation security mapping to the north sales team before a broader security mapping applies to the whole sales department.

You can set a default mapping by adding an entry to the end of the file that does not specify any SEP match criteria. If no mapping entry matches and no default is configured, access is denied with a “Cannot set role NONE” error.
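The ordering rules described above can be modeled as a first-match search. The following Python sketch is an illustration under stated assumptions, not the SEP implementation: the function names are hypothetical, patterns are assumed to match the whole username or group name, and Python's `re` module stands in for whatever regular expression engine SEP uses.

```python
import re


def entry_matches(entry, user, groups):
    """An entry matches only if every criterion it specifies matches."""
    user_pattern = entry.get("user")
    if user_pattern is not None and not re.fullmatch(user_pattern, user):
        return False
    group_pattern = entry.get("group")
    if group_pattern is not None and not any(
        re.fullmatch(group_pattern, group) for group in groups
    ):
        return False
    # An entry without criteria matches everyone, so it acts as the default.
    return True


def first_matching_entry(mappings, user, groups):
    """Return the first entry that matches, in file order, or None."""
    for entry in mappings:
        if entry_matches(entry, user, groups):
            return entry
    return None


mappings = [
    {"group": "salesnorth", "iamRole": "arn:aws:iam::123456789101:role/sales_north_users"},
    {"group": "sales.*", "iamRole": "arn:aws:iam::123456789101:role/sales_all_users"},
    {"iamRole": "arn:aws:iam::123456789101:role/default"},
]

match = first_matching_entry(mappings, "dana", ["salesnorth"])
print(match["iamRole"])
```

Because the search stops at the first match, swapping the first two entries would send the north sales team to the broader sales role.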

The JSON mapping can be retrieved either from a file or from a REST endpoint, as specified by the lake-formation.security-mapping.config-file configuration property.

The following example JSON mapping applies SEP user and group mappings to security entities in AWS Lake Formation:

{
  "mappings": [
    {
      "user": "bob|charlie",
      "iamRole": "arn:aws:iam::123456789101:role/test_default",
      "allowedIamRoles": [
        "arn:aws:iam::123456789101:role/test_default",
        "arn:aws:iam::123456789101:role/test1",
        "arn:aws:iam::123456789101:role/test2",
        "arn:aws:iam::123456789101:role/test3"
      ]
    },
    {
      "group": "salesnorth",
      "iamRole": "arn:aws:iam::123456789101:role/sales_north_users"
    },
    {
      "group": "sales.*",
      "iamRole": "arn:aws:iam::123456789101:role/sales_all_users"
    },
    {
      "iamRole": "arn:aws:iam::123456789101:role/default"
    }
  ]
}

Security mapping configuration properties#

| Property | Description |
| --- | --- |
| lake-formation.security-mapping.config-file | Path and filename of the JSON mapping file, or REST endpoint URI containing security mappings. |
| lake-formation.security-mapping.refresh-period | How often to refresh the security mapping configuration. For example, use 5m to direct SEP to reload the security mappings from the JSON mapping every 5 minutes. |

The following example shows the lake formation security mapping configuration properties:

lake-formation.authorized-caller-tag=starburst-enterprise
lake-formation.security-mapping.config-file=etc/example-lake-formation-security-mapping.json
lake-formation.security-mapping.refresh-period=5m

Security mapping role requirements#

AWS Lake Formation permissions are read from AWS using two different sets of impersonated role credentials when executing queries against catalogs protected by AWS Lake Formation security policies:

  • admin - Identified by hive.metastore.glue.iam-role configuration property.

  • user - Selected according to Security Mapping rules.

You must configure the following in AWS IAM:

  • The admin role must be configured to impersonate the user role in AWS Trust Relationships.

  • The sts:TagSession and sts:AssumeRole actions must both be allowed.

  • The glue:GetDatabases, glue:GetDatabase, glue:GetTables, glue:GetTable, glue:GetPartition, and glue:BatchGetPartition AWS Glue API permissions must be granted to the user role.
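The IAM requirements above could translate into two policy documents along the following lines. This Python sketch only builds and prints the JSON, and rests on several assumptions: the trust policy is attached to the user role and names the admin role as principal, the account ID and role name are placeholders, and `"Resource": "*"` is used for brevity where a real policy would likely be narrower. The function names are hypothetical, so check the result against your own AWS setup.

```python
import json

# Placeholder ARN for the role set in hive.metastore.glue.iam-role.
ADMIN_ROLE_ARN = "arn:aws:iam::123456789101:role/role_for_glue"

# Glue API permissions listed in this topic for the user role.
GLUE_READ_ACTIONS = [
    "glue:GetDatabases",
    "glue:GetDatabase",
    "glue:GetTables",
    "glue:GetTable",
    "glue:GetPartition",
    "glue:BatchGetPartition",
]


def user_role_trust_policy(admin_role_arn):
    """Trust policy for the user role: lets the admin role assume and tag it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": admin_role_arn},
                "Action": ["sts:AssumeRole", "sts:TagSession"],
            }
        ],
    }


def user_role_glue_policy():
    """Permissions policy for the user role (Resource "*" is an assumption)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": GLUE_READ_ACTIONS, "Resource": "*"}
        ],
    }


print(json.dumps(user_role_trust_policy(ADMIN_ROLE_ARN), indent=2))
print(json.dumps(user_role_glue_policy(), indent=2))
```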

Listing and selecting available user roles#

The following SQL statements are available to list and set roles:

  • SHOW ROLE GRANTS FROM <catalog_name> - Lists AWS roles available to the user in a catalog protected by lake formation.

  • SHOW CURRENT ROLES IN <catalog_name> - Returns currently enabled roles.

  • SET ROLE "arn:iam::..." IN <catalog_name> - Selects a specific role.

  • SET ROLE NONE IN <catalog_name> - Causes the role to default to the role defined by iamRole in the security mapping configuration, if it is set.

Use the roles=<catalog_name>:arn:iam::... connection property to select a specific role for a JDBC connection.
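How a requested role relates to the iamRole and allowedIamRoles fields of the matching security mapping entry can be modeled in a few lines of Python. This is a sketch of the documented behavior under assumptions, not SEP code: the names `RoleSelectionError` and `select_role` are hypothetical, the "Role is not allowed" message is invented for illustration, and treating the default iamRole as implicitly allowed is an assumption.

```python
class RoleSelectionError(Exception):
    """Raised when no usable role can be selected (hypothetical type)."""


def select_role(entry, requested=None):
    """Pick the IAM role for a query.

    entry is the matching security mapping entry, or None if nothing matched.
    requested is the role from SET ROLE, or None for SET ROLE NONE.
    """
    default_role = entry.get("iamRole") if entry else None
    if requested is None:
        if default_role is None:
            # Documented error when no mapping matches and no default exists.
            raise RoleSelectionError("Cannot set role NONE")
        return default_role
    allowed = set(entry.get("allowedIamRoles", [])) if entry else set()
    if default_role is not None:
        allowed.add(default_role)  # assumption: the default is always allowed
    if requested not in allowed:
        raise RoleSelectionError("Role is not allowed: " + requested)
    return requested


entry = {
    "user": "bob|charlie",
    "iamRole": "arn:aws:iam::123456789101:role/test_default",
    "allowedIamRoles": [
        "arn:aws:iam::123456789101:role/test_default",
        "arn:aws:iam::123456789101:role/test1",
    ],
}
print(select_role(entry))
print(select_role(entry, "arn:aws:iam::123456789101:role/test1"))
```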

Caching#

To make permission checks run as fast as possible, lake formation access control caches permission data for each user. By default, permissions are stored for a maximum of 1000 users and expire after 10 minutes. In the following example, the permissions cache size is reduced to 100 users, and the permissions are set to expire after two hours:

lake-formation.cache-ttl=2h
lake-formation.cache-size=100

Cache configuration properties#

| Property | Description |
| --- | --- |
| lake-formation.cache-ttl | Time duration for which to store lake formation permission data for each user. |
| lake-formation.cache-size | Maximum number of users for which to store lake formation permission data. |

If needed, the cache can be manually cleared using a SQL procedure call in the catalog:

CALL system.flush_access_control_cache()
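To show how the two cache properties interact, the following Python sketch models a per-user cache bounded by both size and age, with a manual flush. It is a conceptual illustration, not the SEP implementation: the class name `PermissionCache` is hypothetical, and evicting the oldest-loaded user first is an assumption, since the eviction policy is not documented here.

```python
import time
from collections import OrderedDict


class PermissionCache:
    """Per-user cache bounded by entry count and by entry age."""

    def __init__(self, max_size=1000, ttl_seconds=600.0, clock=time.monotonic):
        # Defaults mirror the documented 1000 users and 10 minutes.
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # user -> (expires_at, permissions)

    def get(self, user, loader):
        """Return cached permissions, calling loader(user) on a miss."""
        now = self._clock()
        hit = self._entries.get(user)
        if hit is not None and hit[0] > now:
            return hit[1]
        permissions = loader(user)
        self._entries[user] = (now + self._ttl, permissions)
        self._entries.move_to_end(user)
        # Evict the oldest-loaded users once the size limit is exceeded.
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)
        return permissions

    def flush(self):
        """Drop everything, similar in spirit to the flush procedure."""
        self._entries.clear()

    def __len__(self):
        return len(self._entries)


# Equivalent of cache-ttl=2h and cache-size=100 in this model.
cache = PermissionCache(max_size=100, ttl_seconds=2 * 60 * 60)
permissions = cache.get("alice", lambda user: {"user": user, "tables": []})
```

A longer TTL reduces calls to AWS but delays the effect of permission changes, which is when a manual flush is useful.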

Limitations#

The following are not supported:

  • DML or DDL operations in AWS Lake Formation.