AWS Lake Formation access control support#

Note

AWS Lake Formation access control support is a public preview feature. Contact Starburst support with questions or feedback.

Starburst Enterprise platform (SEP) provides support for using an existing AWS Lake Formation access control system.

Requirements#

In order to use AWS Lake Formation integration with Starburst Enterprise, you need:

  • An existing AWS Lake Formation configuration and AWS credentials that allow interacting with its API.

  • A valid Starburst Enterprise license.

Overview#

AWS Lake Formation provides a single place to manage access control policies. You can define security policies that restrict access to data at the database, table, column, row, and cell levels. These policies apply to AWS Identity and Access Management (IAM) users and roles, and to users and groups when federating through an external identity provider.

Starburst Enterprise platform (SEP) integration with AWS Lake Formation enforces AWS Lake Formation access control policies when accessing registered Amazon S3 data lake locations.

AWS Lake Formation access control support is only available for catalogs that use the Hive connector, because it relies on the security system of the Hive connector.

Configure AWS Lake Formation#

Each catalog that needs to be controlled with AWS Lake Formation must have the catalog properties file configured to use the lake-formation Hive security:

hive.security=lake-formation

The following is a more complex example of a catalog properties file that is configured to use AWS Lake Formation for authorization with the Hive connector.

connector.name=hive
hive.security=lake-formation
hive.metastore=glue
hive.metastore.glue.region=us-east-2
hive.metastore.glue.default-warehouse-dir=s3://data-lake-bucket
hive.metastore.glue.iam-role=arn:aws:iam::<account_id>:role/role_for_glue
hive.s3.iam-role=arn:aws:iam::<account_id>:role/role_for_s3
lake-formation.authorized-caller-tag=starburst-enterprise
lake-formation.security-mapping.config-file=etc/lakeformation-security-mapping.json
lake-formation.security-mapping.iam-role-credential-name=Example_Role_Credential_Name

More information on Lake Formation security mapping can be found later in this topic.

Configuration properties#

AWS Lake Formation configuration properties#

Property

Description

lake-formation.role-credential

The name of the extra credential that provides the role ARN to use when communicating with AWS Lake Formation. For example, given lake-formation.role-credential=aws_lf_role, add extraCredentials=aws_lf_role:arn:aws:iam::<account_id>:role/lf_role to the parameters used with the JDBC driver to connect to SEP. Users of the CLI can use the --extraCredential option.
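As a minimal sketch of the JDBC case, the following Python helper assembles a connection URL that carries the role ARN as an extra credential. The host, port, and function name are hypothetical; the credential name is assumed to equal the value of lake-formation.role-credential in the catalog properties file.

```python
def jdbc_url_with_role(host, port, credential_name, role_arn):
    """Build a JDBC URL that passes role_arn as an extra credential.

    credential_name is assumed to match lake-formation.role-credential.
    host and port are placeholders for your own SEP coordinator.
    """
    # The extraCredentials parameter uses the form <name>:<value>.
    return (
        f"jdbc:trino://{host}:{port}"
        f"?extraCredentials={credential_name}:{role_arn}"
    )


if __name__ == "__main__":
    print(
        jdbc_url_with_role(
            "sep.example.com",  # hypothetical host
            8443,               # hypothetical port
            "aws_lf_role",
            "arn:aws:iam::123456789101:role/lf_role",
        )
    )
```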

lake-formation.authorized-caller-tag

The value of LakeFormationAuthorizedCaller registered for SEP in third-party query engine integration.

Lake formation security mapping#

Note

This feature is in limited preview. It is subject to change and not supported for production environments.

SEP supports flexible security mapping for Lake Formation, which associates SEP users or groups with AWS security entities such as IAM roles according to a JSON mappings file. The IAM role for a specific query can be selected from a list of allowed roles by providing it as an extra credential.

Each security mapping entry may specify one or more match criteria. If multiple criteria are specified, all criteria must match. The following match criteria are available:

  • "user": - Regular expression to match against username. For example: alice|bob to match either of the SEP users “alice” or “bob”.

  • "group": - Regular expression to match against any of the groups that the user belongs to. For example: finance|sales to match either the finance or sales groups in SEP.

Each SEP match criteria can be mapped to one or more of the following AWS security entities:

  • "iamRole": IAM role to use if no user provided role is specified as an extra credential. This overrides any globally configured IAM role. This role is allowed to be specified as an extra credential, although specifying it explicitly has no effect, as it would be used anyway.

  • "roleSessionName": (Optional) Only valid when iamRole is specified. If roleSessionName includes the string ${USER}, then the ${USER} portion of the string will be replaced with the current session’s username. If roleSessionName is not specified, it defaults to trino-session.

  • "allowedIamRoles": List of IAM roles, given as a JSON array, that are allowed to be specified as an extra credential. This is useful because a particular AWS account may have permissions to use many roles, but a specific user should only be allowed to use a subset of those roles.
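The roleSessionName substitution described above can be sketched in a few lines of Python. This is an illustration of the documented behavior only, not SEP's implementation; the function name is hypothetical.

```python
DEFAULT_SESSION_NAME = "trino-session"  # documented default


def role_session_name(template, username):
    """Expand ${USER} in a roleSessionName template.

    template is the roleSessionName value from a mapping entry,
    or None when the entry does not set one.
    """
    if template is None:
        return DEFAULT_SESSION_NAME
    # Every occurrence of the literal string ${USER} is replaced.
    return template.replace("${USER}", username)


if __name__ == "__main__":
    print(role_session_name("sep-${USER}", "alice"))
    print(role_session_name(None, "alice"))
```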

The security mapping entries are processed in the order listed in the JSON mapping, so more specific entries should be listed before less specific ones. For example, the mapping list might have a "group": entry for “salesnorth” followed by an entry for “sales”, so that a more specific Lake Formation security mapping applies to the north sales team before a broader mapping applies to the whole sales department.

You can set a default mapping by adding an entry to the end of the file that does not specify an SEP match criteria. If no mapping entry matches and no default is configured, access is denied.
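The matching rules above can be sketched as follows. This is a simplified model under stated assumptions, not SEP's code: it assumes the first matching entry wins, that a regular expression must match the whole username or group name, and that a requested role is accepted only if it equals the entry's iamRole or appears in allowedIamRoles. The function names are hypothetical.

```python
import re


def entry_matches(entry, user, groups):
    """All criteria present in the entry must match (assumed full match)."""
    if "user" in entry and not re.fullmatch(entry["user"], user):
        return False
    if "group" in entry and not any(
        re.fullmatch(entry["group"], g) for g in groups
    ):
        return False
    # An entry with no criteria matches everyone (the default mapping).
    return True


def resolve_role(mappings, user, groups, requested_role=None):
    """Return the IAM role for a query, or raise PermissionError."""
    for entry in mappings:  # entries are checked in file order
        if not entry_matches(entry, user, groups):
            continue
        default_role = entry.get("iamRole")
        allowed = set(entry.get("allowedIamRoles", []))
        if requested_role is None:
            if default_role is None:
                raise PermissionError("matching entry defines no iamRole")
            return default_role
        if requested_role == default_role or requested_role in allowed:
            return requested_role
        raise PermissionError("requested role is not allowed for this user")
    raise PermissionError("no mapping entry matches; access is denied")
```

With a list containing a "salesnorth" group entry, a "sales" group entry, and a final entry without criteria, a user in both groups resolves to the first entry's role, and a user in neither group falls through to the default.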

The JSON mapping can either be retrieved from a file or REST-endpoint specified via the lake-formation.security-mapping.config-file config property.

The following example JSON mapping applies SEP user and group mappings to security entities in AWS Lake Formation:

{
  "mappings": [
    {
      "user": "bob|charlie",
      "iamRole": "arn:aws:iam::123456789101:role/test_default",
      "allowedIamRoles": [
        "arn:aws:iam::123456789101:role/test1",
        "arn:aws:iam::123456789101:role/test2",
        "arn:aws:iam::123456789101:role/test3"
      ]
    },
    {
      "group": "salesnorth",
      "iamRole": "arn:aws:iam::123456789101:role/sales_north_users"
    },
    {
      "group": "sales*",
      "iamRole": "arn:aws:iam::123456789101:role/sales_all_users"
    },
    {
      "iamRole": "arn:aws:iam::123456789101:role/default"
    }
  ]
}
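Before deploying a mapping file like the one above, a quick structural check can catch common mistakes. The following Python sketch is not part of SEP. It assumes that only the keys documented in this topic are valid, and the function name and problem messages are hypothetical.

```python
import json
import re

MATCH_KEYS = {"user", "group"}
ENTITY_KEYS = {"iamRole", "roleSessionName", "allowedIamRoles"}


def validate_mappings(text):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    entries = json.loads(text).get("mappings", [])
    for i, entry in enumerate(entries):
        unknown = set(entry) - MATCH_KEYS - ENTITY_KEYS
        if unknown:
            problems.append(f"entry {i}: unknown keys {sorted(unknown)}")
        # Match criteria are regular expressions and must compile.
        for key in sorted(MATCH_KEYS & set(entry)):
            try:
                re.compile(entry[key])
            except re.error as err:
                problems.append(f"entry {i}: invalid {key} regex: {err}")
        # roleSessionName is only valid together with iamRole.
        if "roleSessionName" in entry and "iamRole" not in entry:
            problems.append(f"entry {i}: roleSessionName requires iamRole")
        if not isinstance(entry.get("allowedIamRoles", []), list):
            problems.append(f"entry {i}: allowedIamRoles should be an array")
        # An entry without criteria matches everyone, so it belongs last.
        if not (MATCH_KEYS & set(entry)) and i != len(entries) - 1:
            problems.append(f"entry {i}: default entry hides later entries")
    return problems
```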

Security mapping configuration properties#

Property

Description

lake-formation.security-mapping.config-file

Path and filename of the JSON mapping file, or REST-endpoint URI containing security mappings.

lake-formation.security-mapping.iam-role-credential-name

The name of the extra credential used to provide the IAM role.

lake-formation.security-mapping.refresh-period

How often to refresh the security mapping configuration. For example, use 5m to direct SEP to reload the JSON mapping every 5 minutes.

lake-formation.security-mapping.colon-replacement

This property is for use in shells that do not handle the colon properly. It is otherwise not required. When defined, the character or characters are used in place of the colon (:) character in IAM role names used as an extra credential. Do not use quotes to enclose the replacement string. Any instances of this replacement string in the extra credential value are converted to a colon. Choose a value that is not used in any of your IAM ARNs, for example %colon%.
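The colon replacement amounts to a plain string substitution in both directions. The sketch below illustrates it with the %colon% value suggested above; the function names are hypothetical, and the client-side encoding step is an assumption about how you might prepare the credential value in a shell-safe form.

```python
def encode_role(role_arn, replacement):
    """Client side: make an ARN safe for shells that mishandle colons."""
    return role_arn.replace(":", replacement)


def decode_role(credential_value, replacement):
    """Server side, as documented: every instance of the replacement
    string in the extra credential value is converted to a colon."""
    return credential_value.replace(replacement, ":")


if __name__ == "__main__":
    arn = "arn:aws:iam::123456789101:role/test1"
    encoded = encode_role(arn, "%colon%")
    print(encoded)
    print(decode_role(encoded, "%colon%"))
```

The round trip only works if the replacement string never occurs in the ARN itself, which is why the property description recommends a value that is not used in any of your IAM ARNs.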

The following example shows the Lake Formation security mapping configuration properties:

lake-formation.role-credential=aws_role
lake-formation.authorized-caller-tag=starburst-enterprise
lake-formation.security-mapping.config-file=etc/example-lake-formation-security-mapping.json
lake-formation.security-mapping.iam-role-credential-name=Example_Role_Credential_Name
lake-formation.security-mapping.refresh-period=5m
lake-formation.security-mapping.colon-replacement=%colon%

Security mapping role requirements#

AWS Lake Formation permissions are read from AWS using two different sets of impersonated role credentials when executing queries against catalogs protected by AWS Lake Formation security policies:

  • admin - Identified by hive.metastore.glue.iam-role configuration property.

  • user - Selected according to Security Mapping rules.

You must configure the following in AWS IAM:

  • The admin role must be configured to impersonate the user role in AWS Trust Relationships.

  • The sts:TagSession and sts:AssumeRole actions must both be allowed.

  • The glue:GetDatabases, glue:GetTables, and glue:GetTable AWS Glue API permissions must be granted to the user role.
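One possible reading of these requirements, expressed as IAM policy documents built in Python, is sketched below. It assumes the trust relationship is attached to the user role and names the admin role as principal, and it uses "Resource": "*" for the Glue permissions purely for brevity. Both function names are hypothetical; verify the resulting policies against your own AWS security standards.

```python
import json

GLUE_ACTIONS = ["glue:GetDatabases", "glue:GetTables", "glue:GetTable"]


def trust_policy_for_user_role(admin_role_arn):
    """Trust policy letting the admin role assume and tag the user role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": admin_role_arn},
                "Action": ["sts:AssumeRole", "sts:TagSession"],
            }
        ],
    }


def glue_policy_for_user_role():
    """Permissions policy granting the listed Glue read actions.

    The wildcard resource is an assumption; narrow it as needed.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": GLUE_ACTIONS, "Resource": "*"}
        ],
    }


if __name__ == "__main__":
    admin = "arn:aws:iam::123456789101:role/role_for_glue"
    print(json.dumps(trust_policy_for_user_role(admin), indent=2))
    print(json.dumps(glue_policy_for_user_role(), indent=2))
```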

Caching#

To make permission checks run as fast as possible, Lake Formation access control caches permission data for each user. By default, permissions are stored for a maximum of 1000 users and expire after 10 minutes. In the following example, the permissions cache size is reduced to 100 users, and the permissions are set to expire after two hours:

lake-formation.cache-ttl=2h
lake-formation.cache-size=100

Cache configuration properties#

Property

Description

lake-formation.cache-ttl

Time duration for which to store lake formation permission data for each user.

lake-formation.cache-size

Maximum number of users for which to store lake formation permission data.

If needed, the cache can be manually cleared using a SQL procedure call in the catalog:

CALL system.flush_access_control_cache()
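To make the interaction of the two cache properties and the flush procedure concrete, the following Python sketch models a per-user permission cache with a size limit and a time-to-live. It is an illustration only: the class name is hypothetical, and evicting the oldest entry when the cache is full is an assumption, since this topic does not describe the eviction policy SEP uses.

```python
import time
from collections import OrderedDict


class PermissionCache:
    """Per-user cache with a maximum size and a time-to-live."""

    def __init__(self, max_users=1000, ttl_seconds=600, clock=time.monotonic):
        self._max_users = max_users  # models lake-formation.cache-size
        self._ttl = ttl_seconds      # models lake-formation.cache-ttl
        self._clock = clock
        self._entries = OrderedDict()  # user -> (stored_at, permissions)

    def get(self, user, loader):
        """Return cached permissions, calling loader(user) on a miss."""
        now = self._clock()
        hit = self._entries.get(user)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        permissions = loader(user)  # stands in for a Lake Formation lookup
        self._entries[user] = (now, permissions)
        self._entries.move_to_end(user)
        while len(self._entries) > self._max_users:
            self._entries.popitem(last=False)  # assumed: drop oldest entry
        return permissions

    def flush(self):
        """Models the effect of system.flush_access_control_cache()."""
        self._entries.clear()
```

A larger TTL reduces calls to AWS but delays the effect of permission changes made in Lake Formation, which is when a manual flush becomes useful.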

Limitations#

The following are not supported:

  • Cell filters and column masking defined in AWS Lake Formation.

  • DML or DDL operations in AWS Lake Formation.