Starburst Galaxy

  •  Get started

  •  Working with data

  •  Data engineering

  •  Developer tools

  •  Cluster administration

  •  Security and compliance

  •  Troubleshooting

  • Galaxy status

  •  Reference

  • Access control policy types #

    A policy supplies privileges to an entity such as a catalog, schema, table, view, or column, based on the entity’s assigned attributes, such as tags.

    A policy has the following characteristics:

    Attribute Description
    Name and description The name and description given to the policy upon creation.
    Matching expression A boolean expression that can test whether the policy should apply. Supported expressions are listed below.
    A role that the policy is assigned to The policy is only active if the policy role is contained in the user's current active role set.
    Grants of privileges on entities A set of privileges that can either be granted or denied. You can identify all tables in a catalog by selecting All schemas and All tables in the Scope section of the Policies tab. The entities on which policies can grant privileges are catalogs, schemas, tables, views, and columns.
    Combined privilege checks with role-based privileges When checking privileges, role-based privileges and attribute-based privileges from policies are combined using the same rules as are used for multiple role-based privileges:

    • ALLOW privileges are additive, so the allowed privileges on the entity are the union of role-based and attribute-based ALLOW privileges from either source.
    • However, a DENY privilege from either role-based or attribute-based privileges overrides an ALLOW privilege from either source.

    Role-based access control #

    A role has a name and an optional description. A role can be granted privileges on entities such as clusters, catalogs, and tables. This provides fine-grained control that protects your data, and allows you to define just the right mix of allowed actions and access for each function in your organization.

    Manage users, roles, and privileges in the Access - Users section and the Access - Roles and privileges section of Starburst Galaxy.

    All actions are controlled by privileges or ownership.

    Attribute-based access control #

    The attribute-based access control system of Starburst Galaxy allows the combination of policies and attributes, such as tags, to further manage role access to entities such as catalogs, schemas, tables, views, and columns.

    A policy can grant privileges, apply row filters, or apply column masks to different entities, such as a schema and a table. A policy is only active if the policy role is contained in the user’s current active role set.

    Starburst Galaxy combines attribute-based privilege grants and role-based privilege grants to determine role access to entities.

    For more information, see Access control basics.

    Matching expressions #

    The matching expression is a boolean expression that determines whether the policy is applicable to the particular entity. Matching expressions are defined using a custom language. All supported expressions are documented on this page.

    Parentheses are also supported and can be used to make complex expressions. The following example shows a matching expression that returns true if either the set of tags on an entity includes the tag sales_department or includes tag marketing_department as well as tag sales_liaison:

    HAS_TAG(sales_department) OR (HAS_TAG(marketing_department) AND HAS_TAG(sales_liaison))
    

    The full list of supported matching expressions is shown below:

    Supported base expressions #

    Base expressions are simple expressions that look exclusively at some piece of data and do not rely on any other expressions for their evaluation. Supported expressions can evaluate tags, user attributes, or the names of entities.

    Here is the full list of supported base expressions:

    Expression Description
    true or false Booleans are acceptable expressions.
    has_tag($TAG_NAME) Test whether $TAG_NAME is associated with an entity. Tags can be associated with one or more catalogs, schemas, tables, views, or columns.
    has_tag($TAG_NAME.*) Test whether parent tag $TAG_NAME (or any tag that starts with $TAG_NAME.) is associated with an entity. $TAG_NAME in this case is considered a parent tag (example pii). Child tag names have the tag name as the prefix and are separated by a period (example pii.email).
    user_attribute_exists('$ATTR') Test whether the user has an attribute that exists entitled $ATTR. Only enabled with SSO integrations.
    See more detail here.
    user_has_attribute('$ATTR', '$VAL') Test whether the user has an attribute that exists entitled $ATTR and at least one corresponding value equals $VAL. Only enabled with SSO integrations.
    See more detail here.
    catalog_name_matches('$NAME*') Private preview
    Test whether the catalog name matches the given pattern. The scope of the policy must be one or more catalogs, or ALL_CATALOGS.
    See more detail here.
    schema_name_matches('$NAME*') Private preview
    Test whether the schema name matches the given pattern. The scope of the policy must be one or more catalogs or schemas, or ALL_CATALOGS.
    See more detail here.
    table_name_matches('$NAME*') Private preview
    Test whether the table name matches the given pattern. The scope of the policy at or above the level of a table or ALL_CATALOGS.
    See more detail here.

    Compound expressions #

    Compound expressions take in one or more of the above base expressions to create more expressive access control policies. All compound expressions are listed below; any usage of $EXPRESSION can be replaced by any of the base expressions listed above.

    Expression Description Example
    NOT $EXPRESSION Evaluates to true if $EXPRESSION evaluates to false. NOT has a higher precedence than AND and OR. NOT has_tag(pii)
    $EXPRESSION AND $EXPRESSION Evaluates to true if both expressions evaluates to true, false otherwise. AND has a higher precedence than OR. has_tag(pii) AND table_name_matches('foo*')
    $EXPRESSION OR $EXPRESSION Evaluates to true if either expression evaluates to true, false if both are false. has_tag(pii) OR table_name_matches('foo*')

    For compound expressions, it may be useful to use parentheses for clarity and to avoid ambiguity around operator precedence. For example, the following two expressions are equivalent because AND has a higher precedence than OR:

    HAS_TAG(pii.email) OR HAS_TAG(pii.phone) AND HAS_TAG(pii.address)
    HAS_TAG(pii.email) OR (HAS_TAG(pii.phone) AND HAS_TAG(pii.address))
    

    Policy grants #

    A policy can define grants on multiple entities, and a single policy can grant privileges on different entity kinds. For example, a policy named sales_admin might allow the CREATE_TABLE privilege on all schemas in a catalog named sales_data and allow privileges SELECT, UPDATE and INSERT on all tables in catalog sales_data.

    Defining policies #

    The Policies tab lists all policies. The tab is only visible if a role in the user’s active role set has the account-level privilege Manage security.

    If a role in the user’s active role set has the Manage security privilege, the user can create a new policy by clicking Add policy.

    When creating a policy, you can enter the matching expression as free text in the matching expression input field. For each character typed, the expression parser is run, and the input field turns pink if there is a syntax error in the matching expression, or if it references a tag that does not exist.

    You can change any attribute of a policy listed in the policies tab by clicking edit

    Policy grants and privilege checking #

    The Starburst Galaxy access control system performs the following operations to check whether the user has access to a catalog, schema, table, view, or column entity:

    • The tags associated with the catalog, schema, table, view, or column are fetched. Because of tag inheritance, a tag applied to a catalog is returned in the set of tags for any schema, table, view, or column in the catalog. Similarly, a tag applied to a schema is returned in the set of tags for any table, view, or column in the schema. Finally, a tag applied to a table or view is returned in the set of tags for any column in the table or view.
    • The system evaluates the matching expression for all policies based on the tags associated with the entity. The system collects the privileges granted by the policies on the entity for which the matching expressions are true.
    • Role-based grants for the entity are retrieved and combined with the policy grants. The usual rules for ALLOW and DENY privilege grants are applied, so a DENY privilege always overrides an ALLOW privilege whether the privileges come from role-based grants or from attribute-based grants.

    Policy grants and catalog, schema, table, and view visibility #

    A key operation performed by the access control system determines visibility of catalogs, schemas, tables, and views:

    • A catalog is visible to a user if a role in the user’s active role set owns the catalog, or has an ALLOW privilege on the catalog or any schema, table, view, or column in the catalog, and that the ALLOW privilege is not overridden by a DENY privilege. The privileges can come from role-based grants, or from policies whose matching expression is true for the tags associated with a tagged entity in the catalog.
    • Similarly, a schema is visible to a user if a role in the user’s active role set owns the schema, or has an ALLOW privilege on the schema or any table, view, or column in the schema, and that the ALLOW privilege is not overridden by a DENY privilege. The privileges can come from role-based grants, or from policies whose matching expression is true for the tags associated with a tagged entity in the schema.
    • Finally, a table or view is visible to a user if a role in the user’s active role set owns the table or view, or has an ALLOW privilege on the table or view, or any column in the table or view, and that the ALLOW privilege is not overridden by a DENY privilege. The privileges can come from role-based grants, or from policies whose matching expression is true for the tags associated with a tagged entity in the table or view.

    Row-level filters #

    Row-level filters are an important component of data governance. They ensure that the rows returned by queries are rows the user is entitled to see. Unlike RBAC grants, which apply to entire tables or columns, row filters determine access on a row-by-row basis.

    A row-level filter is a named SQL expression. One or more row filters can be added to a policy whose target is a table or view. When Starburst Galaxy fetches privileges for a table or view, if the table or view matches the policy target, and the policy matching expression evaluates to true for the table’s tags, the policy’s row filters are included in the table privileges. Multiple different policies can match the table, so the row filters returned cam come from multiple different policies.

    The row filter SQL expressions from all row filters from all matching policies are compared with the OR operator and added to the query’s WHERE clause. The rows in the query for which any of the row filter expressions returns true are included in the query result, and the rows for which all row filters returned false are excluded.

    Row filters are evaluated using the privileges of the role that owns the row filter, which may not be the same as the current role of the user running a query to which the row filter is applied. This means that the row filter in general has privileges to reference tables the user running a query does not have the privileges to access.

    Creating row-level filters #

    To create row filters, a user’s active role set must have the Manage security privilege. In addition, the Row filters side pane link is only visible to users whose active role set has the Manage security privilege.

    See Row filters for more details.

    Column masks #

    Column masks are another access control tool that can be used to protect sensitive data. They ensure that column data is only visible to those who have permission to see the data by masking the values returned by the query.

    One or more column masks can be added to a policy whose target is a table or view. Column masks function similar to row filters in that when Starburst Galaxy fetches privileges for a table or view, and that table or view matches the policy target and the policy’s matching expression is true for the table or view’s tags, then the policy’s column masks are included in the table or view privileges.

    Column masks are evaluated using the privileges of the role that owns the column mask, which may not be the same as the current role of the user running a query to which the column mask is applied. This means that the column mask in general has privileges to reference tables the user running a query does not have the privileges to access.

    When you execute a query, Galaxy automatically rewrites your query and applies a column mask expression to the specified column. The column rewrite applies the mask expression everywhere the column appears in the query. Users see masked data based on the conditions you define.

    When to use masking vs hashing #

    Masking and hashing are different obfuscation techniques that have different use cases:

    • Masking: Use masking to protect privacy or to obscure sensitive data to meet compliance requirements. Masking lets you hide specific portions of a value, such as displaying only the last four digits of a social security number. Do not perform joins on masked columns, as this produces an unexpected output.

    • Hashing: Use hashing when you need to obfuscate data irreversibly. Hashing lets you anonymize data while ensuring that the same input always produces the same hashed output.

    Creating column masks #

    To create column masks, a user’s active role set must have the Manage security privilege. In addition, the Column mask side pane link is only visible to users whose active role set has the Manage security privilege.

    See Column masks for more details.

    User attribute predicates #

    User attributes can be imported into Galaxy from your identity provider, such as Okta. A total of 7KB in attribute statements is supported. Statements exceeding 7KB may result in attributes being dropped.

    For more information on importing attribute statements from Okta, see Okta SAML setup.

    User attributes in policy expressions #

    User attribute statements as sent from Okta or any other SSO provider can be evaluated as part of a policy expression.

    These expression are supported as part of the policy language:

    • user_attribute_exists('attributeName') evaluates to true when attributeName is sent from your IdP and at least one value corresponding to attributeName is not NULL.
    • user_has_attribute('attributeName', 'attributeValue') evaluates to true when attributeName is sent from your IdP and at least one value corresponding to attributeName equals attributeValue.

    Note that attributeName and attributeValue are placeholders and can be replaced with any values you specify in your IdP. Additionally, strings can be escaped using a backslash \. For example, an expression such as user_attribute_exists('it\'s an example') is valid and matches with the SSO attribute name it's an example.

    User attributes in row filter expressions #

    User attribute statements can also be substituted in row filter expressions automatically on a per user basis. There are two special expressions that allow for this substitution:

    • $USER_ATTRIBUTE('attributeName') replaces the first matching attribute for the name attributeName as sent from your SSO provider. If no values for attributeName are found, NULL is returned.
    • $USER_ATTRIBUTE_LIST('attributeName') replaces the matching attributes for attributeName with a list in the form of (val1, val2, ...)

    The standard convention is to use $USER_ATTRIBUTE('attributeName') when checking for strict equality in the filter expression, and $USER_ATTRIBUTE_LIST('attributeName') when you want to match any value in the list. Note that there is no validation performed on the row filter expression.

    Wildcard name predicates #

    Policy expressions can also support wildcard resource name wildcard matching on catalog, schema, and table names. The name to match is entered as an argument to the matching function. The character * represents the wildcard matching character.

    Predicate supported include:

    • catalog_name_matches('foo*') evaluates to true if the name of the catalog starts with or equals foo.
    • schema_name_matches('*foo') evaluates to true if the name of the schema ends with or equals foo.
    • table_name_matches('foo*') evaluates to true if the name of the table starts with or equals foo.

    Limitations of wildcard name matcher predicates #

    There is a maximum of one wildcard * marker allowed in a wildcard matcher expression.

    When considering catalog or schema visibility, if there is a policy that grants an ALLOW on a privilege and contains a wildcard expression in scope whose predicate operates below the level being checked for visibility, the container entity is considered visible unless there is a corresponding DENY grant on that privilege.

    For example, consider a policy on all catalogs with the predicate table_name_matches('foo*') which grants ALLOW SELECT access to all tables that match. If a role were to query SHOW CATALOGS, Galaxy does not look for tables in those catalogs that match foo*. Galaxy treats all catalogs as visible. Similarly, the statement SHOW SCHEMAS FROM mycatalog shows all schemas in the catalog even if there are no tables in schemas that match foo*.

    The above example only applies to policies that grant ALLOW on a privilege. DENY privilege grants, row filters, or column masks do not affect visibility in this case.