In Starburst Galaxy, you can use data classifier jobs to automatically classify data in catalogs, schemas, tables, and views to apply attribute tags to that data.
When used in conjunction with attribute tags and policies, classification provides an automated way to perform governance on your data.
Data classification jobs analyze the data and metadata of your catalogs, schemas, and tables and propose tags on columns. Administrators choose whether to accept or reject each proposed tag, and can change the name or color of a proposed tag.
A role in the user’s active role set must have the account-level privilege Manage Security in order to create, update, view, or delete classification jobs.
The classifier job queries data on a cluster in the account using a role that the user specifies, which must be in the user’s active role set. Because queries execute on the cluster, the specified role must have the Use Cluster privilege on that cluster. To suggest proposed tags, the specified role must also have at least one of the Create Tag or Apply Tag privileges. Additionally, only data for which the role has a SELECT grant is analyzed.
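The privilege rules above can be summarized as a few checks. The following is a minimal sketch, assuming a simplified in-memory model of roles; the `Role` class, the helper functions, and the privilege strings are hypothetical illustrations, not a Starburst Galaxy API, and SELECT grants are simplified to fully qualified table names.

```python
from dataclasses import dataclass, field

# Hypothetical model of the privilege rules described above. The class,
# functions, and privilege strings are illustrative, not a Galaxy API.


@dataclass
class Role:
    name: str
    account_privileges: set[str] = field(default_factory=set)
    # Maps a cluster name to the privileges this role holds on it.
    cluster_privileges: dict[str, set[str]] = field(default_factory=dict)
    tag_privileges: set[str] = field(default_factory=set)
    # Objects (simplified here to fully qualified tables) with a SELECT grant.
    select_grants: set[str] = field(default_factory=set)


def can_manage_jobs(active_roles: list[Role]) -> bool:
    """Some role in the active role set must hold Manage Security."""
    return any("Manage Security" in r.account_privileges for r in active_roles)


def execution_role_problems(role: Role, active_roles: list[Role], cluster: str) -> list[str]:
    """Return the reasons, if any, that `role` cannot run a job on `cluster`."""
    problems = []
    if role.name not in {r.name for r in active_roles}:
        problems.append("execution role is not in the active role set")
    if "Use Cluster" not in role.cluster_privileges.get(cluster, set()):
        problems.append(f"execution role lacks Use Cluster on {cluster}")
    if not role.tag_privileges & {"Create Tag", "Apply Tag"}:
        problems.append("execution role needs Create Tag or Apply Tag")
    return problems


def analyzed_objects(role: Role, requested: list[str]) -> list[str]:
    """Only objects the execution role can SELECT from are analyzed."""
    return [obj for obj in requested if obj in role.select_grants]


admin = Role("admin", account_privileges={"Manage Security"})
analyst = Role(
    "analyst",
    cluster_privileges={"prod": {"Use Cluster"}},
    tag_privileges={"Apply Tag"},
    select_grants={"sales.public.customers"},
)
active = [admin, analyst]
print(can_manage_jobs(active))                           # True
print(execution_role_problems(analyst, active, "prod"))  # []
print(analyzed_objects(analyst, ["sales.public.customers", "sales.public.orders"]))
# ['sales.public.customers']
```

In this sketch the user can manage jobs through the `admin` role while the job itself runs as `analyst`, which mirrors the split between the Manage Security requirement and the execution role requirements.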
There are two methods to create data classifier jobs.
While viewing a catalog, schema, table, or view in the catalog explorer, click Auto Tag, then Create a classifier. A dialog opens where you can configure the classifier job.
You can also create data classifier jobs by selecting Access control > Data classifier jobs from the navigation menu. From this pane, you can also edit or delete classifiers, and accept or reject the suggested tags from previous jobs. To start a new run of a previously created classifier job, select the classifier and click Run now.
A classification job has the following characteristics:
Attribute | Description |
---|---|
Name and description | The name and description given to the data classifier job. |
Cluster | The cluster on which the classifier job executes queries to sample the data. All catalogs, schemas, and tables to classify must be attached to the cluster. All queries the classifier job executes are recorded in query history. |
Execution role | The role that executes queries on the cluster. It must be a role in the user's active role set. Classification occurs only on columns on which this role has SELECT privileges. |
Catalogs, schemas, and tables | The catalogs, schemas, and tables to be classified. Multiple catalogs, schemas, or tables can be chosen. The more tables contained in the job, the longer the job will take. |
Classifiers | The groups of data to check for. At least one classifier must be selected. |
Schedule | Optionally, run classification jobs on a schedule. Choose a time zone from the drop-down menu, then select a frequency or enter a cron expression to set a schedule for the classification job to run. Select Execute immediately to run the classification job immediately. |
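The attributes in the table above map naturally onto a job definition with a few validation rules. The following is a minimal sketch, assuming a hypothetical `ClassifierJob` structure whose field names are illustrative rather than a Galaxy API. It also assumes the common five-field cron format (minute, hour, day of month, month, day of week); the exact cron dialect Galaxy accepts is not stated on this page.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical classifier job definition mirroring the attribute table.
# Field names are illustrative; the cron check assumes the common
# five-field format (minute hour day-of-month month day-of-week).
CRON_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def _cron_field_ok(text: str, lo: int, hi: int) -> bool:
    """Accept '*', numbers, 'a-b' ranges, comma lists, and '/n' steps."""
    for part in text.split(","):
        base, _, step = part.partition("/")
        if step and not (step.isdigit() and int(step) > 0):
            return False
        if base == "*":
            continue
        first, sep, last = base.partition("-")
        bounds = [first, last] if sep else [first]
        if not all(b.isdigit() and lo <= int(b) <= hi for b in bounds):
            return False
    return True


def valid_cron(expr: str) -> bool:
    fields = expr.split()
    return len(fields) == 5 and all(
        _cron_field_ok(f, lo, hi) for f, (lo, hi) in zip(fields, CRON_RANGES)
    )


@dataclass
class ClassifierJob:
    name: str
    cluster: str
    execution_role: str
    targets: list[str]  # "catalog", "catalog.schema", or "catalog.schema.table"
    classifiers: list[str]  # for example ["PII", "LOCATION"]
    description: str = ""
    time_zone: Optional[str] = None  # the schedule is optional
    cron: Optional[str] = None
    execute_immediately: bool = False

    def validate(self, attached_catalogs: set[str]) -> list[str]:
        """Return a list of problems; an empty list means the job looks valid."""
        errors = []
        if not self.classifiers:
            errors.append("select at least one classifier")
        if not self.targets:
            errors.append("select at least one catalog, schema, or table")
        for target in self.targets:
            # Every target's catalog must be attached to the job's cluster.
            if target.split(".")[0] not in attached_catalogs:
                errors.append(f"{target} is not attached to cluster {self.cluster}")
        if self.cron is not None and not valid_cron(self.cron):
            errors.append(f"invalid cron expression: {self.cron!r}")
        return errors


job = ClassifierJob(
    name="weekly-pii-scan",
    cluster="prod",
    execution_role="analyst",
    targets=["sales.public.customers"],
    classifiers=["PII"],
    time_zone="UTC",
    cron="0 2 * * 1",  # 02:00 every Monday
)
print(job.validate(attached_catalogs={"sales"}))  # []
```

The validator collects every problem instead of stopping at the first one, so a single pass can report all missing or inconsistent attributes at once.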
The classification job recommends tags as it comes across a table or column that could fit a requested category. Tags may be recommended while the job is still executing.
Follow these steps to accept or reject a proposed tag:
Alternatively, navigate to Access control > Data classifier jobs and click View results for a specific job in the list of all data classifier jobs. This opens the dialog of suggested tags.
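Reviewing a proposed tag amounts to a small state change: a proposal is either accepted, optionally with a new name or color, or rejected. The following is a minimal sketch, assuming a hypothetical `TagProposal` model; the names and the default color are illustrative, and the rule that a proposal cannot be decided twice is an assumption rather than documented Galaxy behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Hypothetical model of reviewing a proposed tag. Names are illustrative;
# the rule that a decided proposal cannot be decided again is an assumption.


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class TagProposal:
    column: str  # fully qualified column, e.g. "sales.public.customers.email"
    tag: str  # proposed tag name, e.g. "pii.email"
    color: str = "#808080"  # placeholder default color
    status: Status = Status.PENDING

    def accept(self, name: Optional[str] = None, color: Optional[str] = None) -> None:
        """Accept the proposal, optionally renaming the tag or changing its color."""
        self._require_pending()
        if name:
            self.tag = name
        if color:
            self.color = color
        self.status = Status.ACCEPTED

    def reject(self) -> None:
        self._require_pending()
        self.status = Status.REJECTED

    def _require_pending(self) -> None:
        if self.status is not Status.PENDING:
            raise ValueError(f"proposal for {self.column} is already {self.status.value}")


email = TagProposal("sales.public.customers.email", "pii.email")
notes = TagProposal("sales.public.customers.notes", "pii.full_name")
email.accept(name="pii.customer_email", color="#d9534f")
notes.reject()
print([(p.tag, p.status.value) for p in (email, notes)])
# [('pii.customer_email', 'accepted'), ('pii.full_name', 'rejected')]
```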
Classifier Group | Data Category | Default Tag |
---|---|---|
PII | E-Mail Address | pii.email |
PII | Full Name | pii.full_name |
PII | First Name | pii.first_name |
PII | Last Name | pii.last_name |
PII | Phone Number | pii.phone_number |
PII | Street Address | pii.address |
PII | Social Security Number (SSN) | pii.us_ssn |
PII | Individual Taxpayer Identification Number (ITIN) | pii.us_itin |
PII | Preparer Taxpayer Identification Number (PTIN) | pii.us_ptin |
PII | Adoption Taxpayer Identification Number (ATIN) | pii.us_atin |
PII | Passport Number | pii.passport |
PII | International Mobile Equipment Identifier (IMEI) | pii.imei |
PII | IP Address | pii.ip_address |
PII | MAC Address | pii.mac_address |
PII | URL | pii.url |
PII | International Bank Account Number (IBAN) | pii.iban |
PII | US Bank Account Number | pii.us_bank_num |
PII | US Drivers License Number | pii.us_driver_num |
PII | UK National Health Service Number (NHS) | pii.uk_nhs_num |
PII | UK Drivers License Number | pii.uk_driver_num |
PII | ABA Routing Number | pii.routing_number |
PII | Employer Identification Number | pii.us_employer_id |
PII | Canada Social Insurance Number | pii.ca_sin |
PII | Australia Medicare Number | pii.au_medicare |
PII | Australia Tax File Number | pii.aus_tax_file_number |
PII | Language Code | pii.language_code |
PII | Currency Code | pii.currency_code |
PII | Medical Diagnostic Code | pii.diagnostic_code |
LOCATION | Street Address | pii.address |
LOCATION | ZIP Code | pii.zip_code |
LOCATION | Canadian Postal Code | pii.ca_postal_code |
LOCATION | US State Code | pii.us_state_code |
LOCATION | Canadian Province Code | pii.canada_province_code |
LOCATION | Country Code | pii.country_code |
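To illustrate how sampled column values might be matched to a data category and its default tag, the following is a minimal sketch using simple pattern checks for four categories from the table. The detection rules, the 0.8 match threshold, and the `propose_tag` helper are all assumptions for illustration; this page does not describe how Galaxy's classifiers actually detect each category, and the patterns check format only (for example, the SSN pattern does not verify that a number is valid).

```python
import ipaddress
import re
from typing import Callable, Iterable, Optional


def _is_ipv4(value: str) -> bool:
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False


# Default tags come from the table above; the detection rules are
# simplified, hypothetical illustrations, not Galaxy's classifiers.
DETECTORS: dict[str, Callable[[str], bool]] = {
    "pii.email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}", v) is not None,
    "pii.us_ssn": lambda v: re.fullmatch(r"\d{3}-\d{2}-\d{4}", v) is not None,
    "pii.ip_address": _is_ipv4,
    "pii.zip_code": lambda v: re.fullmatch(r"\d{5}(-\d{4})?", v) is not None,
}


def propose_tag(sample: Iterable[Optional[str]], threshold: float = 0.8) -> Optional[str]:
    """Return the first default tag whose detector matches at least
    `threshold` of the non-null, non-blank sampled values, else None."""
    values = [v.strip() for v in sample if v is not None and v.strip()]
    if not values:
        return None
    for tag, matches in DETECTORS.items():
        if sum(matches(v) for v in values) / len(values) >= threshold:
            return tag
    return None


print(propose_tag(["alice@example.com", "bob@example.org", None]))  # pii.email
print(propose_tag(["10.0.0.1", "192.168.1.20"]))                    # pii.ip_address
print(propose_tag(["123-45-6789", "987-65-4321", "n/a"]))           # None
```

Using a match ratio rather than requiring every value to match tolerates a few malformed or placeholder entries in a sample, at the cost of occasionally proposing a tag for a column that only partly contains the category.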