Data classifier jobs automatically classify data in catalogs, schemas, tables, and views to apply attribute tags to that data.
When used in conjunction with attribute tags and policies, classification provides an automated way to perform governance on your data.
Data classifier jobs analyze the data and metadata of your catalogs, schemas, and tables. They also propose tags on columns. Administrators choose whether to accept or reject the tag proposal, and can change the color or name of the proposed tag.
A role in the user’s active role set must have the account-level Manage Security privilege in order to create, update, view, or delete classification jobs.
The classifier job queries data on a cluster in the account using a role that the user specifies. This role must be in the user’s active role set. Because queries execute on the cluster, the specified role must have the Use Cluster privilege on the cluster. To suggest proposed tags, the specified role must also have at least one of the Create Tag or Apply Tag privileges. Additionally, only data for which the role has a SELECT grant is analyzed.
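The privilege requirements above can be summarized as a small checklist. The following Python sketch is purely illustrative: the `Role` class and the helper functions are hypothetical names invented for this example and are not part of any product API. It models privileges as plain strings and assumes that a role's SELECT grants can be listed per table.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """Hypothetical stand-in for a role and what has been granted to it."""
    name: str
    privileges: set = field(default_factory=set)     # e.g. {"Use Cluster"}
    select_tables: set = field(default_factory=set)  # tables with a SELECT grant


def can_manage_jobs(active_roles):
    """Creating, updating, viewing, or deleting jobs needs Manage Security
    on at least one role in the active role set."""
    return any("Manage Security" in r.privileges for r in active_roles)


def can_execute_job(role):
    """The executing role needs Use Cluster plus Create Tag or Apply Tag."""
    uses_cluster = "Use Cluster" in role.privileges
    can_tag = bool({"Create Tag", "Apply Tag"} & role.privileges)
    return uses_cluster and can_tag


def analyzable_tables(role, requested):
    """Only tables the executing role can SELECT from are analyzed."""
    return [t for t in requested if t in role.select_tables]
```

In this model, a role with Use Cluster and Apply Tag can execute a job, but the job analyzes only the requested tables that appear in that role's SELECT grants.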
To create a data classifier job, click Access control > Data classifier jobs in the navigation menu.
In the Create classification dialog, provide the following information:
In the Name and description section, enter a name for the job and a useful description.
In the Cluster section, choose a cluster to run the classifier on from the drop-down menu.
In the Execution role section, select an executing role.
In the Add catalogs, schemas and tables section, specify which catalogs, schemas, and tables to include in the classifier job.
In the Schedule section, use the toggle switches to enable Run on a schedule or Execute immediately.
For Select frequency: Choose an hourly, daily, weekly, monthly, or annual schedule from the drop-down menu. The corresponding values depend on the schedule:
Weekly: Enter the time as hh:mm, specify AM or PM, then select a day of the week.
Monthly: Enter the time as hh:mm, specify AM or PM, then select a date.
Annual: Enter the date and time as MM/DD hh:mm. Specify AM or PM.
For Enter cron expression: Enter the desired schedule in the form of a UNIX cron expression. For example, the following expression schedules a job to run weekly at 9:30 AM on Monday, Wednesday, and Friday:
30 9 * * 1,3,5
To run the classifier job now, use the toggle to enable Execute immediately.
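To check what a cron expression such as `30 9 * * 1,3,5` selects before entering it, a small matcher can help. The sketch below is a simplification that uses only the Python standard library: it handles only `*` and comma-separated integer lists in the five fields (minute, hour, day of month, month, day of week), and it does not support ranges, steps, or names. The function names are illustrative and do not describe how the product evaluates schedules.

```python
from datetime import datetime


def field_matches(field, value):
    """Return True if a cron field ('*' or a comma-separated list) matches value."""
    if field == "*":
        return True
    return value in {int(part) for part in field.split(",")}


def cron_matches(expr, when):
    """Check a five-field cron expression against a datetime."""
    minute, hour, day, month, dow = expr.split()
    # Cron counts Sunday as 0 and Monday as 1; datetime.weekday() counts Monday as 0.
    cron_dow = (when.weekday() + 1) % 7
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        and field_matches(dow, cron_dow)
    )


# 2024-01-01 was a Monday, so 9:30 AM that day matches the example schedule.
print(cron_matches("30 9 * * 1,3,5", datetime(2024, 1, 1, 9, 30)))
# 2024-01-02 was a Tuesday, so the same time does not match.
print(cron_matches("30 9 * * 1,3,5", datetime(2024, 1, 2, 9, 30)))
```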
In the Create or edit classifier dialog, click Add a new classifier. To remove a classifier, click its remove icon. Enter the following information:
Classifier type: Choose Regular expression or Text classification category.
For example, if you enter 0.8 in the threshold field and 80% of a column’s rows match the regular expression, the tag is suggested. The threshold must be a number between 0 and 1.
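The threshold rule can be illustrated with a short Python sketch. It rests on several assumptions that the text above does not confirm: a row counts as a match only when the whole value matches the regular expression, the comparison is "greater than or equal to", and an empty column never suggests a tag. The `suggests_tag` function and the simplified e-mail pattern are hypothetical and exist only for this example.

```python
import re


def suggests_tag(values, pattern, threshold):
    """Return True if the share of values matching pattern reaches threshold."""
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be a number between 0 and 1")
    if not values:
        return False  # assumption: nothing to analyze, nothing to suggest
    regex = re.compile(pattern)
    # Assumption: a value matches only if the entire string fits the pattern.
    matches = sum(1 for v in values if regex.fullmatch(v))
    return matches / len(values) >= threshold


# Deliberately simple e-mail pattern, for illustration only.
EMAIL = r"[^@\s]+@[^@\s]+\.[^@\s]+"
column = [
    "a@example.com",
    "b@example.com",
    "c@example.com",
    "d@example.com",
    "not an email",
]

print(suggests_tag(column, EMAIL, 0.8))  # 4 of 5 rows match
print(suggests_tag(column, EMAIL, 0.9))
```

With four of five rows matching, a threshold of 0.8 is met and a threshold of 0.9 is not.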
All classifier jobs are listed in the Data classifier jobs pane.
The header displays the total number of classifier jobs, and provides a search bar for finding data classifier jobs.
The list of classifier jobs has the following columns:
The classifier job recommends tags as it comes across a table or column that could fit a requested category. Tags may be recommended while the job is still executing.
You can access the list of suggested tags in two ways:
Follow these steps to accept or reject the proposed tags:
In the Suggested tags dialog, select the checkbox next to the tags you would like to accept or reject. Alternatively, click the add icon next to a suggested tag name to open a drop-down menu where you can select additional, previously created tags to apply to the entity.
Click the corresponding button to apply or discard the selected tags. Clicking Apply selected tags on a proposed tag creates the tag if it does not already exist and applies the tag to the column attached to it. To remove the proposed tags from the suggested tags list, click Discard selected tags. If a future classifier job run proposes a discarded tag again, that proposal is not shown.
For more information on the classifier job, click the classifier job name.
The title of the summary pane is the name of the classifier job. The top portion provides a Run now button, the classifier job description, and the date of the next scheduled run.
The Run history section is organized in the following columns:
Perform editing tasks in the Data classifier jobs pane and in the header section of the classifier job’s summary pane.
To edit classifier jobs in the Data classifier jobs pane:
To edit classifier jobs in the classifier job’s summary pane:
To delete classifier jobs in the Data classifier jobs pane:
To delete classifier jobs in the classifier job’s summary pane:
Classifier Group | Data Category | Default Tag |
---|---|---|
PII | E-Mail Address | pii.email |
PII | Full Name | pii.full_name |
PII | First Name | pii.first_name |
PII | Last Name | pii.last_name |
PII | Phone Number | pii.phone_number |
PII | Street Address | pii.address |
PII | Social Security Number (SSN) | pii.us_ssn |
PII | Individual Taxpayer Identification Number (ITIN) | pii.us_itin |
PII | Preparer Taxpayer Identification Number (PTIN) | pii.us_ptin |
PII | Adoption Taxpayer Identification Number (ATIN) | pii.us_atin |
PII | Passport Number | pii.passport |
PII | International Mobile Equipment Identifier (IMEI) | pii.imei |
PII | IP Address | pii.ip_address |
PII | MAC Address | pii.mac_address |
PII | URL | pii.url |
PII | International Bank Account Number (IBAN) | pii.iban |
PII | US Bank Account Number | pii.us_bank_num |
PII | US Drivers License Number | pii.us_driver_num |
PII | UK National Health Service Number (NHS) | pii.uk_nhs_num |
PII | UK Drivers License Number | pii.uk_driver_num |
PII | ABA Routing Number | pii.routing_number |
PII | Employer Identification Number | pii.us_employer_id |
PII | Canada Social Insurance Number | pii.ca_sin |
PII | Australia Medicare Number | pii.au_medicare |
PII | Australia Tax File Number | pii.aus_tax_file_number |
PII | Language Code | pii.language_code |
PII | Currency Code | pii.currency_code |
PII | Medical Diagnostic Code | pii.diagnostic_code |
LOCATION | Street Address | pii.address |
LOCATION | ZIP Code | pii.zip_code |
LOCATION | Postal Code | pii.postal_code |
LOCATION | Canadian Postal Code | pii.ca_postal_code |
LOCATION | US State Code | pii.us_state_code |
LOCATION | Canadian Province Code | pii.canada_province_code |
LOCATION | Country Code | pii.country_code |
LOCATION | Jurisdiction Code | pii.jurisdiction_code |