This page describes how to configure data maintenance jobs for one table at a time.
To perform data maintenance on multiple tables at once, see data maintenance for Iceberg tables.
To perform data maintenance on live tables, see data maintenance for Kafka streaming ingestion and data maintenance for file ingestion.
To create a data maintenance job for an individual table, follow these steps:
Click the Quality tab, then click the maintenance schedule icon.
Maintenance task | Description | Iceberg | Delta Lake | Hive | Hudi |
---|---|---|---|---|---|
Compaction | Improves performance by optimizing your data file size. | ✓ | ✓ | | |
Profiling and statistics | Improves performance by analyzing the table and collecting statistics about your data. | ✓ | ✓ | ✓ | ✓ |
Snapshot expiration | Reduces storage by deleting data snapshots according to the number of days you specify. | ✓ | | | |
Delete orphan files | Reduces storage by deleting all files in the storage location that are older than the set threshold period. This rule includes files that are not part of a table. | ✓ | ✓ | | |
In the Execution details section, select one or more executing roles and a cluster from the respective drop-down menus.
In the Job schedule section:
For Select frequency: Choose an hourly, daily, weekly, monthly, or annual schedule from the drop-down menu. The values to enter depend on the schedule:
Daily: Enter a time as hh:mm, then specify AM or PM.
Weekly: Enter a time as hh:mm, specify AM or PM, then select a day of the week.
Monthly: Enter a time as hh:mm, specify AM or PM, then select a date.
Annual: Enter a date and time as MM/DD hh:mm, then specify AM or PM.
For Enter cron expression: Enter the desired schedule as a UNIX cron expression. For example, a job scheduled to run at 9:30 AM every Monday, Wednesday, and Friday:
30 9 * * 1,3,5
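To check a cron expression before entering it, it can help to expand it into concrete run times. The following is a minimal Python sketch, not part of the product: the function names `expand_field`, `parse_cron`, `matches`, and `next_runs` are hypothetical, and the parser assumes the common five-field layout (minute, hour, day of month, month, day of week, with 0 meaning Sunday). It handles only `*`, single values, ranges, lists, and `/` steps, and it requires every field to match, which is a simplification of how some cron implementations combine the day-of-month and day-of-week fields.

```python
from datetime import datetime, timedelta

# Allowed (min, max) per field: minute, hour, day of month, month,
# day of week (assumed convention: 0 = Sunday ... 6 = Saturday).
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_field(field, low, high):
    """Expand one cron field such as '*', '9', '1-5', '1,3,5', or '*/15' into a set of ints."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step = int(step) if step else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        if not (low <= start <= end <= high):
            raise ValueError(f"value out of range in cron field: {part!r}")
        values.update(range(start, end + 1, step))
    return values


def parse_cron(expression):
    """Split a five-field cron expression and expand each field into a set."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected exactly 5 cron fields")
    return [expand_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)]


def matches(parsed, moment):
    """Return True if `moment` satisfies every field of the parsed expression."""
    minutes, hours, days, months, weekdays = parsed
    cron_weekday = moment.isoweekday() % 7  # Monday=1 ... Saturday=6, Sunday=0
    return (
        moment.minute in minutes
        and moment.hour in hours
        and moment.day in days
        and moment.month in months
        and cron_weekday in weekdays
    )


def next_runs(expression, start, count):
    """Return up to `count` run times after `start`, scanning minute by minute for one year."""
    parsed = parse_cron(expression)
    moment = start.replace(second=0, microsecond=0) + timedelta(minutes=1)
    runs = []
    for _ in range(366 * 24 * 60):  # bound the search so an impossible schedule terminates
        if len(runs) == count:
            break
        if matches(parsed, moment):
            runs.append(moment)
        moment += timedelta(minutes=1)
    return runs


if __name__ == "__main__":
    # 2024-01-01 is a Monday, so the first three runs fall on Mon, Wed, and Fri of that week.
    for run in next_runs("30 9 * * 1,3,5", datetime(2024, 1, 1), 3):
        print(run.isoformat())
```

Starting from midnight on Monday, January 1, 2024, the sketch lists 09:30 on January 1, 3, and 5, which matches the Monday, Wednesday, and Friday schedule described above.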
For Profiling and statistics results, see the Data quality metrics section of the Quality tab.
All data maintenance jobs appear on the top details level in Data > Data maintenance. To see the status of your data maintenance job, go to the schema details level.
You can manage individual data maintenance jobs in the Quality tab or on the schema and table detail levels in data maintenance.