Data maintenance
Data maintenance jobs run tasks that improve performance and reduce storage in Apache Iceberg tables. Supported tasks include data file compaction, statistics collection, and deletion of outdated snapshots and unused files.
Note
Data maintenance is available as a public preview in Starburst Enterprise. Contact Starburst Support with questions or feedback.
The Data maintenance pane has the following levels:
The Top level shows a list of catalogs that include at least one maintenance task. Click the name of a catalog to see the schemas with maintenance tasks.
The Catalog level lists the schemas in the selected catalog that have maintenance tasks. Click the name of a schema to see the tables with maintenance tasks.
The Schema level lists the tables in the selected schema that have maintenance tasks. Click the name of a table to see its maintenance tasks.
The Table level shows the table’s defined maintenance tasks and further details. From this level, you can run, edit, or delete a task and view the task’s run history.
Create data maintenance task
From the SEP navigation menu, select Data > Data maintenance.
At the top, catalog, or schema details level, click Create maintenance task, and provide the following information in the Configure data maintenance dialog:
In the Maintenance target section:
Specify a catalog and schema from the respective drop-down menus. If you opened this dialog from the catalog or schema details level, the corresponding fields are preselected.
Select the All tables radio button to include all tables in the schema. With this selection, the maintenance tasks automatically apply to Iceberg tables created in the schema later, and maintenance automatically stops for tables that are removed. Tables that already have a separate maintenance schedule are not included; to include them, see Edit data maintenance jobs.
Select the Select from all radio button to enable the table drop-down menu. Expand the menu, and select one or more tables.
In the Maintenance tasks section, select at least one maintenance task:
Maintenance task | Description |
---|---|
Compaction | Improves performance by optimizing your data file size. |
Profiling and statistics | Improves performance by analyzing the table and collecting statistics about your data. |
Snapshot expiration | Reduces storage by deleting outdated data snapshots. |
Delete orphan files | Reduces storage by deleting files in the Iceberg storage location that are not part of a table and are older than the set threshold period. |
In the Job schedule section:
Select a Time zone from the drop-down menu.
Choose a recurring interval format: Select frequency or Enter cron expression.
For Select frequency: Choose an hourly, daily, weekly, monthly, or annual schedule from the drop-down menu. The values you enter depend on the schedule:
Hourly: Enter a value between 0 and 59 minutes.
Daily: Enter a time in the format hh:mm, then specify AM or PM.
Weekly: Enter a time in the format hh:mm, specify AM or PM, then select one or more days of the week.
Monthly: Enter a time in the format hh:mm, specify AM or PM, then select a date.
Annually: Enter a month, day, hour, and minutes in the format MM/DD hh:mm, then specify AM or PM.
For Enter cron expression: Enter the schedule as a UNIX cron expression. For example, the following expression runs the job every Monday, Wednesday, and Friday at 9:30 AM:
30 9 * * 1,3,5
Click Save.
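Each Select frequency option describes a schedule that a cron expression can also express. The following Python sketch shows one way to translate the dialog values into an equivalent five-field cron expression. The function names and parameters are hypothetical and are not part of SEP. The sketch assumes that the Hourly value is the minute past each hour and that AM/PM times convert to 24-hour form in the usual way (12 AM is hour 0, 12 PM is hour 12).

```python
"""Hypothetical helper: translate Select frequency choices into cron expressions."""
from typing import Iterable, Optional


def to_24_hour(hour: int, meridiem: str) -> int:
    """Convert a 12-hour clock hour (1-12) plus AM/PM into a 24-hour hour (0-23)."""
    if not 1 <= hour <= 12:
        raise ValueError("hour must be between 1 and 12")
    meridiem = meridiem.upper()
    if meridiem not in ("AM", "PM"):
        raise ValueError("meridiem must be AM or PM")
    hour %= 12  # 12 AM and 12 PM both become 0 before the PM offset is added
    return hour + 12 if meridiem == "PM" else hour


def frequency_to_cron(
    frequency: str,
    minute: int,
    hour: int = 12,
    meridiem: str = "AM",
    days_of_week: Iterable[int] = (),
    day_of_month: Optional[int] = None,
    month: Optional[int] = None,
) -> str:
    """Return "minute hour day-of-month month day-of-week" for a frequency choice.

    days_of_week uses cron numbering: 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
    """
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    if frequency == "hourly":
        # Assumption: the Hourly value is the minute past every hour.
        return f"{minute} * * * *"
    h = to_24_hour(hour, meridiem)
    if frequency == "daily":
        return f"{minute} {h} * * *"
    if frequency == "weekly":
        days = sorted(set(days_of_week))
        if not days:
            raise ValueError("weekly schedules need at least one day")
        return f"{minute} {h} * * {','.join(str(d) for d in days)}"
    if frequency == "monthly":
        if day_of_month is None:
            raise ValueError("monthly schedules need a day of the month")
        return f"{minute} {h} {day_of_month} * *"
    if frequency == "annually":
        if day_of_month is None or month is None:
            raise ValueError("annual schedules need a month and a day")
        return f"{minute} {h} {day_of_month} {month} *"
    raise ValueError(f"unknown frequency: {frequency}")
```

For example, `frequency_to_cron("weekly", 30, 9, "AM", days_of_week=[5, 1, 3])` produces the same `30 9 * * 1,3,5` expression shown in the cron example above.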

Data maintenance job details
All scheduled data maintenance jobs are listed in the Data maintenance pane beginning at the top level.
As with other panes in Starburst Enterprise platform (SEP), the top row of this pane provides catalog-schema-table breadcrumbs to show which details level you are on. Click the names in the breadcrumb list to navigate among the levels.
Header sections
The headers of the catalog and schema details levels include a symbol key that explains the task symbols:
Compaction (compress icon)
Profiling and statistics (search_insights icon)
Snapshot expiration (deployed_code_history icon)
Delete orphan files (vacuum icon)
The Search field at the top, catalog, and schema details levels lets you restrict the list to matching values.
The Last run status drop-down menu at the schema details level lets you restrict the list to jobs that are scheduled, running, completed, or failed.
The Maintenance task drop-down menu at the catalog and schema details levels lets you restrict the list to a single task type.
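The Search field and the two drop-down menus described above narrow the same list of jobs. The following Python sketch shows one way such filtering could work. It assumes a hypothetical `Job` record, a case-insensitive substring match for the search text, and that a job must match every filter that is set. This page does not state how SEP combines the filters.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Job:
    """Hypothetical row in a details-level list."""
    location: str          # schema or table name shown in the Location column
    last_run_status: str   # "scheduled", "running", "completed", or "failed"
    tasks: Set[str] = field(default_factory=set)  # e.g. {"compaction"}


def filter_jobs(
    jobs: List[Job],
    search: str = "",
    status: Optional[str] = None,
    task: Optional[str] = None,
) -> List[Job]:
    """Keep jobs that match the search text, the last run status, and the task type."""
    needle = search.lower()
    result = []
    for job in jobs:
        if needle and needle not in job.location.lower():
            continue  # Search field (assumed case-insensitive substring match)
        if status is not None and job.last_run_status != status:
            continue  # Last run status drop-down menu
        if task is not None and task not in job.tasks:
            continue  # Maintenance task drop-down menu: a single task type
        result.append(job)
    return result
```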
Top level details
The list of catalogs has the following columns:
Catalog: The name of the catalog with defined maintenance tasks.
Schemas with maintenance: The total number of schema-level jobs.
Tables with maintenance: The total number of table-level jobs.
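The two count columns summarize jobs by scope, and only catalogs with at least one job appear in the list. The following Python sketch shows the aggregation. It assumes each job is a hypothetical `(catalog, schema, table)` tuple in which `table` is `None` for a schema-level job, such as one created with the All tables option.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical job record: (catalog, schema, table); table is None for a schema-level job.
JobKey = Tuple[str, str, Optional[str]]


def top_level_rows(jobs: Iterable[JobKey]) -> Dict[str, Dict[str, int]]:
    """Count schema-level and table-level jobs per catalog, as in the top-level list."""
    rows: Dict[str, Dict[str, int]] = defaultdict(lambda: {"schemas": 0, "tables": 0})
    for catalog, _schema, table in jobs:
        if table is None:
            rows[catalog]["schemas"] += 1  # Schemas with maintenance
        else:
            rows[catalog]["tables"] += 1   # Tables with maintenance
    return dict(rows)
```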

Catalog level details
To view catalog level details, click the name of a catalog at the top level.
Catalog level details are organized in the following columns:
Location: The schema that the data maintenance job applies to.
Tasks: The symbols representing the tasks included in the data maintenance job.
Last run: The date and time the data maintenance job was last run.
Next execution: The next scheduled run time.
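The Next execution time follows from the job schedule. For a schedule entered as a cron expression, one simple way to find the next run time is to step forward minute by minute until every field matches. This approach is easy to verify but slow for sparse schedules. The following Python sketch supports standard five-field expressions with `*`, single values, comma lists, ranges, and `/step`, and it applies the classic cron rule that a restricted day-of-month and a restricted day-of-week match if either one matches. It is an illustration, not the scheduler SEP uses, and it treats times as naive values in the job's selected time zone.

```python
from datetime import datetime, timedelta
from typing import Set

# Allowed range for each cron field: minute, hour, day of month, month, day of week.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def parse_field(text: str, low: int, high: int) -> Set[int]:
    """Expand one cron field (e.g. "1,3,5", "9-17", "*/15") into a set of values."""
    values: Set[int] = set()
    for part in text.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(base)
        if start < low or end > high or start > end or step < 1:
            raise ValueError(f"invalid cron field: {text}")
        values.update(range(start, end + 1, step))
    return values


def next_run(expression: str, after: datetime) -> datetime:
    """Return the first whole minute strictly after `after` that matches the expression."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five fields")
    minutes, hours, days, months, weekdays = (
        parse_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)
    )
    # Classic cron rule: if both day fields are restricted, either one may match.
    both_days_restricted = not fields[2].startswith("*") and not fields[4].startswith("*")
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    limit = candidate + timedelta(days=366 * 5)
    while candidate < limit:
        cron_weekday = (candidate.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
        day_ok = candidate.day in days
        weekday_ok = cron_weekday in weekdays
        date_ok = (day_ok or weekday_ok) if both_days_restricted else (day_ok and weekday_ok)
        if (
            date_ok
            and candidate.month in months
            and candidate.hour in hours
            and candidate.minute in minutes
        ):
            return candidate
        candidate += timedelta(minutes=1)
    raise ValueError("no matching time found")
```

For example, with the expression `30 9 * * 1,3,5` and a current time of Monday, 1 January 2024, 10:00, the next execution is Wednesday, 3 January 2024, 09:30.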

Schema level details
To view schema level details, click the name of a schema at the catalog details level. The list can include jobs for individual tables as well as jobs set up to run for all tables in the schema.
Schema level details are organized in the following columns:
Location: The tables with defined maintenance tasks.
Status: An icon showing the status of the data maintenance job:
Completed (check_circle icon)
Failed (error icon)
Tasks: The symbols representing the tasks included in the data maintenance job.
Last run: The date and time the data maintenance job was last run.
Next execution: The next scheduled run time.
The Options menu.
The Status and Tasks columns are empty until a job’s first run.
Use the options menu to edit the job or to run it now.

Table level details
To view the details of an individual data maintenance job, click a table name at the schema details level.
The title of the table level details pane is the name of the table. The top portion of the pane provides a summary of the selected data maintenance job, a Run now button, and an options menu for editing or deleting the job.
The Task history section is organized in the following columns:
Query ID: The unique identifier for the statement. Click the Query ID to view Query details.
Status: The status of the data maintenance job:
Completed (check_circle icon)
Failed (error icon); the Debug link opens a dialog with information about the failed task.
Started: When the data maintenance job started.
Elapsed time: The duration of the data maintenance job.
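Each entry in the Task history is a query with its own query ID, but this page does not list the statements SEP issues for each task. As an assumption for illustration only, the Trino Iceberg connector that SEP builds on offers SQL commands with comparable effects:

- `ALTER TABLE ... EXECUTE optimize` for compaction.
- `ANALYZE` for statistics.
- `ALTER TABLE ... EXECUTE expire_snapshots` and `remove_orphan_files`, each with a `retention_threshold`, for the two cleanup tasks.

The following Python sketch builds such statements for one table. The task keys, the helper names, and the default threshold of `7d` are hypothetical.

```python
from typing import Dict, List


def quote_identifier(name: str) -> str:
    """Quote a SQL identifier with double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def maintenance_statements(
    catalog: str,
    schema: str,
    table: str,
    tasks: List[str],
    retention_threshold: str = "7d",  # hypothetical default threshold
) -> List[str]:
    """Return Trino Iceberg SQL statements roughly comparable to the selected tasks."""
    target = ".".join(quote_identifier(part) for part in (catalog, schema, table))
    threshold = retention_threshold.replace("'", "''")  # escape for a SQL string literal
    templates: Dict[str, str] = {
        # Compaction: rewrite small data files into larger ones.
        "compaction": f"ALTER TABLE {target} EXECUTE optimize",
        # Profiling and statistics: collect table and column statistics.
        "profiling_and_statistics": f"ANALYZE {target}",
        # Snapshot expiration: remove snapshots older than the threshold.
        "snapshot_expiration": (
            f"ALTER TABLE {target} EXECUTE expire_snapshots"
            f"(retention_threshold => '{threshold}')"
        ),
        # Delete orphan files: remove files not referenced by the table and older than the threshold.
        "delete_orphan_files": (
            f"ALTER TABLE {target} EXECUTE remove_orphan_files"
            f"(retention_threshold => '{threshold}')"
        ),
    }
    unknown = [task for task in tasks if task not in templates]
    if unknown:
        raise ValueError(f"unknown tasks: {unknown}")
    return [templates[task] for task in tasks]
```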

Manage data maintenance jobs
All editing is performed at the schema and table details levels. The schema details level allows for bulk and individual edits, while the table details level allows for individual task edits only.
Edit data maintenance jobs
To make bulk edits, go to the schema details level, and follow these steps:
Click the options menu in the header.
Click Bulk edit jobs.
Click the top checkbox to select all tables or select individual tables by clicking their checkboxes.
Select Edit selected to open the Edit data maintenance dialog.
Make your changes, and resolve any differences in tasks and schedules among the selected tables.
Click Save.
To make individual edits:
At the schema details level:
Click the options menu in the row of the table of interest.
Select Edit job to open the Edit data maintenance dialog.
Make changes, then click Save.
At the table details level:
Click the options menu in the header.
Select Edit job to open the Edit data maintenance dialog.
Make changes, then click Save.
A data maintenance job that covers all tables in a schema excludes tables that have their own maintenance schedule. To include such a table, delete its separate data maintenance job. The previously excluded table is then automatically included in the schema-wide job.
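The inclusion rules for an All tables job can be summarized as a set difference: every table currently in the schema, minus the tables that have their own job. If this set is recomputed from the current table list, new tables are picked up and removed tables drop out. Deleting a table's separate job returns that table to the schema-wide job. The following Python sketch shows the computation with hypothetical inputs. It assumes the set is evaluated each time the job runs, which this page does not state explicitly.

```python
from typing import Set


def tables_covered_by_schema_job(
    current_tables: Set[str], tables_with_own_job: Set[str]
) -> Set[str]:
    """Tables that a schema-wide (All tables) job maintains when it runs."""
    # Assumed to be recomputed on each run, so tables created later are included
    # automatically and removed tables are no longer maintained.
    return current_tables - tables_with_own_job
```

For example, if the schema contains `orders`, `customers`, and `events`, and `events` has its own job, the schema-wide job covers `orders` and `customers`. After the `events` job is deleted, the schema-wide job covers all three tables.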
Delete data maintenance jobs
To delete jobs in bulk, go to the schema details level, and follow these steps:
Click the options menu in the header.
Click Bulk delete jobs.
Click the top checkbox to select all tables or select individual tables by clicking their checkboxes.
Select Delete selected.
In the Confirm delete dialog, click Yes, delete.
To delete individual jobs:
At the schema details level:
Click the options menu in the row of the table of interest.
Select Delete job.
In the Confirm delete dialog, click Yes, delete.
At the table details level:
Click the options menu in the header.
Select Delete job.
In the Confirm delete dialog, click Yes, delete.