Starburst Galaxy

Individual data maintenance

    This page describes how to configure data maintenance jobs for one table at a time.

    To perform data maintenance on multiple tables at once, see data maintenance for Iceberg tables.

    To perform data maintenance on live tables, see data maintenance for Kafka streaming ingestion and data maintenance for file ingestion.

    Create an individual DM job

    To create a data maintenance job for an individual table, follow these steps:

    • Open the catalog that contains the table you want to schedule tasks for.
    • Go to the table level.
    • Click the Quality tab, then click the maintenance schedule icon.

    • Provide the following information in the dialog:
      • In the Execution details section, select an executing role and a cluster from the Select execution role and Select cluster drop-down menus.

    • In the Maintenance tasks section, select at least one maintenance task. Not all table types support all maintenance tasks:

      • Compaction: Improves performance by optimizing data file sizes. Supported for Iceberg and Delta Lake tables.
      • Profiling and statistics: Improves performance by analyzing the table and collecting statistics about your data. Supported for Iceberg, Delta Lake, Hive, and Hudi tables.
      • Snapshot expiration: Reduces storage by deleting snapshots older than the number of days you specify. Supported for Iceberg tables.
      • Delete orphan files: Reduces storage by deleting files in the storage location that are older than the set threshold period, including files that are not part of a table. Supported for Iceberg and Delta Lake tables.

    • In the Job schedule section:

      • Select a Time zone from the drop-down menu.
      • Choose a recurring interval format: Select frequency or Enter cron expression.

      For Select frequency: Choose an hourly, daily, weekly, monthly, or annual schedule from the drop-down menu. The corresponding values depend on the schedule:

      • Hourly: Enter a value between 1 minute and 59 minutes.
      • Daily: Enter a time in the format hh:mm, then specify AM or PM.
      • Weekly: Enter a time in the format hh:mm, specify AM or PM, then select a day of the week.
      • Monthly: Enter a time in the format hh:mm, specify AM or PM, then select a date.
      • Annually: Enter a date and time in the format MM/DD hh:mm, then specify AM or PM.

      For Enter cron expression: Enter the schedule as a UNIX cron expression. For example, the following expression runs the job every Monday, Wednesday, and Friday at 9:30 AM:

    30 9 * * 1,3,5
    
    • Click Save.
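    The Select frequency options and the cron format in the steps above describe the same kinds of schedules: each daily, weekly, monthly, or annual choice can be written as a standard five-field cron expression (minute, hour, day of month, month, day of week). The following Python sketch illustrates that correspondence and shows which run times an expression such as 30 9 * * 1,3,5 selects. It assumes standard UNIX cron semantics (day of week 0 to 6, with 0 as Sunday), uses hypothetical helper names (frequency_to_cron, cron_matches, next_run), works with naive datetimes, and leaves the Time zone setting out of scope. It is not how Galaxy evaluates schedules internally.

```python
from datetime import datetime, timedelta

# Cron day-of-week numbering assumed here: 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
WEEKDAYS = {"sun": 0, "mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6}


def to_24h(hh: int, mm: int, meridiem: str) -> tuple[int, int]:
    """Convert an hh:mm AM/PM time to (hour, minute) on a 24-hour clock."""
    if not (1 <= hh <= 12 and 0 <= mm <= 59):
        raise ValueError("expected hh in 1..12 and mm in 0..59")
    hour = hh % 12  # 12 AM becomes 0; 12 PM becomes 12 after the PM adjustment
    if meridiem.upper() == "PM":
        hour += 12
    return hour, mm


def frequency_to_cron(kind, hh, mm, meridiem, *, weekday=None, day=None, month=None):
    """Hypothetical helper: express a Select frequency choice as a cron string."""
    hour, minute = to_24h(hh, mm, meridiem)
    if kind == "daily":
        return f"{minute} {hour} * * *"
    if kind == "weekly":
        return f"{minute} {hour} * * {WEEKDAYS[weekday]}"
    if kind == "monthly":
        return f"{minute} {hour} {day} * *"
    if kind == "annually":
        return f"{minute} {hour} {day} {month} *"
    raise ValueError(f"unsupported frequency: {kind}")


def parse_field(field: str, lo: int, hi: int) -> set[int]:
    """Expand a cron field (*, a, a-b, comma lists, */n, a-b/n) into a set of values."""
    values = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step_n = int(step) if step else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            start, end = map(int, rng.split("-"))
        else:
            start = end = int(rng)
        if start < lo or end > hi or start > end or step_n < 1:
            raise ValueError(f"invalid cron field: {field}")
        values.update(range(start, end + 1, step_n))
    return values


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` (to the minute) is selected by a five-field cron expression."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected five cron fields")
    minute, hour, dom, month, dow = fields
    cron_dow = (when.weekday() + 1) % 7  # Python uses Monday = 0; cron uses Sunday = 0
    dom_ok = when.day in parse_field(dom, 1, 31)
    dow_ok = cron_dow in parse_field(dow, 0, 6)
    # Classic cron ORs day of month and day of week when both are restricted.
    day_ok = (dom_ok and dow_ok) if "*" in (dom, dow) else (dom_ok or dow_ok)
    return (
        when.minute in parse_field(minute, 0, 59)
        and when.hour in parse_field(hour, 0, 23)
        and when.month in parse_field(month, 1, 12)
        and day_ok
    )


def next_run(expr: str, after: datetime) -> datetime:
    """Brute-force the first matching minute strictly after `after`, within one year."""
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(366 * 24 * 60):
        if cron_matches(expr, t):
            return t
        t += timedelta(minutes=1)
    raise ValueError("no run time within a year")
```

    For example, frequency_to_cron("daily", 2, 30, "PM") returns "30 14 * * *", and next_run("30 9 * * 1,3,5", datetime(2024, 1, 1, 10, 0)) returns 9:30 AM on Wednesday, January 3, 2024, because the Monday 9:30 AM run has already passed. The minute-by-minute search is deliberately simple; it favors readability over speed.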

    For Profiling and statistics results, see the Data quality metrics section of the Quality tab.

    All data maintenance jobs are listed at the top details level of Data > Data maintenance. To see the status of your data maintenance job, go to the schema details level.
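    When you create jobs for many tables, it can help to check the task selection against the table type first. The following Python sketch encodes the task support described in the Maintenance tasks step and, for Iceberg tables, builds roughly comparable statements from the open-source Trino Iceberg connector (ANALYZE and ALTER TABLE ... EXECUTE with optimize, expire_snapshots, or remove_orphan_files). The task keys and function names are hypothetical, and the sketch does not assume that Galaxy issues these exact statements when it runs a job.

```python
# Hypothetical task keys; the support sets follow the Maintenance tasks step on this page.
SUPPORT = {
    "compaction": {"iceberg", "delta_lake"},
    "profiling_and_statistics": {"iceberg", "delta_lake", "hive", "hudi"},
    "snapshot_expiration": {"iceberg"},
    "delete_orphan_files": {"iceberg", "delta_lake"},
}


def validate_tasks(table_type: str, tasks: list[str]) -> None:
    """Raise ValueError unless at least one task is chosen and every task is supported."""
    if not tasks:
        raise ValueError("select at least one maintenance task")
    # Unknown task keys are treated as unsupported.
    unsupported = [t for t in tasks if table_type not in SUPPORT.get(t, set())]
    if unsupported:
        raise ValueError(f"{table_type} does not support: {', '.join(unsupported)}")


def iceberg_statements(table: str, tasks: list[str], retention_days: int = 7) -> list[str]:
    """Illustrative Trino Iceberg connector statements roughly matching each chosen task."""
    validate_tasks("iceberg", tasks)
    retention = f"'{retention_days}d'"  # Trino duration literal, for example '7d'
    templates = {
        "compaction": f"ALTER TABLE {table} EXECUTE optimize",
        "profiling_and_statistics": f"ANALYZE {table}",
        "snapshot_expiration": (
            f"ALTER TABLE {table} EXECUTE expire_snapshots(retention_threshold => {retention})"
        ),
        "delete_orphan_files": (
            f"ALTER TABLE {table} EXECUTE remove_orphan_files(retention_threshold => {retention})"
        ),
    }
    return [templates[t] for t in tasks]
```

    For example, iceberg_statements("orders", ["compaction", "profiling_and_statistics"]) returns ["ALTER TABLE orders EXECUTE optimize", "ANALYZE orders"], while validate_tasks("hive", ["compaction"]) raises ValueError. With default settings, the Trino Iceberg connector rejects expire_snapshots and remove_orphan_files retention thresholds shorter than 7 days unless the catalog's minimum retention properties are lowered.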

    Manage individual DM jobs

    You can manage individual data maintenance jobs in the Quality tab or at the schema and table details levels in Data maintenance.

    Edit individual DM jobs

    • In the Quality tab:
      • Click the maintenance schedule icon.
      • Make your changes, then click Save.

    • In Data maintenance:
      • At the schema details level:
        • Click the options menu in the table's row, then select Edit job.
        • Make changes, then click Save.
      • At the table details level:
        • Click the options menu in the header, then select Edit job.
        • Make changes, then click Save.

    Delete individual DM jobs

    • In the Quality tab:
      • Click the maintenance schedule icon.
      • Click Delete.

    • In Data maintenance:
      • At the schema details level:
        • Click the options menu in the row of the table of interest.
        • Select Delete job.
        • In the dialog, click Yes, delete.
      • At the table details level:
        • Click the options menu in the header.
        • Select Delete job.
        • In the dialog, click Yes, delete.