OpenLineage event listener#

The OpenLineage event listener plugin streams lineage information, encoded as JSON in accordance with the OpenLineage specification, to an external OpenLineage-compatible API by POSTing events to a specified URI.

Rationale#

This event listener captures every query that creates or modifies Trino tables or columns and transforms it into lineage information. Lineage describes the relationships and flow of data between tables and columns. OpenLineage is a widely used open-source standard for capturing lineage information from a variety of systems, including Spark, Airflow, and Flink.

Trino Query attributes mapping to OpenLineage attributes#

| Trino | OpenLineage |
| --- | --- |
| {UUID(Query Id)} | Run ID |
| {queryCreatedEvent.getCreateTime()} or {queryCompletedEvent.getEndTime()} | Run Event Time |
| Query Id | Job Facet Name |
| trino:// + {openlineage-event-listener.trino.uri.getHost()} + ":" + {openlineage-event-listener.trino.uri.getPort()} | Job Facet Namespace (default, can be overridden) |
| {schema}.{table} | Dataset Name |
| trino:// + {openlineage-event-listener.trino.uri.getHost()} + ":" + {openlineage-event-listener.trino.uri.getPort()} | Dataset Namespace |
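
The namespace and dataset name patterns in the mapping above can be illustrated with a short Python sketch. The plugin itself performs this in Java; the function names `default_namespace` and `dataset_name` and the sample values are hypothetical and only mirror the `trino://host:port` and `{schema}.{table}` patterns listed in the mapping.

```python
from urllib.parse import urlsplit


def default_namespace(trino_uri: str) -> str:
    """Build the trino://host:port namespace described in the mapping."""
    parts = urlsplit(trino_uri)
    # The URI is expected to carry a scheme, a host, and a port.
    if not parts.scheme or parts.hostname is None or parts.port is None:
        raise ValueError("URI must include scheme, host, and port")
    return f"trino://{parts.hostname}:{parts.port}"


def dataset_name(schema: str, table: str) -> str:
    """Render the dataset name as {schema}.{table}."""
    return f"{schema}.{table}"
```

For example, `default_namespace("https://trino.example.com:8443")` yields `trino://trino.example.com:8443`, regardless of the scheme used in the configured URI.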

Available Trino Facets#

Trino Metadata#

Facet containing the following properties, if present, of the query for which the OpenLineage Run Event was generated:

  • query_plan - The execution plan for the query.

  • transaction_id - The transaction id used for query processing.

  • query_id - The unique identifier assigned to each query.

Available in both Start and Complete/Fail OpenLineage events.

If you want to disable this facet, add trino_metadata to openlineage-event-listener.disabled-facets.

Trino Query Context#

Facet containing the following properties of the query for which the OpenLineage Run Event was generated:

  • server_version - The version of the Trino server that processed the query.

  • environment - Inherited from node.environment in Node properties.

  • query_type - One of the query types configured via openlineage-event-listener.trino.include-query-types.

Available in both Start and Complete/Fail OpenLineage events.

Additionally, the following optional properties are available to help organize and structure the data sent in lineage events:

Trino Query Context properties#

| Property | Description |
| --- | --- |
| user | The identifier of the user that ran the query. |
| original_user | The original identifier of the user that ran the query. It can differ from user in situations where impersonation is involved. |
| principal | The authenticated entity from an external security system. |
| source | The name of the client used as the source that submits the query. |
| client_info | Additional information about the client making the query. |
| remote_client_address | The IP address of the remote client from which the request is received. |
| user_agent | The value of the User-Agent header. |
| trace_token | The token used for query tracing purposes. |

If you want to disable this facet, add trino_query_context to openlineage-event-listener.disabled-facets.

Trino Query Statistics#

Facet containing the full query statistics of the completed query. Available only in OpenLineage Complete/Fail events.

If you want to disable this facet, add trino_query_statistics to openlineage-event-listener.disabled-facets.

Requirements#

You need to perform the following steps:

  • Provide an HTTP/S service that accepts POST events with a JSON body and is compatible with the OpenLineage API format.

  • Configure openlineage-event-listener.transport.url in the event listener properties file with the URI of the service.

  • Configure openlineage-event-listener.trino.uri so that the proper OpenLineage job namespace is rendered in the produced events. The value must be a valid URI with scheme, host, and port; otherwise the plugin fails to start.

  • Configure which events to send, as detailed in Configuration.
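
For local experiments, a service that meets the first requirement can be sketched with Python's standard library. This is a minimal illustration only, not a production OpenLineage backend: the names `LineageHandler`, `RECEIVED`, and `start_receiver` are hypothetical, and accepting any request path and answering with status 200 are assumptions made for the sketch.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

RECEIVED = []  # events kept in memory, for demonstration only


class LineageHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            event = json.loads(body)
        except ValueError:
            # Body was not valid JSON (or not valid UTF-8).
            self.send_response(400)
            self.end_headers()
            return
        RECEIVED.append(event)
        self.send_response(200)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # silence per-request logging in this sketch


def start_receiver(port: int = 0) -> HTTPServer:
    """Start the receiver on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), LineageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After `server = start_receiver()`, the chosen port is available as `server.server_address[1]`, and that address can be used as the value of openlineage-event-listener.transport.url while testing.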

Configuration#

To configure the OpenLineage event listener, create an event listener properties file in etc named starburst-open-lineage-event-listener.properties with the following contents as an example of minimal required configuration:

event-listener.name=starburst-open-lineage
openlineage-event-listener.trino.uri=<Address of your Trino coordinator>

Add etc/starburst-open-lineage-event-listener.properties to event-listener.config-files in Config properties:

event-listener.config-files=etc/starburst-open-lineage-event-listener.properties,...

OpenLineage event listener configuration properties#

| Property name | Description | Default |
| --- | --- | --- |
| openlineage-event-listener.transport.type | Type of transport to use when emitting lineage information. See Supported Transport Types for the list of available options. | NOOP |
| openlineage-event-listener.trino.uri | URI of the Trino coordinator, including scheme, host, and port. Used to render the Job Namespace in OpenLineage. Required. | None |
| openlineage-event-listener.trino.include-query-types | Comma-separated list of query types to take into account when emitting lineage information. Each value must match a value of the io.trino.spi.resourcegroups.QueryType enum. Query types not listed here are filtered out. | DELETE,INSERT,MERGE,UPDATE,ALTER_TABLE_EXECUTE |
| openlineage-event-listener.disabled-facets | Which of the Available Trino Facets to exclude from the final OpenLineage event. Allowed values: trino_metadata, trino_query_context, trino_query_statistics. | None |
| openlineage-event-listener.namespace | Custom namespace to use for the Job namespace attribute. If blank, defaults to the Dataset Namespace. | None |
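
The effect of the two list-valued properties above can be sketched in Python. This is an illustration of the documented behavior, not the plugin's code: the function names are hypothetical, and trimming whitespace around items and rejecting unknown facet names are assumptions.

```python
# Default and allowed values are taken from the configuration properties above.
DEFAULT_QUERY_TYPES = "DELETE,INSERT,MERGE,UPDATE,ALTER_TABLE_EXECUTE"
ALLOWED_FACETS = {"trino_metadata", "trino_query_context", "trino_query_statistics"}


def parse_list(value: str) -> list:
    """Split a comma-separated property value (trimming is an assumption)."""
    return [item.strip() for item in value.split(",") if item.strip()]


def should_emit(query_type: str, include: str = DEFAULT_QUERY_TYPES) -> bool:
    """Return True when the query type is in the configured include list."""
    return query_type in parse_list(include)


def enabled_facets(disabled: str = "") -> set:
    """Return the facets that remain after removing the disabled ones."""
    names = set(parse_list(disabled))
    unknown = names - ALLOWED_FACETS
    if unknown:
        raise ValueError(f"unknown facet names: {sorted(unknown)}")
    return ALLOWED_FACETS - names
```

With the default include list, an INSERT query produces lineage events while a SELECT query is filtered out.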

Supported Transport Types#

  • NOOP - The default transport type. Performs no work and transfers no data.

  • CONSOLE - Sends OpenLineage JSON events to the standard output of the Trino coordinator.

  • HTTP - Sends OpenLineage JSON events to an OpenLineage-compatible HTTP endpoint.

OpenLineage HTTP Transport Configuration properties#

| Property name | Description | Default |
| --- | --- | --- |
| openlineage-event-listener.transport.url | URL of the OpenLineage-compatible service. Required if the HTTP transport is configured. | None |
| openlineage-event-listener.transport.endpoint | Custom path for the OpenLineage-compatible endpoint. If configured, openlineage-event-listener.transport.url must not contain a custom path. | /api/v1 |
| openlineage-event-listener.transport.api-key | API key (string value) used to authenticate with the service at openlineage-event-listener.transport.url. | None |
| openlineage-event-listener.transport.timeout | Timeout for HTTP requests. | 5000ms |
| openlineage-event-listener.transport.headers | List of custom HTTP headers to send along with the events. See Custom HTTP headers for more details. | Empty |
| openlineage-event-listener.transport.url-params | List of custom URL parameters to add to the final HTTP request. See Custom URL Params for more details. | Empty |

Custom HTTP headers#

Providing custom HTTP headers is a useful mechanism for sending metadata along with event messages.

Providing headers follows the pattern of key:value pairs separated by commas:

openlineage-event-listener.transport.headers="Header-Name-1:header value 1,Header-Name-2:header value 2,..."

If you need to use a comma (,) or colon (:) in a header name or value, escape it with a backslash (\).

Keep in mind that these headers are static, so they cannot carry information taken from the event itself.
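
To make the key:value format and the backslash escaping concrete, here is a minimal Python sketch of a parser for such a value, given without surrounding quotes. It is not the plugin's parser: the name `parse_pairs` is hypothetical, and details such as tolerating a trailing comma or a second unescaped colon inside a value are assumptions of this sketch.

```python
def parse_pairs(value: str) -> dict:
    """Parse 'key:value,key:value', honoring backslash escapes for ',' and ':'."""
    pairs = {}
    key = None        # None until the ':' that ends the current key is seen
    current = []      # characters of the key or value being read
    escaped = False   # True right after an unescaped backslash

    def finish_pair():
        nonlocal key, current
        if key is None:
            if current:
                raise ValueError(f"missing ':' in pair {''.join(current)!r}")
            return  # empty segment, for example a trailing comma
        pairs[key] = "".join(current)
        key, current = None, []

    for ch in value:
        if escaped:
            current.append(ch)  # keep the escaped character literally
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == ":" and key is None:
            key, current = "".join(current), []
        elif ch == ",":
            finish_pair()
        else:
            current.append(ch)  # also keeps a second unescaped ':' (lenient)

    if escaped:
        raise ValueError("dangling backslash at end of value")
    finish_pair()
    return pairs
```

For instance, the input `X-Note:a\,b\:c` is read as a single header named `X-Note` whose value is `a,b:c`.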

Custom URL Params#

Custom URL parameters are added to the final HTTP request.

Specify URL parameters as key:value pairs separated by commas:

openlineage-event-listener.transport.url-params="Param-Name-1:param value 1,Param-Name-2:param value 2,..."

Keep in mind that these parameters are static, so they cannot carry information taken from the event itself.
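
As a rough illustration of how static parameters could end up in the request URL, the following Python sketch joins a base URL, an endpoint path, and a set of parameters. How the plugin actually combines these values is not specified here, so the joining and encoding rules, the function name `build_request_url`, and the sample host are all assumptions.

```python
from urllib.parse import urlencode


def build_request_url(base_url: str, endpoint: str, params: dict) -> str:
    """Join base URL and endpoint, then append URL-encoded query parameters."""
    # Normalize slashes so exactly one separates the base URL and the endpoint.
    url = base_url.rstrip("/") + "/" + endpoint.strip("/")
    if params:
        url += "?" + urlencode(params)
    return url
```

Because the parameters are static, the same query string is attached to every event that is sent.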