trino
Allow OpenLineage job name customization
#25704
Open

Allow OpenLineage job name customization #25704

dolfinus
dolfinus24 days ago (edited 15 days ago)

Description

Using queryId as OpenLineage's job name field lead to registering all unique queries as ETL jobs. This is hardly useful in terms of lineage.

Now integration allows to customize name field. For example, it may use X-Trino-Source or X-Trino-User values instead, or a combination or both.

Additional context and related issues

As user can override only system or catalog properties via SET SESSION or X-Trino-Session, but OpenLineage integration is event listener, user cannot override job name using this mechanism. To avoid using static job names for all OpenLIneage events, config option openlineage-event-listener.job.nameFormat supports substitution variables:

  • $QUERY_ID - default for backward compatibility.
  • $USER
  • $SOURCE

In theory, user can connect to Trino and pass job name as X-Trino-Source, and setting openlineage-event-listener.job.nameFormat=$SOURCE will pass this value directly to OpenLineage job name.

But all clients have some default value for this field (e.g. trino-cli, trino-python-client, trino-jdbc). If user haven't set custom source name, then all OpenLineage events produced by the same client will got the same job name (e.g. trino-jdbc). To avoid this, it is possible to use multiple substitutions, like $SOURCE:$USER to produce jobs with names like trino-jdbc:dolfinus or trino-jdbc:myawesomeuser.

Closes: #25535

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(X) Release notes are required, with the following suggested text:

## OpenLineage
* Allow overriding OpenLineage job `name` ({issue}`25535`)
cla-bot cla-bot added cla-signed
github-actions github-actions added docs
dolfinus dolfinus force pushed from 0c7253b7 to 04413a37 24 days ago
dolfinus dolfinus marked this pull request as ready for review 24 days ago
ebyhr
ebyhr commented on 2025-04-30
Conversation is marked as resolved
Show resolved
plugin/trino-openlineage/src/main/java/io/trino/plugin/openlineage/OpenLineageListenerConfig.java
115 return jobNameFormat.orElse(DEFAULT_JOB_NAME_FORMAT);
116 }
117
118
@Config("openlineage-event-listener.job.nameFormat")
ebyhr24 days ago

Please avoid camel case in config names.

dolfinus23 days ago

Ok

Conversation is marked as resolved
Show resolved
plugin/trino-openlineage/src/main/java/io/trino/plugin/openlineage/OpenLineageListenerConfig.java
116 }
117
118 @Config("openlineage-event-listener.job.nameFormat")
119
@ConfigDescription("Set format for job name facet.")
ebyhr24 days ago (edited 24 days ago)

We don't end with a dot for a single sentence in ConfigDescription.

dolfinus23 days ago

Fixed, including all other ConfigDescription in this file

ebyhr23 days ago

Please separate a commit for other descriptions. It is unrelated to "Allow OpenLineage job name customization"

dolfinus23 days ago

Ok

dolfinus dolfinus force pushed from 04413a37 to fdf7dcdd 23 days ago
dolfinus dolfinus force pushed from fdf7dcdd to 667cf798 23 days ago
Praveen2112
Praveen2112 commented on 2025-05-02
plugin/trino-openlineage/src/main/java/io/trino/plugin/openlineage/OpenLineageListenerConfig.java
Praveen211221 days ago

Instead of creating Optional here, can we directly use DEFAULT_JOB_NAME_FORMAT as a default value.

dolfinus21 days ago (edited 21 days ago)

I thought about case like:

openLineageListenerConfig.setJobNameFormat(null)

which resets value to its default

plugin/trino-openlineage/src/main/java/io/trino/plugin/openlineage/OpenLineageListenerConfig.java
119 @ConfigDescription("Set format for job name facet")
120 public OpenLineageListenerConfig setJobNameFormat(String jobNameFormat)
121 {
122
this.jobNameFormat = Optional.ofNullable(jobNameFormat);
123
if (jobNameFormat == null) {
124
return this;
125
}
Praveen211221 days ago

Airlift would invoke pass only non nullable String.

dolfinus21 days ago (edited 21 days ago)

But class method could be called with null passed directly. Other methods above use Optional as well

plugin/trino-openlineage/src/main/java/io/trino/plugin/openlineage/OpenLineageListenerConfig.java
129 resolvedJobName = resolvedJobName.replace(substitution, "");
130 }
131
132
if (!resolvedJobName.contains("$")) {
133
return this;
134
}
Praveen211221 days ago

Can we use a regexp based validator here ?

Praveen211221 days ago

We also use a similar validation here - FormatBasedRemoteQueryModifierConfig

dolfinus21 days ago (edited 21 days ago)

I don't see there this method is called:

boolean isFormatValid()
{
return hasValidPlaceholders(format, SessionInterpolatedValues.values());
}

Is this some kind of airlift magic?

Regarding using SessionInterpolatedValues as an example for interpolation:
https://github.com/trinodb/trino/blob/master/lib/trino-plugin-toolkit/src/main/java/io/trino/plugin/base/logging/SessionInterpolatedValues.java#L31

I see an issue here - some fields are fetched from QueryContext, and others from QueryMetadata, but InterpolatedValue<T> generic accepts only one class, and Java doesn't have Union type...

Praveen211216 days ago

This is kind of Airlift magic i.e method with Assert annotation and which starts with is would be invoked as a part of validation

dolfinus16 days ago (edited 16 days ago)

Ok, used the same approach as for SessionInterpolatedValues. Added a custom class OpenLineageJobContext used only to pass QueryContext and QueryMetadata around.

plugin/trino-openlineage/src/test/java/io/trino/plugin/openlineage/TrinoEventData.java
68 Optional.empty(), // clientInfo
69 new HashSet<>(), // clientTags
70 new HashSet<>(), // clientCapabilities
62
Set.of("enabledRoles"),
63
Set.of("groups"),
64
Optional.of("traceToken"),
65
Optional.of("remoteClientAddress"),
66
Optional.of("userAgent"),
67
Optional.of("clientInfo"),
68
Set.of("clientTags"),
69
Set.of("clientCapabilities"),
Praveen211221 days ago

Do we need to change all the default values ?

dolfinus16 days ago

Reverted, but I don't see any issues with this change

chenjian2664
chenjian2664 commented on 2025-05-07
plugin/trino-openlineage/src/test/java/io/trino/plugin/openlineage/TestOpenLineageListenerConfig.java
82 void testUnsupportedJobNameFormatSubstitutions()
83 throws Exception
84 {
85
assertThatThrownBy(() -> new OpenLineageListenerConfig().setJobNameFormat("${unknown}"))
chenjian266416 days ago

Please add more invalid cases, for example ${xxx, ${${queryId}}, $-${source}.
Also does the fixed format i,e abc123 support ?

dolfinus16 days ago

Also does the fixed format i,e abc123 support ?

Yes

dolfinus16 days ago

Outdated as FormatInterpolator is now responsible for all validation

plugin/trino-openlineage/src/main/java/io/trino/plugin/openlineage/OpenLineageListener.java
chenjian266416 days ago

Will the jobNameFormat be null?

dolfinus16 days ago

It's not:

chenjian266416 days ago

Then let's add requireNonNull check on this

dolfinus15 days ago

I don't understand why this is needed

dolfinus dolfinus force pushed from 667cf798 to f5b5465b 16 days ago
dolfinus dolfinus force pushed from f5b5465b to 86e586a9 16 days ago
dolfinus dolfinus requested a review from Praveen2112 Praveen2112 16 days ago
dolfinus dolfinus requested a review from mgorsk1 mgorsk1 9 days ago
dolfinus Remove leading docs from OpenLineageListenerConfig description
a18fa6e2
dolfinus Make FormatInterpolator.InterpolatedValue interface public
61a6ca0c
dolfinus Allow OpenLineage job name customization
05392473
dolfinus dolfinus force pushed from 86e586a9 to 05392473 9 days ago

Login to write a write a comment.

Login via GitHub

Assignees
No one assigned
Labels
Milestone