Instead of creating an `Optional` here, can we directly use `DEFAULT_JOB_NAME_FORMAT` as the default value?
I was thinking of a case like `openLineageListenerConfig.setJobNameFormat(null)`, which resets the value to its default.
    @ConfigDescription("Set format for job name facet")
    public OpenLineageListenerConfig setJobNameFormat(String jobNameFormat)
    {
        this.jobNameFormat = Optional.ofNullable(jobNameFormat);
        if (jobNameFormat == null) {
            return this;
        }
Airlift would only ever pass a non-null String.
But the method could still be called directly with null. The other methods above use `Optional` as well.
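For illustration, a minimal sketch of the "null resets to default" behaviour discussed in this thread. The class name and the default value `$QUERY_ID` are assumptions made for the example; this is not the code from the PR.

```java
import java.util.Optional;

// Hypothetical, simplified stand-in for the config class under review.
final class JobNameConfigSketch
{
    // Assumed default value, used here for illustration only.
    static final String DEFAULT_JOB_NAME_FORMAT = "$QUERY_ID";

    private Optional<String> jobNameFormat = Optional.empty();

    // Passing null clears any override, so the getter falls back to the default.
    JobNameConfigSketch setJobNameFormat(String jobNameFormat)
    {
        this.jobNameFormat = Optional.ofNullable(jobNameFormat);
        return this;
    }

    String getJobNameFormat()
    {
        return jobNameFormat.orElse(DEFAULT_JOB_NAME_FORMAT);
    }
}
```

Storing the default directly in a plain `String` field would be simpler, but then a direct `setJobNameFormat(null)` call would need its own null handling.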
            resolvedJobName = resolvedJobName.replace(substitution, "");
        }

        if (!resolvedJobName.contains("$")) {
            return this;
        }
Can we use a regexp-based validator here?
We use similar validation in `FormatBasedRemoteQueryModifierConfig`.
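A rough sketch of what a regexp-based validator could look like. The placeholder names (`queryId`, `user`, `source`) and the rule that literal text may not contain `$`, `{` or `}` are assumptions made for this example; they are not taken from `FormatBasedRemoteQueryModifierConfig` or from the PR.

```java
import java.util.regex.Pattern;

// Hypothetical validator; the class name and placeholder list are illustrative.
final class JobNameFormatValidatorSketch
{
    // A format is a sequence of literal characters (anything except $, { and })
    // and well-formed ${name} placeholders with a known name.
    private static final Pattern VALID_FORMAT = Pattern.compile(
            "(?:[^\\$\\{\\}]|\\$\\{(?:queryId|user|source)\\})*");

    // matches() requires the whole string to match, so any stray "$"
    // or unterminated placeholder makes the format invalid.
    static boolean isValidFormat(String format)
    {
        return VALID_FORMAT.matcher(format).matches();
    }
}
```

With this pattern a fixed name such as `abc123` is accepted, while an unknown, unterminated or nested placeholder is rejected in a single pass, without the replace-then-check-for-`$` step.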
I don't see where this method is called:
Regarding using SessionInterpolatedValues as an example for interpolation:
https://github.com/trinodb/trino/blob/master/lib/trino-plugin-toolkit/src/main/java/io/trino/plugin/base/logging/SessionInterpolatedValues.java#L31
I see an issue here: some fields are fetched from `QueryContext` and others from `QueryMetadata`, but the `InterpolatedValue<T>` generic accepts only one class, and Java doesn't have a union type...
This is a kind of Airlift magic, i.e. a method that has an Assert annotation (such as `@AssertTrue`) and whose name starts with `is` is invoked as part of validation.
OK, I used the same approach as for `SessionInterpolatedValues`. Added a custom class, `OpenLineageJobContext`, used only to pass `QueryContext` and `QueryMetadata` around.
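A minimal sketch of the carrier-class idea described above: wrapping both objects in one type lets a single generic parameter cover values that come from different sources. All types below are hypothetical stand-ins with only the fields needed for the example; they are not Trino's `QueryContext`, `QueryMetadata` or the PR's `OpenLineageJobContext`.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical stand-ins for the two Trino SPI types.
record QueryContextSketch(String user, Optional<String> source) {}

record QueryMetadataSketch(String queryId) {}

// Carrier used only to pass both objects around as one value.
record JobContextSketch(QueryContextSketch queryContext, QueryMetadataSketch queryMetadata) {}

// Each constant knows how to extract its value from the combined context.
enum JobNameValueSketch
{
    QUERY_ID(context -> context.queryMetadata().queryId()),
    USER(context -> context.queryContext().user()),
    // Assumption for this sketch: a missing source resolves to an empty string.
    SOURCE(context -> context.queryContext().source().orElse(""));

    private final Function<JobContextSketch, String> extractor;

    JobNameValueSketch(Function<JobContextSketch, String> extractor)
    {
        this.extractor = extractor;
    }

    String valueFrom(JobContextSketch context)
    {
        return extractor.apply(context);
    }
}
```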
                Optional.empty(), // clientInfo
                new HashSet<>(), // clientTags
                new HashSet<>(), // clientCapabilities

                Set.of("enabledRoles"),
                Set.of("groups"),
                Optional.of("traceToken"),
                Optional.of("remoteClientAddress"),
                Optional.of("userAgent"),
                Optional.of("clientInfo"),
                Set.of("clientTags"),
                Set.of("clientCapabilities"),
Do we need to change all the default values?
Reverted, but I don't see any issues with this change
    void testUnsupportedJobNameFormatSubstitutions()
            throws Exception
    {
        assertThatThrownBy(() -> new OpenLineageListenerConfig().setJobNameFormat("${unknown}"))
Please add more invalid cases, for example `${xxx`, `${${queryId}}`, and `$-${source}`.
Also, is a fixed format, e.g. `abc123`, supported?
Yes
Outdated, as `FormatInterpolator` is now responsible for all validation.
Can `jobNameFormat` be null?
Then let's add a `requireNonNull` check on this.
I don't understand why this is needed
Description
Using `queryId` as OpenLineage's job `name` field leads to every unique query being registered as a separate ETL job, which is hardly useful in terms of lineage.
The integration now allows customizing the `name` field. For example, it may use the `X-Trino-Source` or `X-Trino-User` values instead, or a combination of both.
Additional context and related issues
Users can override only system or catalog properties via `SET SESSION` or `X-Trino-Session`, and the OpenLineage integration is an event listener, so users cannot override the job name using this mechanism. To avoid using static job names for all OpenLineage events, the config option `openlineage-event-listener.job.nameFormat` supports substitution variables:
`$QUERY_ID` - default, for backward compatibility.
`$USER`
`$SOURCE`
In theory, a user can connect to Trino and pass the job name as `X-Trino-Source`, and setting `openlineage-event-listener.job.nameFormat=$SOURCE` will pass this value directly to the OpenLineage job `name`.
But all clients have some default value for this field (e.g. `trino-cli`, `trino-python-client`, `trino-jdbc`). If the user hasn't set a custom source name, all OpenLineage events produced by the same client will get the same job name (e.g. `trino-jdbc`). To avoid this, it is possible to use multiple substitutions, like `$SOURCE:$USER`, to produce jobs with names like `trino-jdbc:dolfinus` or `trino-jdbc:myawesomeuser`.
Closes: #25535
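To make the substitution described above concrete, here is a minimal sketch of how a format such as `$SOURCE:$USER` could be resolved. The class and method names are hypothetical, and plain `String.replace` is used only for illustration; the listener's actual interpolation code may differ.

```java
// Hypothetical helper; not part of the Trino code base.
final class JobNameSubstitutionSketch
{
    // Replaces each supported variable with the corresponding value.
    // Caveat of this naive approach: a value that itself contains "$USER"
    // or "$SOURCE" would be substituted again by a later replace call.
    static String resolve(String format, String queryId, String user, String source)
    {
        return format
                .replace("$QUERY_ID", queryId)
                .replace("$USER", user)
                .replace("$SOURCE", source);
    }
}
```

For example, `JobNameSubstitutionSketch.resolve("$SOURCE:$USER", "some_query_id", "dolfinus", "trino-jdbc")` returns `trino-jdbc:dolfinus`.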
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(X) Release notes are required, with the following suggested text: