1.13.1 - 2024-04-26
Added
- Java: allow timeout for circuit breakers
#2609
@pawel-big-lebowski
Extends the circuit breaker mechanism to contain a global timeout that stops running OpenLineage integration code when a specified amount of time has elapsed. - Java: handle
DataSetEvent
andJobEvent
inTransport.emit
#2611
@dolfinus
Adds overloadsTransport.emit(OpenLineage.DatasetEvent)
andTransport.emit(OpenLineage.JobEvent)
, reusing the implementation ofTransport.emit(OpenLineage.RunEvent)
. Please note:Transport.emit(String)
is now deprecated and will be removed in 1.16.0. - Java/Python: add
GZIP
compression toHttpTransport
#2603
#2604
@dolfinus
Adds acompression
option toHttpTransport
config in the Java and Python clients, withgzip
implementation. - Java/Python/Proxy: properly set Kafka message key
#2571
#2597
#2598
@dolfinus
Adds a newmessageKey
option toKafkaTransport
config in the Python and Java clients, as well as the Proxy. This option replaces thelocalServerId
option, which is now deprecated. Default value is generated using the run id (forRunEvent
), job name (forJobEvent
) or dataset name (forDatasetEvent
). This value is used by the Kafka producer to distribute messages along topic partitions, instead of sending all the events to the same partition. This allows for full utilization of Kafka performance advantages. - Flink: add support for Micrometer metrics
#2633
@mobuchowski Adds a mechanism for forwarding metrics to any Micrometer-compatible implementation for Flink as has been implemented for Spark. Included:MeterRegistry
,CompositeMeterRegistry
,SimpleMeterRegistry
, andMicrometerProvider
. - Python: generate Python facets from JSON schemas
#2520
@JDarDagran
Objects specified with JSON Schema needed to be manually developed and checked in Python, leading to many discrepancies, including wrong schema URLs. This adds adatamodel-code-generator
for parsing JSON Schema and generating Pydantic or dataclasses classes, etc. In order to useattrs
(a more modern version of dataclasses) and overcome some limitations of the tool, a number of steps have been added in order to customize code to meet OpenLineage requirements. Included: updated references to the latest base JSON Schema spec for all child facets. Please note: newly generated code creates a v2 interface that will be implemented in existing integrations in a future release. The v2 interface introduces some breaking changes: facets are put into separate modules per JSON Schema spec file, some names are changed, and several classes are nowkw_only
. - Spark/Flink/Java: support YAML config files together with SparkConf/FlinkConf
#2583
@pawel-big-lebowski
Creates aSparkOpenlineageConfig
andFlinkOpenlineageConfig
for a more uniform configuration experience for the user. RenamesOpenLineageYaml
toOpenLineageConfig
and modifies the code to use onlyOpenLineageConfig
classes. Includes a doc update to mention that both ways can be used interchangeably and final documentation will merge all values provided. - Spark: add custom token provider support
#2613
@tnazarew Adds aTokenProviderTypeIdResolver
to handle bothFQCN
and (for backward compatibility)api_key
types inspark.openlineage.transport.auth.type
. - Spark/Flink: job ownership facet
#2533
@pawel-big-lebowski Enables configuration entries specifying ownership of the job that will result in anOwnershipJobFacet
being attached to job facets.
Changed
- Java: sync Kinesis
partitionKey
format with Kafka implementation#2620
@dolfinus Changes the format of KinesispartitionKey
from{jobNamespace}:{jobName}
torun:{jobNamespace}/{jobName}
to match the Kafka transport implementation.
Fixed
- Python: make
load_config
return an empty dict instead ofNone
when file empty#2596
@kacpermudautils.load_config()
now returns an empty dict instead ofNone
in the case of an empty file to prevent anOpenLineageClient
crash. - Java: render lombok-generated methods in javadoc
#2614
@dolfinus Fixes rendering of javadoc for methods generated bylombok
annotations by adding adelombok
step. - Spark/Snowflake: parse NPE when query option is used and table is empty
#2599
@mobuchowski
Fixes NPE when using query option when reading from Snowflake.