## 1.13.1 - 2024-04-26
### Added
- **Java: allow timeout for circuit breakers** #2609 @pawel-big-lebowski
  Extends the circuit breaker mechanism with a global timeout that stops running OpenLineage integration code once a specified amount of time has elapsed.
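  As a hedged sketch, the timeout could sit alongside an existing circuit breaker in `openlineage.yml`; the `timeoutInSeconds` key and surrounding values are illustrative, so check the circuit breaker docs for the exact names:

  ```yaml
  # Sketch: a JVM runtime circuit breaker with a global timeout (values illustrative).
  circuitBreaker:
    type: javaRuntime
    memoryThreshold: 90              # trip when JVM memory usage exceeds 90%
    circuitCheckIntervalInMillis: 1000
    timeoutInSeconds: 90             # stop OpenLineage integration code after 90 seconds
  ```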
- **Java: handle `DatasetEvent` and `JobEvent` in `Transport.emit`** #2611 @dolfinus
  Adds the overloads `Transport.emit(OpenLineage.DatasetEvent)` and `Transport.emit(OpenLineage.JobEvent)`, reusing the implementation of `Transport.emit(OpenLineage.RunEvent)`. Please note: `Transport.emit(String)` is now deprecated and will be removed in 1.16.0.
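  A hedged Java sketch of the new overloads; the events are assumed to have been built elsewhere with the OpenLineage model builders, since only the `emit` calls are the point here:

  ```java
  import io.openlineage.client.OpenLineage;
  import io.openlineage.client.transports.Transport;

  class EmitExample {
    // Sketch only: the three events are assumed to be pre-built elsewhere.
    static void emitAll(Transport transport,
                        OpenLineage.RunEvent runEvent,
                        OpenLineage.DatasetEvent datasetEvent,
                        OpenLineage.JobEvent jobEvent) {
      transport.emit(runEvent);     // pre-existing overload
      transport.emit(datasetEvent); // new in this release
      transport.emit(jobEvent);     // new in this release
      // transport.emit(String) is deprecated; removal is planned for 1.16.0
    }
  }
  ```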
- **Java/Python: add GZIP compression to `HttpTransport`** #2603 #2604 @dolfinus
  Adds a `compression` option to the `HttpTransport` config in the Java and Python clients, with a `gzip` implementation.
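  For example, in `openlineage.yml` (a hedged sketch; the URL is a placeholder):

  ```yaml
  # Sketch: enable gzip compression for the HTTP transport.
  transport:
    type: http
    url: http://localhost:5000
    compression: gzip
  ```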
- **Java/Python/Proxy: properly set Kafka message key** #2571 #2597 #2598 @dolfinus
  Adds a new `messageKey` option to the `KafkaTransport` config in the Python and Java clients, as well as in the Proxy. This option replaces the now-deprecated `localServerId` option. The default value is generated from the run id (for `RunEvent`), job name (for `JobEvent`), or dataset name (for `DatasetEvent`). The Kafka producer uses this key to distribute messages across topic partitions instead of sending all events to the same partition, allowing full use of Kafka's performance advantages.
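  A Java-client-style `openlineage.yml` sketch (key names can differ slightly between clients, so treat this as illustrative):

  ```yaml
  # Sketch: Kafka transport with an explicit message key.
  # If messageKey is omitted, a default such as run:{jobNamespace}/{jobName} is generated.
  transport:
    type: kafka
    topicName: openlineage.events
    messageKey: run:my-namespace/my-job   # replaces the deprecated localServerId
    properties:
      bootstrap.servers: localhost:9092
  ```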
- **Flink: add support for Micrometer metrics** #2633 @mobuchowski
  Adds a mechanism for forwarding metrics to any Micrometer-compatible implementation for Flink, as has already been implemented for Spark. Included: `MeterRegistry`, `CompositeMeterRegistry`, `SimpleMeterRegistry`, and `MicrometerProvider`.
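  For orientation, this is how plain Micrometer composes registries; it is generic Micrometer usage, not the OpenLineage wiring itself (which `MicrometerProvider` handles internally), and the metric name is made up:

  ```java
  import io.micrometer.core.instrument.composite.CompositeMeterRegistry;
  import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

  class MetricsExample {
    public static void main(String[] args) {
      // Any Micrometer-compatible registry can be added to a composite;
      // metrics forwarded to the composite reach every registered backend.
      CompositeMeterRegistry registry = new CompositeMeterRegistry();
      registry.add(new SimpleMeterRegistry());
      registry.counter("openlineage.emit.attempts").increment(); // illustrative name
      System.out.println(registry.getMeters().size() + " meter(s) registered");
    }
  }
  ```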
- **Python: generate Python facets from JSON schemas** #2520 @JDarDagran
  Objects specified with JSON Schema previously had to be manually developed and checked in Python, leading to many discrepancies, including wrong schema URLs. This adds `datamodel-code-generator` for parsing JSON Schema and generating Pydantic or dataclasses classes, etc. In order to use `attrs` (a more modern alternative to dataclasses) and overcome some limitations of the tool, a number of steps have been added to customize the generated code to meet OpenLineage requirements. Included: updated references to the latest base JSON Schema spec for all child facets. Please note: the newly generated code creates a v2 interface that will be adopted by existing integrations in a future release. The v2 interface introduces some breaking changes: facets are put into separate modules per JSON Schema spec file, some names are changed, and several classes are now `kw_only`.
- **Spark/Flink/Java: support YAML config files together with SparkConf/FlinkConf** #2583 @pawel-big-lebowski
  Creates a `SparkOpenlineageConfig` and a `FlinkOpenlineageConfig` for a more uniform configuration experience. Renames `OpenLineageYaml` to `OpenLineageConfig` and modifies the code to use only `OpenLineageConfig` classes. Includes a doc update to mention that both configuration methods can be used interchangeably and that the final configuration merges all provided values.
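  For instance, under the standard HTTP transport keys, these two configurations should be interchangeable (a hedged sketch; the URL is a placeholder):

  ```properties
  # SparkConf entries
  spark.openlineage.transport.type=http
  spark.openlineage.transport.url=http://localhost:5000
  ```

  ```yaml
  # Equivalent openlineage.yml
  transport:
    type: http
    url: http://localhost:5000
  ```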
- **Spark: add custom token provider support** #2613 @tnazarew
  Adds a `TokenProviderTypeIdResolver` to handle both FQCN and (for backward compatibility) `api_key` types in `spark.openlineage.transport.auth.type`.
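  A hedged sketch: `com.example.CustomTokenProvider` is a hypothetical class implementing the client's `TokenProvider` interface, and the resolver accepts either form below:

  ```properties
  # FQCN form (new): resolved via TokenProviderTypeIdResolver
  spark.openlineage.transport.auth.type=com.example.CustomTokenProvider

  # Legacy form (kept for backward compatibility)
  spark.openlineage.transport.auth.type=api_key
  spark.openlineage.transport.auth.apiKey=<your-key>
  ```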
- **Spark/Flink: job ownership facet** #2533 @pawel-big-lebowski
  Enables configuration entries specifying ownership of the job, resulting in an `OwnershipJobFacet` being attached to job facets.
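  A hedged sketch of such entries for Spark; the `job.owners.<ownership-type>` key pattern and the values are illustrative, so confirm against the facets docs:

  ```properties
  # Each entry becomes an owner in the OwnershipJobFacet:
  # the ownership type comes from the key suffix, the owner name from the value.
  spark.openlineage.job.owners.team=Data Platform
  spark.openlineage.job.owners.person=jane.doe@example.com
  ```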
### Changed
- **Java: sync Kinesis `partitionKey` format with Kafka implementation** #2620 @dolfinus
  Changes the format of the Kinesis `partitionKey` from `{jobNamespace}:{jobName}` to `run:{jobNamespace}/{jobName}` to match the Kafka transport implementation.
### Fixed
- **Python: make `load_config` return an empty dict instead of `None` when file empty** #2596 @kacpermuda
  `utils.load_config()` now returns an empty dict instead of `None` in the case of an empty file, preventing an `OpenLineageClient` crash.
- **Java: render lombok-generated methods in javadoc** #2614 @dolfinus
  Fixes rendering of javadoc for methods generated by `lombok` annotations by adding a `delombok` step.
- **Spark/Snowflake: fix NPE when the query option is used and the table is empty** #2599 @mobuchowski
  Fixes an NPE in the Spark integration when reading from Snowflake with the `query` option.
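  For reference, the failing scenario was a Snowflake read driven by a SQL query rather than a table name, roughly like this hedged Java sketch (connector options abbreviated; the format alias may vary by connector version):

  ```java
  import org.apache.spark.sql.Dataset;
  import org.apache.spark.sql.Row;
  import org.apache.spark.sql.SparkSession;

  class SnowflakeQueryRead {
    static Dataset<Row> read(SparkSession spark) {
      // Reading via the `query` option (instead of `dbtable`) used to trigger
      // an NPE in the OpenLineage Spark integration when the result was empty.
      return spark.read()
          .format("snowflake") // Snowflake Spark connector
          .option("sfURL", "account.snowflakecomputing.com") // other options omitted
          .option("query", "SELECT * FROM some_table WHERE 1 = 0")
          .load();
    }
  }
  ```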