Java: allow timeout for circuit breakers #2609 @pawel-big-lebowski Extends the circuit breaker mechanism with a global timeout that stops running OpenLineage integration code once a specified amount of time has elapsed.
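A minimal sketch of what the new timeout might look like in openlineage.yml, assuming the javaRuntime circuit breaker and a timeoutInSeconds key; the exact key name may differ between client versions:

    circuitBreaker:
      type: javaRuntime
      memoryThreshold: 20
      circuitCheckIntervalInMillis: 1000
      timeoutInSeconds: 90   # assumed key name: stop OpenLineage code after 90 seconds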
Java: handle DataSetEvent and JobEvent in Transport.emit #2611 @dolfinus Adds overloads Transport.emit(OpenLineage.DatasetEvent) and Transport.emit(OpenLineage.JobEvent), reusing the implementation of Transport.emit(OpenLineage.RunEvent). Please note: Transport.emit(String) is now deprecated and will be removed in 1.16.0.
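A minimal Java sketch of the new overloads, assuming the client forwards them to the corresponding Transport.emit methods; the URL is illustrative and the events are assumed to be built elsewhere with the OpenLineage model builders:

    import io.openlineage.client.OpenLineage;
    import io.openlineage.client.OpenLineageClient;
    import io.openlineage.client.transports.HttpTransport;

    public class EmitSketch {
      static void emitBoth(OpenLineage.DatasetEvent datasetEvent, OpenLineage.JobEvent jobEvent) {
        OpenLineageClient client = OpenLineageClient.builder()
            .transport(HttpTransport.builder().uri("http://localhost:5000").build())
            .build();

        client.emit(datasetEvent); // delegates to Transport.emit(OpenLineage.DatasetEvent)
        client.emit(jobEvent);     // delegates to Transport.emit(OpenLineage.JobEvent)
      }
    }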
Java/Python: add GZIP compression to HttpTransport #2603 #2604 @dolfinus Adds a compression option to the HttpTransport config in the Java and Python clients, implemented with gzip.
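A hedged openlineage.yml sketch of the new option (the URL is illustrative; the option name compression with value gzip follows the description above):

    transport:
      type: http
      url: http://localhost:5000
      compression: gzip   # omit to send uncompressed payloads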
Java/Python/Proxy: properly set Kafka message key #2571 #2597 #2598 @dolfinus Adds a new messageKey option to the KafkaTransport config in the Python and Java clients, as well as the Proxy. This option replaces the localServerId option, which is now deprecated. The default value is generated from the run id (for RunEvent), job name (for JobEvent), or dataset name (for DatasetEvent). The Kafka producer uses this value to distribute messages across topic partitions instead of sending all events to the same partition, allowing full use of Kafka's partition-level parallelism.
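A hedged openlineage.yml sketch overriding the generated default key; the topic name, servers, serializers, and key value are all illustrative:

    transport:
      type: kafka
      topicName: openlineage.events
      messageKey: run:my_namespace/my_job   # optional; default is generated as described above
      properties:
        bootstrap.servers: localhost:9092
        key.serializer: org.apache.kafka.common.serialization.StringSerializer
        value.serializer: org.apache.kafka.common.serialization.StringSerializer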
Flink: add support for Micrometer metrics #2633 @mobuchowski
Adds a mechanism for forwarding metrics to any Micrometer-compatible implementation for Flink, as was previously implemented for Spark. Included: MeterRegistry, CompositeMeterRegistry, SimpleMeterRegistry, and MicrometerProvider.
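To illustrate the Micrometer pattern the integration plugs into, a plain-Micrometer Java sketch (not the OpenLineage wiring itself; the meter name is made up):

    import io.micrometer.core.instrument.composite.CompositeMeterRegistry;
    import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

    public class MicrometerSketch {
      public static void main(String[] args) {
        // a CompositeMeterRegistry fans each metric out to every registry added to it
        CompositeMeterRegistry registry = new CompositeMeterRegistry();
        registry.add(new SimpleMeterRegistry()); // in-memory registry, useful for testing

        // illustrative meter name, not the integration's actual one
        registry.counter("openlineage.emit.attempts").increment();
      }
    }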
Python: generate Python facets from JSON schemas #2520 @JDarDagran Objects specified with JSON Schema previously had to be developed and checked manually in Python, leading to many discrepancies, including wrong schema URLs. This adds the datamodel-code-generator tool for parsing JSON Schema and generating Pydantic or dataclasses classes, etc. To use attrs (a more modern alternative to dataclasses) and overcome some limitations of the tool, a number of customization steps have been added to make the generated code meet OpenLineage requirements. Included: updated references to the latest base JSON Schema spec for all child facets. Please note: the newly generated code creates a v2 interface that will be implemented in existing integrations in a future release. The v2 interface introduces some breaking changes: facets are put into separate modules per JSON Schema spec file, some names are changed, and several classes are now kw_only.
Spark/Flink/Java: support YAML config files together with SparkConf/FlinkConf #2583 @pawel-big-lebowski Creates SparkOpenlineageConfig and FlinkOpenlineageConfig classes for a more uniform configuration experience. Renames OpenLineageYaml to OpenLineageConfig and modifies the code to use only OpenLineageConfig classes. Includes a doc update noting that both configuration methods can be used interchangeably and that the final configuration merges all provided values.
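A sketch of the interchangeability, with illustrative values. The same transport could be declared in openlineage.yml:

    # openlineage.yml
    transport:
      type: http
      url: http://localhost:5000

or via the equivalent SparkConf entries spark.openlineage.transport.type=http and spark.openlineage.transport.url=http://localhost:5000; values from both sources are merged into the final configuration.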
Spark: add custom token provider support #2613 @tnazarew
Adds a TokenProviderTypeIdResolver to handle both FQCN and (for backward compatibility) api_key types in spark.openlineage.transport.auth.type.
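A hedged spark-submit sketch; com.example.MyTokenProvider is a hypothetical FQCN of a custom token provider class available on the classpath, and the URL is illustrative:

    --conf spark.openlineage.transport.type=http \
    --conf spark.openlineage.transport.url=http://localhost:5000 \
    --conf spark.openlineage.transport.auth.type=com.example.MyTokenProvider

Passing api_key as the type keeps the previous behavior.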
Spark/Flink: job ownership facet #2533 @pawel-big-lebowski
Enables configuration entries specifying ownership of the job, resulting in an OwnershipJobFacet being attached to the job's facets.
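A hedged sketch of what such entries might look like in SparkConf, assuming a spark.openlineage.job.owners.<ownership-type> key pattern; both keys and values are illustrative:

    spark.openlineage.job.owners.team=MyTeam
    spark.openlineage.job.owners.person=John Smith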
Java: sync Kinesis partitionKey format with Kafka implementation #2620 @dolfinus
Changes the format of Kinesis partitionKey from {jobNamespace}:{jobName} to run:{jobNamespace}/{jobName} to match the Kafka transport implementation.
Python: make load_config return an empty dict instead of None when the file is empty #2596 @kacpermuda
utils.load_config() now returns an empty dict instead of None when the config file is empty, preventing an OpenLineageClient crash.
Java: render lombok-generated methods in javadoc #2614 @dolfinus
Fixes rendering of Javadoc for methods generated by Lombok annotations by adding a delombok step.
Spark/Snowflake: fix NPE when the query option is used and the table is empty #2599 @mobuchowski Fixes an NPE thrown when the query option is used while reading from Snowflake.