1.42.0 - 2026-01-07
Added
- DataZone transport: Add cross-region support
#4218 @RohithKayathi Enable posting lineage events to DataZone domains in different regions from where data transformation jobs run.
- Spark: Add config for disabling RDD event emitting
#4118 @kchledowski Add new configuration option spark.openlineage.filter.rddEventsDisabled to selectively disable OpenLineage event emission for RDD operations while keeping SQL-based operations enabled.
- Spark: Add schema and CLL facets for Snowflake writes
#4124 @kchledowski Add schema and column-level lineage support for Snowflake datasets when using the Spark-Snowflake connector.
- Spark: Add spark.openlineage.applicationRunId override
#4215 @wslulciuc Add support to override the application runID via the property spark.openlineage.applicationRunId.
- Spec: Add ExecutionParametersRunFacet
#4182 @jakub-moravec Add a new facet to capture input parameters supplied to a job at the time of execution, enabling reproducibility, debugging, and richer lineage context.
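
The two new Spark properties in this release can be passed like any other Spark configuration. The sketch below builds spark-submit `--conf` arguments for them. The property names come from the entries above; the helper name `openlineage_conf_args`, the boolean value `true`, and the use of a UUID string as the run ID are assumptions made for illustration, not confirmed details of the integration.

```python
import uuid


def openlineage_conf_args(disable_rdd_events: bool, application_run_id: str) -> list[str]:
    """Build spark-submit --conf arguments for the two new properties.

    Hypothetical helper: only the property names are taken from the changelog.
    """
    conf = {
        # Assumed to accept "true"/"false" like other boolean Spark properties.
        "spark.openlineage.filter.rddEventsDisabled": str(disable_rdd_events).lower(),
        # Assumed to accept a UUID string chosen by the caller.
        "spark.openlineage.applicationRunId": application_run_id,
    }
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    run_id = str(uuid.uuid4())
    # Prints the flags to append to a spark-submit command line.
    print(" ".join(openlineage_conf_args(True, run_id)))
```

Generating the run ID outside the job is presumably what makes the override useful: an orchestrator could record the same ID before submitting the application and correlate the emitted events afterwards.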
Changed
- Java: Update GCP Lineage transport version and fix dependency shading
#3768 @tnazarew Update GCP Lineage transport to use new version of the producer library with fixed dependency shading.
- Python: Show what transport failed to create
#4220 @mobuchowski Improve error messages to indicate which transport failed to create.
- Spark: Prevent classloader issue by gating log behind additional flag
#4207 @mobuchowski Fix classloader conflicts with BigQuery connector by gating DEBUG toJSON() logging behind an additional flag and logging exceptions.
Fixed
- Python: Fix .with_additional_properties() annotation
#4197 @dolfinus Fix type annotation for .with_additional_properties() method to correctly accept keyword arguments.
- Spark: Fix BigQuery symlinks with ".db" suffix
#4192 @kchledowski Fix BigQuery symlink namespace incorrectly having ".db" suffix in RUNNING and COMPLETE events by avoiding mutation of the Identifier object.
- Spark: Fix Glue Data Catalog detection in YARN cluster mode
#4229 @lawofcycles Add fallback mechanism to retrieve AWS region from EC2 Instance Metadata Service when environment variables are unavailable in YARN cluster mode.
- Spark: Fix missing inputs and CLL for AWS DynamicFrame
#4222 @kchledowski Fix missing inputs and column-level lineage when writing from AWS DynamicFrame by treating NewHadoopRDD as file-like.
- Spark: Remove path pattern in ColumnLineageFacet as well
#4228 @RohithKayathi Apply spark.openlineage.dataset.removePath.pattern to input field names in ColumnLineageFacet, and fix hashCode/equals methods to include additionalProperties.
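
To illustrate the last fix, here is a rough Python sketch of trimming dataset names with a removal pattern and applying the same trimming to the dataset names referenced by column-lineage input fields, so both sides agree. It assumes the pattern marks the text to drop with a named group called `remove`; that convention, the helper `remove_path_pattern`, the example pattern, and the sample paths are all assumptions for illustration. The actual integration is written in Java, where named groups use `(?<remove>...)` rather than Python's `(?P<remove>...)`.

```python
import re


def remove_path_pattern(name: str, pattern: str) -> str:
    """Return name with the text captured by the 'remove' group cut out.

    Hypothetical helper; names that do not match are returned unchanged.
    """
    compiled = re.compile(pattern)
    if "remove" not in compiled.groupindex:
        return name
    match = compiled.search(name)
    if match is None or match.group("remove") is None:
        return name
    start, end = match.span("remove")
    return name[:start] + name[end:]


# Example pattern (an assumption): strip trailing partition directories.
PATTERN = r"(.*)(?P<remove>/year=\d+/month=\d+)"

# Input fields shaped like column-lineage entries: namespace, name, field.
input_fields = [
    {
        "namespace": "s3://bucket",
        "name": "/warehouse/orders/year=2023/month=04",
        "field": "order_id",
    },
]

# Apply the same trimming to the dataset name inside every input field.
trimmed_fields = [
    {**field, "name": remove_path_pattern(field["name"], PATTERN)}
    for field in input_fields
]

if __name__ == "__main__":
    print(trimmed_fields[0]["name"])
```

Without this step, an input field would keep pointing at the untrimmed path while the dataset itself is reported under the trimmed name, so the two could not be joined downstream.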
Removed
- Airflow: Remove Airflow integration from OpenLineage repository
#4212 @kacpermuda The deprecated Airflow integration has been removed from the OpenLineage repository.