1.42.0 - 2026-01-07

Added

  • DataZone transport: Add cross-region support #4218 @RohithKayathi Enable posting lineage events to DataZone domains in a region other than the one where the data transformation jobs run.
  • Spark: Add config for disabling RDD event emitting #4118 @kchledowski Add a new configuration option, spark.openlineage.filter.rddEventsDisabled, to selectively disable OpenLineage event emission for RDD operations while keeping it enabled for SQL-based operations.
  • Spark: Add schema and CLL facets for Snowflake writes #4124 @kchledowski Add schema and column-level lineage support for Snowflake datasets when using the Spark-Snowflake connector.
  • Spark: Add spark.openlineage.applicationRunId override #4215 @wslulciuc Add support for overriding the application run ID via the property spark.openlineage.applicationRunId.
  • Spec: Add ExecutionParametersRunFacet #4182 @jakub-moravec Add a new facet to capture input parameters supplied to a job at the time of execution, enabling reproducibility, debugging, and richer lineage context.
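
The two new Spark properties above can be passed like any other Spark configuration. The following is a minimal sketch, not taken from the release itself: it assumes that spark.openlineage.filter.rddEventsDisabled accepts a boolean string and that spark.openlineage.applicationRunId expects a UUID string, and the helper spark_submit_conf_args is hypothetical. Check the Spark integration documentation for the accepted values.

```python
import uuid


def spark_submit_conf_args(conf):
    """Render a dict of Spark properties as spark-submit --conf arguments.

    Hypothetical helper for illustration; not part of OpenLineage.
    """
    args = []
    for key in sorted(conf):
        args.extend(["--conf", f"{key}={conf[key]}"])
    return args


openlineage_conf = {
    # Assumption: a boolean string turns off events for RDD operations.
    "spark.openlineage.filter.rddEventsDisabled": "true",
    # Assumption: the override takes a UUID string as the run ID.
    "spark.openlineage.applicationRunId": str(uuid.uuid4()),
}

# Print the flags so they can be appended to a spark-submit command line.
print(" ".join(spark_submit_conf_args(openlineage_conf)))
```

The same dictionary could instead be applied key by key through SparkSession.builder.config(key, value) when the session is created in code.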

Changed

  • Java: Update GCP Lineage transport version and fix dependency shading #3768 @tnazarew Update the GCP Lineage transport to use a new version of the producer library with fixed dependency shading.
  • Python: Show what transport failed to create #4220 @mobuchowski Improve error messages to indicate which transport could not be created.
  • Spark: Prevent classloader issue by gating log behind additional flag #4207 @mobuchowski Fix classloader conflicts with BigQuery connector by gating DEBUG toJSON() logging behind an additional flag and logging exceptions.

Fixed

  • Python: Fix .with_additional_properties() annotation #4197 @dolfinus Fix type annotation for .with_additional_properties() method to correctly accept keyword arguments.
  • Spark: Fix BigQuery symlinks with ".db" suffix #4192 @kchledowski Fix BigQuery symlink namespace incorrectly having ".db" suffix in RUNNING and COMPLETE events by avoiding mutation of the Identifier object.
  • Spark: Fix Glue Data Catalog detection in YARN cluster mode #4229 @lawofcycles Add fallback mechanism to retrieve AWS region from EC2 Instance Metadata Service when environment variables are unavailable in YARN cluster mode.
  • Spark: Fix missing inputs and CLL for AWS DynamicFrame #4222 @kchledowski Fix missing inputs and column-level lineage when writing from AWS DynamicFrame by treating NewHadoopRDD as file-like.
  • Spark: Remove path pattern in ColumnLineageFacet as well #4228 @RohithKayathi Apply spark.openlineage.dataset.removePath.pattern to input field names in ColumnLineageFacet, and fix hashCode/equals methods to include additionalProperties.
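
The Glue Data Catalog fix above relies on a fallback: use the region from environment variables when it is present, otherwise ask the EC2 Instance Metadata Service. The sketch below illustrates that pattern only and is not the integration's actual (Java) code. The function names are hypothetical, and checking AWS_REGION and then AWS_DEFAULT_REGION is an assumption about which variables are consulted.

```python
import os
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def fetch_region_from_imds(timeout=1.0):
    """Ask the EC2 Instance Metadata Service (IMDSv2) for the instance region."""
    # IMDSv2 requires a session token obtained with a PUT request.
    token_request = urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as response:
        token = response.read().decode()

    region_request = urllib.request.Request(
        f"{IMDS_BASE}/meta-data/placement/region",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(region_request, timeout=timeout) as response:
        return response.read().decode().strip()


def resolve_aws_region(env=None, imds_fetcher=fetch_region_from_imds):
    """Return the AWS region from the environment, falling back to IMDS.

    Returns None when neither source yields a region.
    """
    env = os.environ if env is None else env
    # Assumption: these two variables are checked, in this order.
    for name in ("AWS_REGION", "AWS_DEFAULT_REGION"):
        value = env.get(name)
        if value:
            return value
    try:
        return imds_fetcher()
    except OSError:
        # Covers connection errors and timeouts when not running on EC2.
        return None
```

The fetcher is passed in as a parameter so the fallback order can be exercised without network access, for example resolve_aws_region({}, lambda: "us-east-1").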

Removed

  • Airflow: Remove Airflow integration from OpenLineage repository #4212 @kacpermuda The deprecated Airflow integration has been removed from the OpenLineage repository.