1.11.3 - 2024-04-04
Added
- Common: add support for `SCRIPT`-type jobs in BigQuery
  #2564 @kacpermuda
  In the case of `SCRIPT`-type jobs in BigQuery, no lineage was being extracted because the `SCRIPT` job itself had no lineage information; it only spawned child jobs that did. With this change, the integration extracts lineage information from the child jobs when dealing with `SCRIPT`-type jobs.
- Spark: support for built-in lineage extraction
  #2272 @pawel-big-lebowski
  This PR adds a `spark-interfaces-scala` package that allows lineage extraction to be implemented within Spark extensions (Iceberg, Delta, GCS, etc.). When traversing the query plan, the OpenLineage integration checks whether nodes implement the defined interfaces; if so, the interface methods are used to extract lineage. Refer to the README for more details.
- Spark/Java: add support for Micrometer metrics
  #2496 @mobuchowski
  Adds a mechanism for forwarding metrics to any Micrometer-compatible implementation. Included: `MeterRegistryFactory`, `MicrometerProvider`, `StatsDMetricsBuilder`, metrics config in the OpenLineage config, and a Java client implementation.
- Spark: add support for telemetry mechanism
  #2528 @mobuchowski
  Adds timers, counters, and additional instrumentation in order to implement Micrometer metrics collection.
- Spark: support query option on table read
  #2556 @mobuchowski
  Adds support for the Spark-BigQuery connector's query input option, which executes a query directly on BigQuery and stores the result in an intermediate dataset, bypassing Spark's computation layer. Because of this, the lineage is retrieved using the SQL parser, similarly to `JDBCRelation`.
- Spark: change
  `SparkPropertyFacetBuilder` to support recording the Spark runtime
  #2523 @Ruihua98
  Modifies `SparkPropertyFacetBuilder` to capture the `RuntimeConfig` of the Spark session, because the existing `SparkPropertyFacet` could only capture the static config of the Spark context. This facet will be added in both RDD-related and SQL-related runs.
- Spec: add
  `fileCount` to dataset stat facets
  #2562 @dolfinus
  Adds a `fileCount` field to the `DataQualityMetricsInputDatasetFacet` and `OutputStatisticsOutputDatasetFacet` specifications.
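To make the new field concrete, here is a hedged sketch of an output-statistics facet payload carrying `fileCount`. The values are made up, and the assumption that `rowCount` and `size` are the companion fields comes from the existing facet definition, not from this release's diff:

```python
import json

# Hedged sketch: an OutputStatisticsOutputDatasetFacet-style payload
# including the new fileCount field. All values are illustrative.
facet = {
    "rowCount": 1_000_000,   # rows written to the output dataset
    "size": 52_428_800,      # output size in bytes
    "fileCount": 8,          # new field: number of files backing the dataset
}

print(json.dumps(facet))
```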
Fixed
- dbt: `dbt-ol` should transparently exit with the same exit code as the child `dbt` process
  #2560 @blacklight
  Makes `dbt-ol` transparently exit with the same exit code as the child `dbt` process.
- Flink: disable module metadata generation
  #2531 @HuangZhenQiu
  Disables module metadata generation for Flink to fix the problem of having Gradle dependencies on submodules within `openlineage-flink.jar`.
- Flink: fixes to version 1.19
  #2507 @pawel-big-lebowski
  Fixes the class-not-found issue when checking for Cassandra classes and fixes the Maven POM dependency on subprojects.
- Python: small improvements to
  `.emit()` method logging & annotations
  #2539 @dolfinus
  Updates `OpenLineage.emit()` debug messages and annotations.
- SQL: show error message when `OpenLineageSql` cannot find the native library
  #2547 @dolfinus
  When the `OpenLineageSql` class could not load the native library, it returned `None` for all operations, but because the error message was suppressed, the user could not determine the reason. The error message is now shown.
- SQL: update code to conform to upstream sqlparser-rs changes
  #2510 @mobuchowski
  Includes tests and cosmetic improvements.
- Spark: fix access to active Spark session
  #2535 @pawel-big-lebowski
  Changes behavior so `IllegalStateException` is always caught when accessing `SparkSession`.
- Spark: fix Databricks environment
  #2537 @pawel-big-lebowski
  Fixes the `ClassNotFoundError` occurring on the Databricks runtime and extends the integration test to verify `DatabricksEnvironmentFacet`.
- Spark: fixed memory leak in `JobMetricsHolder`
  #2565 @d-m-h
  The `JobMetricsHolder#cleanUp(int)` method now correctly purges unneeded state from both maps.
- Spark: fixed memory leak in
  `UnknownEntryFacetListener`
  #2557 @pawel-big-lebowski
  Prevents storing the state when the facet is disabled and purges the state after populating run facets.
- Spark: fix parsing
  `JDBCOptions(table=...)` containing a subquery
  #2546 @dolfinus
  Prevents `openlineage-spark` from producing datasets with names like `database.(select * from table)` for JDBC sources.
- Spark/Snowflake: support query option via SQL parser
  #2563 @mobuchowski
  When a Snowflake job bypasses Spark's computation layer, the SQL parser is now used to get the lineage.
- Spark: always catch `IllegalStateException` when accessing `SparkSession`
  #2535 @pawel-big-lebowski
  `IllegalStateException` was previously not being caught.
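The `dbt-ol` exit-code fix above can be pictured with a minimal wrapper sketch: run the child process, then exit with whatever code it returned. This is a generic `subprocess` illustration under the assumptions stated in the comments, not the actual `dbt-ol` implementation:

```python
import subprocess
import sys

def run_child(cmd):
    """Run the wrapped command and return its exit code unchanged."""
    completed = subprocess.run(cmd)
    return completed.returncode

if __name__ == "__main__":
    # Exit transparently with the child's exit code, as dbt-ol now does
    # for the wrapped dbt process. The command name "dbt" is assumed here.
    sys.exit(run_child(["dbt"] + sys.argv[1:]))
```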