1.11.3 - 2024-04-04

Added

  • Common: add support for SCRIPT-type jobs in BigQuery #2564 @kacpermuda
    In the case of SCRIPT-type jobs in BigQuery, no lineage was being extracted because the SCRIPT job had no lineage information - it only spawned child jobs that had that information. With this change, the integration extracts lineage information from child jobs when dealing with SCRIPT-type jobs.
  • Spark: support for built-in lineage extraction #2272 @pawel-big-lebowski
    This PR adds a spark-interfaces-scala package that allows lineage extraction to be implemented within Spark extensions (Iceberg, Delta, GCS, etc.). When traversing the query plan, the OpenLineage integration checks whether nodes implement the defined interfaces. If so, the interface methods are used to extract lineage. Refer to the README for more details.
  • Spark/Java: add support for Micrometer metrics #2496 @mobuchowski
    Adds a mechanism for forwarding metrics to any Micrometer-compatible implementation. Included: MeterRegistryFactory, MicrometerProvider, StatsDMetricsBuilder, metrics config in OpenLineage config, and a Java client implementation.
  • Spark: add support for telemetry mechanism #2528 @mobuchowski
    Adds timers, counters and additional instrumentation in order to implement Micrometer metrics collection.
  • Spark: support query option on table read #2556 @mobuchowski
    Adds support for the Spark-BigQuery connector's query input option, which executes a query directly on BigQuery, storing the result in an intermediate dataset, bypassing Spark's computation layer. Due to this, the lineage is retrieved using the SQL parser, similarly to JDBCRelation.
  • Spark: change SparkPropertyFacetBuilder to support recording Spark runtime #2523 @Ruihua98
    Modifies SparkPropertyFacetBuilder to capture the RuntimeConfig of the Spark session because the existing SparkPropertyFacet can only capture the static config of the Spark context. This facet will be added in both RDD-related and SQL-related runs.
  • Spec: add fileCount to dataset stat facets #2562 @dolfinus
    Adds a fileCount field to the DataQualityMetricsInputDatasetFacet and OutputStatisticsOutputDatasetFacet specifications.
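
As an illustration of the SCRIPT-type job change (#2564) described above, the following is a minimal sketch of collecting lineage from child jobs when the parent carries none. The `Job` record, its field names, and `collect_lineage` are hypothetical and exist only for this example; they are not BigQuery or OpenLineage types, and the integration's actual code may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical job record used only for this sketch."""
    job_id: str
    statement_type: str
    inputs: set = field(default_factory=set)
    outputs: set = field(default_factory=set)
    children: list = field(default_factory=list)


def collect_lineage(job):
    """Return (inputs, outputs); for SCRIPT jobs, take the union over child jobs."""
    if job.statement_type != "SCRIPT":
        return set(job.inputs), set(job.outputs)
    inputs, outputs = set(), set()
    for child in job.children:
        child_inputs, child_outputs = collect_lineage(child)
        inputs |= child_inputs
        outputs |= child_outputs
    return inputs, outputs


# A SCRIPT parent with no lineage of its own and two child jobs that have it.
script = Job("parent", "SCRIPT", children=[
    Job("child_1", "INSERT", inputs={"ds.a"}, outputs={"ds.b"}),
    Job("child_2", "INSERT", inputs={"ds.b"}, outputs={"ds.c"}),
])
inputs, outputs = collect_lineage(script)
print(sorted(inputs), sorted(outputs))
```

The parent contributes nothing itself; everything reported for it comes from its children, which matches the behavior the entry describes.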

Fixed

  • dbt: dbt-ol should transparently exit with the same exit code as the child dbt process #2560 @blacklight
    Makes dbt-ol transparently exit with the same exit code as the child dbt process.
  • Flink: disable module metadata generation #2531 @HuangZhenQiu
    Disables module metadata generation for Flink to fix the problem of Gradle dependencies on submodules within openlineage-flink.jar.
  • Flink: fixes to version 1.19 #2507 @pawel-big-lebowski
    Fixes the class not found issue when checking for Cassandra classes. Also fixes the Maven pom dependency on subprojects.
  • Python: small improvements to .emit() method logging & annotations #2539 @dolfinus
    Updates OpenLineage.emit debug messages and annotations.
  • SQL: show error message when OpenLineageSql cannot find native library #2547 @dolfinus
    When the OpenLineageSql class could not load a native library, it returned None for all operations. Because the error message was suppressed, the user could not determine the reason. The error message is now shown.
  • SQL: update code to conform to upstream sqlparser-rs changes #2510 @mobuchowski
    Includes tests and cosmetic improvements.
  • Spark: fix access to active Spark session #2535 @pawel-big-lebowski
    Changes behavior so IllegalStateException is always caught when accessing SparkSession.
  • Spark: fix Databricks environment #2537 @pawel-big-lebowski
    Fixes the ClassNotFoundError occurring on Databricks runtime and extends the integration test to verify DatabricksEnvironmentFacet.
  • Spark: fixed memory leak in JobMetricsHolder #2565 @d-m-h
    The JobMetricsHolder#cleanUp(int) method now correctly purges unneeded state from both maps.
  • Spark: fixed memory leak in UnknownEntryFacetListener #2557 @pawel-big-lebowski
    Prevents storing the state when a facet is disabled, purging the state after populating run facets.
  • Spark: fix parsing JDBCOptions(table=...) containing subquery #2546 @dolfinus
    Prevents openlineage-spark from producing datasets with names like `database.(select from table)` for JDBC sources.
  • Spark/Snowflake: support query option via SQL parser #2563 @mobuchowski
    When a Snowflake job bypasses Spark's computation layer, the SQL parser is now used to get the lineage.
  • Spark: always catch IllegalStateException when accessing SparkSession #2535 @pawel-big-lebowski
    Previously, IllegalStateException was not always caught when accessing SparkSession.
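
To illustrate the dbt-ol exit-code fix (#2560) listed above, here is a minimal sketch of a wrapper that reports the same exit code as its child process. The function name `run_and_propagate` is an assumption made for this example, and a small Python one-liner stands in for the real dbt command; this is not the dbt-ol source.

```python
import subprocess
import sys


def run_and_propagate(cmd):
    """Run the child command and return its exit code unchanged."""
    completed = subprocess.run(cmd)
    return completed.returncode


# Stand-in child process that exits with code 3 (a real wrapper would run dbt).
child = [sys.executable, "-c", "import sys; sys.exit(3)"]
exit_code = run_and_propagate(child)
print(exit_code)
```

A wrapper script would typically finish with `sys.exit(exit_code)` so that CI systems and schedulers see the child's failure rather than a misleading success.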