1.11.3 - 2024-04-04
Added
- Common: add support for `SCRIPT`-type jobs in BigQuery #2564 @kacpermuda
  For `SCRIPT`-type jobs in BigQuery, no lineage was being extracted because the `SCRIPT` job itself carries no lineage information; it only spawns child jobs that do. With this change, the integration extracts lineage from the child jobs when handling `SCRIPT`-type jobs.
- Spark: support for built-in lineage extraction #2272 @pawel-big-lebowski
  This PR adds a `spark-interfaces-scala` package that allows lineage extraction to be implemented within Spark extensions (Iceberg, Delta, GCS, etc.). When traversing the query plan, the OpenLineage integration checks whether nodes implement the defined interfaces; if so, the interface methods are used to extract lineage. Refer to the README for more details.
- Spark/Java: add support for Micrometer metrics #2496 @mobuchowski
  Adds a mechanism for forwarding metrics to any Micrometer-compatible implementation. Includes `MeterRegistryFactory`, `MicrometerProvider`, `StatsDMetricsBuilder`, metrics config in the OpenLineage config, and a Java client implementation.
- Spark: add support for telemetry mechanism #2528 @mobuchowski
  Adds timers, counters, and additional instrumentation in order to implement Micrometer metrics collection.
- Spark: support query option on table read #2556 @mobuchowski
  Adds support for the Spark-BigQuery connector's query input option, which executes a query directly on BigQuery and stores the result in an intermediate dataset, bypassing Spark's computation layer. Because of this, lineage is retrieved using the SQL parser, similarly to `JDBCRelation`.
- Spark: change `SparkPropertyFacetBuilder` to support recording Spark runtime config #2523 @Ruihua98
  Modifies `SparkPropertyFacetBuilder` to capture the `RuntimeConfig` of the Spark session, because the existing `SparkPropertyFacet` can only capture the static config of the Spark context. The facet is added to both RDD-related and SQL-related runs.
- Spec: add `fileCount` to dataset stat facets #2562 @dolfinus
  Adds a `fileCount` field to the `DataQualityMetricsInputDatasetFacet` and `OutputStatisticsOutputDatasetFacet` specifications.
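To illustrate the new `fileCount` field, here is what an output statistics facet payload might look like; this is a hypothetical sketch, with the companion field names (`rowCount`, `size`) assumed rather than quoted from the spec:

```python
# Hypothetical OutputStatisticsOutputDatasetFacet payload showing the new
# fileCount field alongside the assumed existing statistics fields.
facet = {
    "rowCount": 1500,      # number of rows written to the dataset
    "size": 1_048_576,     # dataset size in bytes
    "fileCount": 4,        # newly added: number of files backing the dataset
}

print(facet["fileCount"])  # → 4
```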
Fixed
- dbt: `dbt-ol` should transparently exit with the same exit code as the child `dbt` process #2560 @blacklight
  Makes `dbt-ol` exit with the same exit code as the child `dbt` process.
- Flink: disable module metadata generation #2531 @HuangZhenQiu
  Disables module metadata generation for Flink to fix the problem of Gradle dependencies on submodules within `openlineage-flink.jar`.
- Flink: fixes to version 1.19 #2507 @pawel-big-lebowski
  Fixes the class-not-found issue when checking for Cassandra classes. Also fixes the Maven POM dependency on subprojects.
- Python: small improvements to `.emit()` method logging & annotations #2539 @dolfinus
  Updates `OpenLineage.emit` debug messages and annotations.
- SQL: show error message when OpenLineageSql cannot find native library #2547 @dolfinus
  When the `OpenLineageSql` class could not load a native library, it returned `None` for all operations. But because the error message was suppressed, the user could not determine the reason.
- SQL: update code to conform to upstream sqlparser-rs changes #2510 @mobuchowski
  Includes tests and cosmetic improvements.
- Spark: fix access to active Spark session #2535 @pawel-big-lebowski
  Changes behavior so `IllegalStateException` is always caught when accessing `SparkSession`.
- Spark: fix Databricks environment #2537 @pawel-big-lebowski
  Fixes the `ClassNotFoundError` occurring on the Databricks runtime and extends the integration test to verify `DatabricksEnvironmentFacet`.
- Spark: fix memory leak in `JobMetricsHolder` #2565 @d-m-h
  The `JobMetricsHolder#cleanUp(int)` method now correctly purges unneeded state from both maps.
- Spark: fix memory leak in `UnknownEntryFacetListener` #2557 @pawel-big-lebowski
  Prevents storing state when the facet is disabled and purges the state after populating run facets.
- Spark: fix parsing `JDBCOptions(table=...)` containing a subquery #2546 @dolfinus
  Prevents `openlineage-spark` from producing datasets with names like `database.(select * from table)` for JDBC sources.
- Spark/Snowflake: support query option via SQL parser #2563 @mobuchowski
  When a Snowflake job bypasses Spark's computation layer, the SQL parser is now used to get the lineage.
- Spark: always catch `IllegalStateException` when accessing `SparkSession` #2535 @pawel-big-lebowski
  `IllegalStateException` was not previously being caught.
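The `dbt-ol` fix above comes down to propagating the child process's exit status unchanged. A minimal sketch of that pattern, assuming a hypothetical `run_wrapped` helper rather than the actual `dbt-ol` code:

```python
import subprocess
import sys

def run_wrapped(cmd):
    """Run a child process and return its exit code unchanged.

    Hypothetical helper illustrating the dbt-ol behavior: the wrapper
    must not swallow or rewrite the child's exit status.
    """
    proc = subprocess.run(cmd)
    return proc.returncode

# A child that fails with code 3 is reported with exactly that code.
code = run_wrapped([sys.executable, "-c", "import sys; sys.exit(3)"])
print(code)  # → 3
```

The wrapper would then call `sys.exit()` with the returned code so that CI systems and shell scripts observe the same failure the child reported.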
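The native-library fix for `OpenLineageSql` follows a common pattern: log the import failure instead of suppressing it, so the user can see why parsing is unavailable. A hedged sketch of that pattern (the helper, logger name, and module-lookup approach are assumptions, not the actual OpenLineage code):

```python
import importlib
import logging

logger = logging.getLogger("openlineage.sql")

def load_native_parser(module_name="openlineage_sql"):
    """Try to import a native parser module, logging the reason on failure.

    Hypothetical helper: instead of silently returning None when the
    native library is missing, the ImportError is logged so the user
    can determine why SQL parsing is disabled.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        logger.warning("Native SQL library %r not available: %s", module_name, exc)
        return None
```

Callers can still treat `None` as "parsing unavailable", but the warning now records the underlying cause.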