1.11.3 - 2024-04-04
Added
- Common: add support for `SCRIPT`-type jobs in BigQuery
  #2564 @kacpermuda
  In the case of `SCRIPT`-type jobs in BigQuery, no lineage was being extracted because the `SCRIPT` job itself had no lineage information; it only spawned child jobs that did. With this change, the integration extracts lineage information from the child jobs when dealing with `SCRIPT`-type jobs.
- Spark: support for built-in lineage extraction
  #2272 @pawel-big-lebowski
  This PR adds a `spark-interfaces-scala` package that allows lineage extraction to be implemented within Spark extensions (Iceberg, Delta, GCS, etc.). When traversing the query plan, the OpenLineage integration checks whether nodes implement the defined interfaces; if so, the interface methods are used to extract lineage. Refer to the README for more details.
- Spark/Java: add support for Micrometer metrics
  #2496 @mobuchowski
  Adds a mechanism for forwarding metrics to any Micrometer-compatible implementation. Included: `MeterRegistryFactory`, `MicrometerProvider`, `StatsDMetricsBuilder`, metrics config in the OpenLineage config, and a Java client implementation.
- Spark: add support for telemetry mechanism
  #2528 @mobuchowski
  Adds timers, counters, and additional instrumentation in order to implement Micrometer metrics collection.
- Spark: support query option on table read
  #2556 @mobuchowski
  Adds support for the Spark-BigQuery connector's query input option, which executes a query directly on BigQuery and stores the result in an intermediate dataset, bypassing Spark's computation layer. Because of this, the lineage is retrieved using the SQL parser, similarly to `JDBCRelation`.
- Spark: change
  `SparkPropertyFacetBuilder` to support recording the Spark runtime
  #2523 @Ruihua98
  Modifies `SparkPropertyFacetBuilder` to capture the `RuntimeConfig` of the Spark session, because the existing `SparkPropertyFacet` could only capture the static config of the Spark context. This facet will be added in both RDD-related and SQL-related runs.
- Spec: add
  `fileCount` to dataset stat facets
  #2562 @dolfinus
  Adds a `fileCount` field to the `DataQualityMetricsInputDatasetFacet` and `OutputStatisticsOutputDatasetFacet` specifications.
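To make the new field concrete, here is a hedged sketch of an output-statistics facet payload carrying `fileCount`. The values are made up, and the assumption that `rowCount` and `size` are the companion fields comes from the existing facet definition, not from this release's diff:

```python
import json

# Hedged sketch: an OutputStatisticsOutputDatasetFacet-style payload
# including the new fileCount field. All values are illustrative.
facet = {
    "rowCount": 1_000_000,   # rows written to the output dataset
    "size": 52_428_800,      # output size in bytes
    "fileCount": 8,          # new field: number of files backing the dataset
}

print(json.dumps(facet))
```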
Fixed
- dbt: `dbt-ol` should transparently exit with the same exit code as the child `dbt` process
  #2560 @blacklight
  Makes `dbt-ol` transparently exit with the same exit code as the child `dbt` process.
- Flink: disable module metadata generation
  #2531 @HuangZhenQiu
  Disables module metadata generation for Flink to fix the problem of having Gradle dependencies on submodules within `openlineage-flink.jar`.
- Flink: fixes to version 1.19
  #2507 @pawel-big-lebowski
  Fixes the class-not-found issue when checking for Cassandra classes and fixes the Maven POM dependency on subprojects.
- Python: small improvements to
  `.emit()` method logging & annotations
  #2539 @dolfinus
  Updates `OpenLineage.emit()` debug messages and annotations.
- SQL: show error message when `OpenLineageSql` cannot find the native library
  #2547 @dolfinus
  When the `OpenLineageSql` class could not load the native library, it returned `None` for all operations, but because the error message was suppressed, the user could not determine the reason. The error message is now shown.
- SQL: update code to conform to upstream sqlparser-rs changes
  #2510 @mobuchowski
  Includes tests and cosmetic improvements.
- Spark: fix access to active Spark session
  #2535 @pawel-big-lebowski
  Changes behavior so `IllegalStateException` is always caught when accessing `SparkSession`.
- Spark: fix Databricks environment
  #2537 @pawel-big-lebowski
  Fixes the `ClassNotFoundError` occurring on the Databricks runtime and extends the integration test to verify `DatabricksEnvironmentFacet`.
- Spark: fixed memory leak in `JobMetricsHolder`
  #2565 @d-m-h
  The `JobMetricsHolder#cleanUp(int)` method now correctly purges unneeded state from both maps.
- Spark: fixed memory leak in
  `UnknownEntryFacetListener`
  #2557 @pawel-big-lebowski
  Prevents storing the state when the facet is disabled and purges the state after populating run facets.
- Spark: fix parsing
  `JDBCOptions(table=...)` containing a subquery
  #2546 @dolfinus
  Prevents `openlineage-spark` from producing datasets with names like `database.(select * from table)` for JDBC sources.
- Spark/Snowflake: support query option via SQL parser
  #2563 @mobuchowski
  When a Snowflake job bypasses Spark's computation layer, the SQL parser is now used to get the lineage.
- Spark: always catch `IllegalStateException` when accessing `SparkSession`
  #2535 @pawel-big-lebowski
  `IllegalStateException` was previously not being caught.
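The `dbt-ol` exit-code fix above can be pictured with a minimal wrapper sketch: run the child process, then exit with whatever code it returned. This is a generic `subprocess` illustration under the assumptions stated in the comments, not the actual `dbt-ol` implementation:

```python
import subprocess
import sys

def run_child(cmd):
    """Run the wrapped command and return its exit code unchanged."""
    completed = subprocess.run(cmd)
    return completed.returncode

if __name__ == "__main__":
    # Exit transparently with the child's exit code, as dbt-ol now does
    # for the wrapped dbt process. The command name "dbt" is assumed here.
    sys.exit(run_child(["dbt"] + sys.argv[1:]))
```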