Java: dataset namespace resolver feature #2720 @pawel-big-lebowski Adds a mechanism that resolves dataset namespaces based on the configured resolvers. The core mechanism is implemented in openlineage-java and can be used within the Flink and Spark integrations.
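As a rough illustration of what resolving a namespace can mean, the sketch below maps any of several physical host names to one logical name. It is not the openlineage-java implementation: the function `resolve_namespace`, the shape of the `resolvers` mapping, and the host names are all hypothetical, and the substring-replacement rule is an assumption made for the example.

```python
from typing import Dict, List


def resolve_namespace(namespace: str, resolvers: Dict[str, List[str]]) -> str:
    """Replace a known host in `namespace` with its logical (resolved) name.

    `resolvers` maps a hypothetical resolved name to the hosts it stands for.
    The first matching host wins; unmatched namespaces are returned unchanged.
    """
    for resolved_name, hosts in resolvers.items():
        for host in hosts:
            if host in namespace:
                return namespace.replace(host, resolved_name)
    return namespace


# Hypothetical configuration: two brokers that belong to one logical cluster.
resolvers = {"kafka-prod": ["kafka-prod13.company.com", "kafka-prod15.company.com"]}
print(resolve_namespace("kafka://kafka-prod13.company.com:9092", resolvers))
```

With a mapping like this, datasets read through different brokers of the same cluster end up under a single namespace instead of one per host.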
Spark: add transformation extraction #2758 @tnazarew Adds a transformation type extraction mechanism.
Spark: add GCP run and job facets #2643 @codelixir Adds GCPRunFacetBuilder and GCPJobFacetBuilder to report additional facets when running on Google Cloud Platform.
Spark: improve namespace format for SQLServer #2773 @dolfinus Improves the namespace format for SQLServer.
Spark: verify jar content after build #2698 @pawel-big-lebowski Adds a tool to verify shadowJar content and prevent reported issues. Such issues are currently hard to prevent and require manually unpacking the jar and inspecting its content.
Spec: add transformation type info #2756 @tnazarew Adds information about the transformation type in ColumnLineageDatasetFacet. transformationType and transformationDescription are marked as deprecated.
Spec: implementing facet registry (following #2161) #2729 @harels Introduces the foundations of the new facet registry into the repo.
Spec: register GCP common job facet #2740 @ngorchakova Registers the GCP job facet, which contains common attributes that improve the way lineage is parsed and displayed by the GCP platform. Based on the proposal, GCP Lineage would like to define the facets it expects from integrations. The list of supported facets is not final and will be extended in a subsequent PR.
Java: remove deprecated localServerId option from Kafka config #2738 @dolfinus Removes localServerId from Kafka config, deprecated since 1.13.0.
Java: remove deprecated Transport.emit(String) #2737 @dolfinus Removes Transport.emit(String) support, deprecated since 1.13.0.
Spark: remove spark-interfaces-scala module #2781 @ddebowczyk92 Replaces the existing spark-interfaces-scala interfaces with new ones decoupled from the Scala binary version. Allows for improved integration in environments where one cannot guarantee the same version of openlineage-java.
Flink: use namespace.name as Avro complex field type #2763 @dolfinus namespace.name is now used as the Avro "type" of complex fields (record, enum, fixed).
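To make the naming rule concrete, here is a minimal sketch of how a reported type string could be derived from an Avro schema fragment. It is an illustration only, not the Flink integration's code: `field_type_name` is a hypothetical helper, the schema fragments are invented, and union types (JSON lists) are deliberately left unsupported.

```python
# Avro's named complex types, as listed in the entry above.
NAMED_TYPES = {"record", "enum", "fixed"}


def field_type_name(avro_type) -> str:
    """Return the type string to report for one Avro field type.

    Primitive types are given as plain strings and returned unchanged.
    Named complex types are reported as "namespace.name", or just "name"
    when the schema declares no namespace.
    """
    if isinstance(avro_type, str):
        return avro_type
    if isinstance(avro_type, dict):
        kind = avro_type.get("type")
        if kind in NAMED_TYPES:
            name = avro_type["name"]
            namespace = avro_type.get("namespace")
            return f"{namespace}.{name}" if namespace else name
        if isinstance(kind, str):
            return kind  # unnamed complex types such as "array" or "map"
    raise TypeError(f"unsupported Avro type in this sketch: {avro_type!r}")


# Hypothetical record schema used only for illustration.
address = {"type": "record", "name": "Address", "namespace": "com.example", "fields": []}
print(field_type_name(address))
```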
Java: repair empty dataset name #2776 @kacpermuda The dataset name should not be empty.
Spark: fix events emitted for drop table for Spark 3.4 and above #2745 @pawel-big-lebowski @savannavalgi Includes the dataset being dropped in the event, as was the case prior to Spark 3.4.
Spark, Flink: fix S3 dataset names #2782 @dolfinus Drops the leading slash from the object storage dataset name. Converts s3a:// and s3n:// schemes to s3://.
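The two rules in this entry can be sketched as a small normalization function. This is an illustration under assumptions rather than the integration's actual code: `normalize_object_storage_dataset` is a hypothetical name, the bucket and path are invented, and splitting the URI into a bucket-level namespace plus a path-based name is assumed for the example.

```python
from typing import Tuple
from urllib.parse import urlparse


def normalize_object_storage_dataset(uri: str) -> Tuple[str, str]:
    """Split an object storage URI into a (namespace, name) pair.

    s3a:// and s3n:// are rewritten to s3://, and leading slashes are
    removed from the name part.
    """
    parsed = urlparse(uri)
    scheme = "s3" if parsed.scheme in ("s3a", "s3n") else parsed.scheme
    namespace = f"{scheme}://{parsed.netloc}"
    name = parsed.path.lstrip("/")  # strips every leading "/" from the path
    return namespace, name


# Hypothetical bucket and path.
print(normalize_object_storage_dataset("s3a://my-bucket/warehouse/orders"))
```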
Spark: fix Hive metastore namespace #2761 @dolfinus Fixes the dataset namespace for cases when the Hive metastore URL is set using $SPARK_CONF_DIR/hive-site.xml.
Spark: fix NPE in column-level lineage #2749 @pawel-big-lebowski The Spark agent now checks that cur.getDependencies() is not null before adding dependencies.
Spark: refactor OpenLineageRunEventBuilder #2754 @pawel-big-lebowski Adds a separate class containing all the input arguments needed to call OpenLineageRunEventBuilder::buildRun.
Spark: fix historyUrl format #2741 @dolfinus Fixes the historyUrl format in spark_applicationDetails.
SQL: allow self-recursive aliases #2753 @mobuchowski Expressions like select * from test_orders as test_orders are now parsed properly.