Skip to main content
Version: 1.23.0

1.17.1 - 2024-06-21

Added

  • Java: dataset namespace resolver feature #2720 @pawel-big-lebowski
    Adds a dataset namespace resolving mechanism that resolves dataset namespaces based on the resolvers configured. The core mechanism is implemented in openlineage-java and can be used within the Flink and Spark integrations.
  • Spark: add transformation extraction #2758 @tnazarew
    Adds a transformation type extraction mechanism.
  • Spark: add GCP run and job facets #2643 @codelixir
    Adds GCPRunFacetBuilder and GCPJobFacetBuilder to report additional facets when running on Google Cloud Platform.
  • Spark: improve namespace format for SQLServer #2773 @dolfinus
    Improves the namespace format for SQLServer.
  • Spark: verify jar content after build #2698 @pawel-big-lebowski
    Adds a tool to verify shadowJar content and prevent reported issues. These are hard to prevent currently and require manual verification of manually unpacked jar content.
  • Spec: add transformation type info #2756 @tnazarew
    Adds information about the transformation type in ColumnLineageDatasetFacet. transformationType and transformationDescription are marked as deprecated.
  • Spec: implementing facet registry (following #2161) #2729 @harels
    Introduces the foundations of the new facet Registry into the repo.
  • Spec: register GCP common job facet #2740 @ngorchakova
    Registers the GCP job facet that contains common attributes that will improve the way lineage is parsed and displayed by the GCP platform. Based on the proposal, GCP Lineage would like to define facets that are expected from integrations. The list of support facets is not final and will be extended further by next PR.

Removed

  • Java: remove deprecated localServerId option from Kafka config #2738 @dolfinus
    Removes localServerId from Kafka config, deprecated since 1.13.0.
  • Java: remove deprecated Transport.emit(String) #2737 @dolfinus
    Removes Transport.emit(String) support, deprecated since 1.13.0.
  • Spark: remove spark-interfaces-scala module #2781@ddebowczyk92
    Replaces the existing spark-interfaces-scala interfaces with new ones decoupled from the Scala binary version. Allows for improved integration in environments where one cannot guarantee the same version of openlineage-java.

Changed

  • Spark: add log info when emitting lineage from Spark (following #2650) #2769 @algorithmy1
    Enhances logging.

Fixed

  • Flink: use namespace.name as Avro complex field type #2763 @dolfinus
    namespace.name is now used as Avro "type" of complex fields (record, enum, fixed).
  • Java: repair empty dataset name #2776 @kacpermuda
    The dataset name should not be empty.
  • Spark: fix events emitted for drop table for Spark 3.4 and above #2745 @pawel-big-lebowski@savannavalgi
    Includes dataset being dropped within the event, as it used to be prior to Spark 3.4.
  • Spark, Flink: fix S3 dataset names #2782 @dolfinus
    Drops the leading slash from the object storage dataset name. Converts s3a:// and s3n:// schemes to s3://.
  • Spark: fix Hive metastore namespace #2761 @dolfinus
    Fixes the dataset namespace for cases when the Hive metastore URL is set using $SPARK_CONF_DIR/hive-site.xml.
  • Spark: fix NPE in column-level lineage #2749 @pawel-big-lebowski
    The Spark agent now checks to determine if cur.getDependencies() is not null before adding dependencies.
  • Spark: refactor OpenLineageRunEventBuilder #2754 @pawel-big-lebowski
    Adds a separate class containing all the input arguments to call OpenLineageRunEventBuilder::buildRun.
  • Spark: fix historyUrl format #2741 @dolfinus
    Fixes the historyUrl format in spark_applicationDetails.
  • SQL: allow self-recursive aliases #2753 @mobuchowski
    Expressions like select * from test_orders as test_orders are now parsed properly.