1.17.1 - 2024-06-21
Added
- Java: dataset namespace resolver feature
#2720
@pawel-big-lebowski
Adds a dataset namespace resolving mechanism that resolves dataset namespaces based on the resolvers configured. The core mechanism is implemented in openlineage-java and can be used within the Flink and Spark integrations. - Spark: add transformation extraction
#2758
@tnazarew
Adds a transformation type extraction mechanism. - Spark: add GCP run and job facets
#2643
@codelixir
AddsGCPRunFacetBuilder
andGCPJobFacetBuilder
to report additional facets when running on Google Cloud Platform. - Spark: improve namespace format for SQLServer
#2773
@dolfinus
Improves the namespace format for SQLServer. - Spark: verify jar content after build
#2698
@pawel-big-lebowski
Adds a tool to verifyshadowJar
content and prevent reported issues. These are hard to prevent currently and require manual verification of manually unpacked jar content. - Spec: add transformation type info
#2756
@tnazarew
Adds information about the transformation type inColumnLineageDatasetFacet
.transformationType
andtransformationDescription
are marked as deprecated. - Spec: implementing facet registry (following #2161)
#2729
@harels
Introduces the foundations of the new facet Registry into the repo. - Spec: register GCP common job facet
#2740
@ngorchakova
Registers the GCP job facet that contains common attributes that will improve the way lineage is parsed and displayed by the GCP platform. Based on the proposal, GCP Lineage would like to define facets that are expected from integrations. The list of support facets is not final and will be extended further by next PR.
Removed
- Java: remove deprecated
localServerId
option from Kafka config#2738
@dolfinus
RemoveslocalServerId
from Kafka config, deprecated since 1.13.0. - Java: remove deprecated
Transport.emit(String)
#2737
@dolfinus
RemovesTransport.emit(String)
support, deprecated since 1.13.0. - Spark: remove
spark-interfaces-scala
module#2781
@ddebowczyk92
Replaces the existingspark-interfaces-scala
interfaces with new ones decoupled from the Scala binary version. Allows for improved integration in environments where one cannot guarantee the same version ofopenlineage-java
.
Changed
- Spark: add log info when emitting lineage from Spark (following #2650)
#2769
@algorithmy1
Enhances logging.
Fixed
- Flink: use
namespace.name
as Avro complex field type#2763
@dolfinus
namespace.name
is now used as Avro"type"
of complex fields (record, enum, fixed). - Java: repair empty dataset name
#2776
@kacpermuda
The dataset name should not be empty. - Spark: fix events emitted for
drop table
for Spark 3.4 and above#2745
@pawel-big-lebowski@savannavalgi
Includes dataset being dropped within the event, as it used to be prior to Spark 3.4. - Spark, Flink: fix S3 dataset names
#2782
@dolfinus
Drops the leading slash from the object storage dataset name. Convertss3a://
ands3n://
schemes tos3://
. - Spark: fix Hive metastore namespace
#2761
@dolfinus
Fixes the dataset namespace for cases when the Hive metastore URL is set using$SPARK_CONF_DIR/hive-site.xml
. - Spark: fix NPE in column-level lineage
#2749
@pawel-big-lebowski
The Spark agent now checks to determine ifcur.getDependencies()
is not null before adding dependencies. - Spark: refactor
OpenLineageRunEventBuilder
#2754
@pawel-big-lebowski
Adds a separate class containing all the input arguments to callOpenLineageRunEventBuilder::buildRun
. - Spark: fix
historyUrl
format#2741
@dolfinus
Fixes thehistoryUrl
format inspark_applicationDetails
. - SQL: allow self-recursive aliases
#2753
@mobuchowski
Expressions likeselect * from test_orders as test_orders
are now parsed properly.