1.18.0 - 2024-07-11
Added
- Spark: configurable integration test
#2755
@pawel-big-lebowski
Provides a command-line tool capable of running Spark integration tests that can be created without Java.
- Spark: OpenLineage Spark extension interfaces without runtime dependency hell
#2809
#2837
@ddebowczyk92
Adds new Spark extension interfaces without runtime dependency hell. Includes a test to verify the integration is working properly.
- Spark: support latest versions 3.4.3 and 3.5.1
#2743
@pawel-big-lebowski
Upgrades CI workflows to run tests against the latest Spark versions: 3.4.2 -> 3.4.3 and 3.5.0 -> 3.5.1.
- Spark: add extraction of the masking property in column-level lineage
#2789
@tnazarew
Adds extraction of the masking property during collection of dependencies for `ColumnLineageDatasetFacet` creation.
- Spark: collect table name from `InsertIntoHadoopFsRelationCommand`
#2794
@dolfinus
Collects a table name for the `INSERT INTO` command for tables created with `USING $fileFormat` syntax, like `USING orc`.
- Spark, Flink: add `PostgresJdbcExtractor`
#2806
@dolfinus
Adds the default `5432` port to Postgres namespaces.
- Spark, Flink: add `TeradataJdbcExtractor`
#2826
@dolfinus
Converts JDBC URLs like `jdbc:teradata://host/DBS_PORT=1024,DATABASE=somedb` to datasets with namespace `teradata://host:1024` and name `somedb.table`.
- Spark, Flink: add `MySqlJdbcExtractor`
#2825
@dolfinus
Handles different formats of MySQL JDBC URLs and produces datasets with consistent namespaces, like `mysql://host:port`.
- Spark, Flink: add `OracleJdbcExtractor`
#2824
@dolfinus
Handles simple Oracle JDBC URLs, like `oracle:thin:@//host:port/serviceName` and `oracle:thin:@host:port:sid`, and converts each to a dataset with namespace `oracle://host:port` and name `sid.schema.table` or `serviceName.schema.table`.
- Spark: configurable test with Docker image provided
#2822
@pawel-big-lebowski
Extends the configurable integration test feature to allow the Docker image to be specified by name.
- Spark: support Iceberg 1.4 on Spark 3.5.1
#2838
@pawel-big-lebowski
Includes Iceberg support for Spark 3.5. Fixes the column-level lineage facet for `UNION` queries.
- Spec: add example for change in #2756
#2801
@Sheeri
Updates the `customLineage` facet test for the new syntax created in #2756.
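The JDBC extractor entries above describe a mapping from a connection URL to a dataset namespace and name. A minimal Python sketch of that idea follows. It is not OpenLineage's Java implementation: the function names `postgres_namespace` and `teradata_dataset` are hypothetical, and the Teradata fallback port of 1025 is an assumption made for illustration.

```python
from urllib.parse import urlsplit


def _strip_jdbc(url: str) -> str:
    """Drop a leading 'jdbc:' so the rest parses as an ordinary URL."""
    return url[len("jdbc:"):] if url.startswith("jdbc:") else url


def postgres_namespace(jdbc_url: str) -> str:
    """Build a namespace like postgres://host:port, defaulting to port 5432."""
    parts = urlsplit(_strip_jdbc(jdbc_url))
    port = parts.port or 5432
    return f"postgres://{parts.hostname}:{port}"


def teradata_dataset(jdbc_url: str, table: str) -> tuple:
    """Return (namespace, name) for a Teradata JDBC URL and a table name."""
    parts = urlsplit(_strip_jdbc(jdbc_url))
    # Teradata puts comma-separated KEY=VALUE parameters after the host.
    params = {}
    for pair in parts.path.lstrip("/").split(","):
        if "=" in pair:
            key, value = pair.split("=", 1)
            params[key.upper()] = value
    port = params.get("DBS_PORT", "1025")  # assumed fallback port
    namespace = f"teradata://{parts.hostname}:{port}"
    database = params.get("DATABASE")
    name = f"{database}.{table}" if database else table
    return namespace, name


print(postgres_namespace("jdbc:postgresql://db-host/mydb"))
print(teradata_dataset("jdbc:teradata://host/DBS_PORT=1024,DATABASE=somedb", "table"))
```

Parsing with `urlsplit` keeps hostnames containing dashes intact, which is why the sketch avoids hand-written host patterns.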
Changed
- Spark: fall back to `spark.sql.warehouse.dir` as table namespace
#2767
@dolfinus
When a metastore is not used, falls back to `spark.sql.warehouse.dir` or `hive.metastore.warehouse.dir` as the table namespace, instead of duplicating the table's location.
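The fallback described above can be pictured with a short Python sketch. The function name `fallback_namespace`, the plain-dictionary representation of the Spark configuration, and the order in which the two keys are checked are all assumptions for illustration; the actual integration is written in Java and may differ.

```python
from typing import Mapping, Optional

# Checked in this order; the precedence is an assumption of this sketch.
WAREHOUSE_KEYS = ("spark.sql.warehouse.dir", "hive.metastore.warehouse.dir")


def fallback_namespace(conf: Mapping[str, str]) -> Optional[str]:
    """Return the first configured warehouse directory, or None if neither is set."""
    for key in WAREHOUSE_KEYS:
        value = conf.get(key)
        if value:
            return value
    return None


print(fallback_namespace({"spark.sql.warehouse.dir": "file:/tmp/warehouse"}))
```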
Fixed
- Java: handle dashes in hostname for `JdbcExtractors`
#2830
@dolfinus
Properly handles dashes in JDBC URL hosts.
- Spark: fix Glue symlinks formatting bug
#2807
@Akash2351
Fixes Glue symlinks with config parsing for Glue `catalogid`.
- Spark, Flink: fix DBFS namespace format
#2800
@dolfinus
Fixes the DBFS namespace format.
- Spark: fix Glue naming format
#2766
@dolfinus
Changes the AWS Glue namespace to match Glue ARN documentation.
- Spark: fix Iceberg dataset location
#2797
@dolfinus
Fixes the Iceberg dataset namespace: instead of `file:/some/path/database.table`, uses `file:/some/path/database/table`. For the dataset `TABLE` symlink, uses the warehouse location instead of the database location.
- Spark: fix NPE and incorrect comment
#2827
@pawel-big-lebowski
Fixes an error introduced by a recent upgrade of Spark versions, which the existing tests did not catch.
- Spark: convert scheme and authority to lowercase in `JdbcLocation`
#2831
@dolfinus
Converts the scheme and authority of a valid JDBC URL to lowercase, leaving the instance/database name intact, since different databases have different default case and case-sensitivity rules.
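The lowercase conversion in the last entry can be illustrated with a small Python sketch. The function name `normalize_location` is hypothetical, and the sketch assumes the URL has already had its `jdbc:` prefix removed; it is not the `JdbcLocation` code itself.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_location(url: str) -> str:
    """Lowercase the scheme and authority; keep path, query, and fragment as given."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),  # host and port; would also lowercase any user info
        parts.path,            # instance/database name keeps its original case
        parts.query,
        parts.fragment,
    ))


print(normalize_location("MySQL://DB-Host.Example.COM:3306/MyDb"))
```

Only the scheme and authority are touched, so a case-sensitive database name such as `MyDb` survives unchanged, and dashes in the host pass through as-is.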