Version: Next

1.18.0 - 2024-07-11

Added

Spark: configurable integration test #2755 @pawel-big-lebowski
Provides command line tool capable of running Spark integration tests that can be created without Java.
Spark: OpenLineage Spark extension interfaces without runtime dependency hell #2809 #2837 @ddebowczyk92
New Spark extension interfaces without runtime dependency hell. Includes a test to verify the integration is working properly.
Spark: support latest versions 3.4.3 and 3.5.1. #2743 @pawel-big-lebowski
Upgrades CI workflows to run tests against latest Spark versions: 3.4.2 -> 3.4.3 and 3.5.0 -> 3.5.1.
Spark: add extraction of the masking property in column-level lineage #2789 @tnazarew
Adds extraction of the masking property during collection of dependencies for ColumnLineageDatasetFacet creation.
Spark: collect table name from InsertIntoHadoopFsRelationCommand #2794 @dolfinus
Collects a table name for INSERT INTO command for tables created with USING $fileFormat syntax, like USING orc.
Spark, Flink: add PostgresJdbcExtractor #2806 @dolfinus
Adds the default 5432 port to Postgres namespaces.
Spark, Flink: add TeradataJdbcExtractor #2826 @dolfinus
Converts JDBC URLs like jdbc:teradata/host/DBS_PORT=1024,DATABASE=somedb to datasets with namespace teradata://host:1024 and name somedb.table.
Spark, Flink: add MySqlJdbcExtractor #2825 @dolfinus
Handles different formats of MySQL JDBC URL, and produces datasets with consistent namespaces, like mysql://host:port.
Spark, Flink: add OracleJdbcExtractor #2824 @dolfinus
Handles simple Oracle JDBC URLs, like oracle:thin:@//host:port/serviceName and oracle:thin@host:port:sid, and converts each to a dataset with namespace oracle://host:port and name sid.schema.table or serviceName.schema.table.
Spark: configurable test with Docker image provided #2822 @pawel-big-lebowski
Extends the configurable integration test feature to enable getting the Docker image name as a name.
Spark: Support Iceberg 1.4 on Spark 3.5.1. #2838 @pawel-big-lebowski
Include Iceberg support for Spark 3.5. Fix column level lineage facet for UNION queries.
Spec: add example for change in #2756 #2801 @Sheeri
Updates the customLineage facet test for the new syntax created in #2756.

Changed

Spark: fallback to spark.sql.warehouse.dir as table namespace #2767 @dolfinus
In cases when a metastore is not used, falls back to spark.sql.warehouse.dir or hive.metastore.warehouse.dir as table namespace, instead of duplicating the table's location.

Fixed

Java: handle dashes in hostname for JdbcExtractors #2830 @dolfinus
Proper handling of dashes in JDBC URL hosts.
Spark: fix Glue symlinks formatting bug #2807 @Akash2351
Fixes Glue symlinks with config parsing for Glue catalogid.
Spark, Flink: fix DBFS namespace format #2800 @dolfinus
Fixes the DBFS namespace format.
Spark: fix Glue naming format #2766 @dolfinus
Changes the AWS Glue namespace to match Glue ARN documentation.
Spark: fix Iceberg dataset location #2797 @dolfinus
Fixes Iceberg dataset namespace: instead of file:/some/path/database.table uses file:/some/path/database/table. For dataset TABLE symlink, uses warehouse location instead of database location.
Spark: fix NPE and incorrect comment #2827 @pawel-big-lebowski
Fixes an error caused by a recent upgrade of Spark versions that did not break existing tests.
Spark: convert scheme and authority to lowercase in JdbcLocation #2831 @dolfinus
Converts valid JDBC URL scheme and authority to lowercase, leaving intact instance/database name, as different databases have different default case and case-sensitivity rules.

Added​

Changed​

Fixed​

Added

Changed

Fixed