1.18.0 - 2024-07-11
Added
- Spark: configurable integration test #2755 @pawel-big-lebowski
  Provides a command-line tool capable of running Spark integration tests that can be created without Java.
- Spark: OpenLineage Spark extension interfaces without runtime dependency hell #2809 #2837 @ddebowczyk92
  New Spark extension interfaces without runtime dependency hell. Includes a test to verify the integration is working properly.
- Spark: support latest versions 3.4.3 and 3.5.1 #2743 @pawel-big-lebowski
  Upgrades CI workflows to run tests against the latest Spark versions: 3.4.2 -> 3.4.3 and 3.5.0 -> 3.5.1.
- Spark: add extraction of the masking property in column-level lineage #2789 @tnazarew
  Adds extraction of the masking property during collection of dependencies for ColumnLineageDatasetFacet creation.
- Spark: collect table name from InsertIntoHadoopFsRelationCommand #2794 @dolfinus
  Collects a table name for the INSERT INTO command for tables created with the USING $fileFormat syntax, like USING orc.
- Spark, Flink: add PostgresJdbcExtractor #2806 @dolfinus
  Adds the default 5432 port to Postgres namespaces.
- Spark, Flink: add TeradataJdbcExtractor #2826 @dolfinus
  Converts JDBC URLs like jdbc:teradata://host/DBS_PORT=1024,DATABASE=somedb to datasets with namespace teradata://host:1024 and name somedb.table.
- Spark, Flink: add MySqlJdbcExtractor #2825 @dolfinus
  Handles different formats of MySQL JDBC URL and produces datasets with consistent namespaces, like mysql://host:port.
- Spark, Flink: add OracleJdbcExtractor #2824 @dolfinus
  Handles simple Oracle JDBC URLs, like oracle:thin:@//host:port/serviceName and oracle:thin:@host:port:sid. Converts each to a dataset with namespace oracle://host:port and name serviceName.schema.table or sid.schema.table, respectively.
- Spark: configurable test with Docker image provided #2822 @pawel-big-lebowski
  Extends the configurable integration test feature to accept the Docker image name as a parameter.
- Spark: support Iceberg 1.4 on Spark 3.5.1 #2838 @pawel-big-lebowski
  Includes Iceberg support for Spark 3.5. Fixes the column-level lineage facet for UNION queries.
- Spec: add example for the change in #2756 #2801 @Sheeri
  Updates the customLineage facet test for the new syntax created in #2756.
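The TeradataJdbcExtractor entry describes a mapping from a JDBC URL to a dataset namespace and name. Below is a minimal Python sketch of that mapping. It assumes the comma-separated KEY=VALUE parameter syntax. It also assumes that Teradata's default DBS_PORT of 1025 applies when the URL omits the port. teradata_dataset is a hypothetical helper that illustrates the conversion; it is not the OpenLineage extractor itself.

```python
import re

# Assumption: Teradata's default DBS_PORT is used when the URL does not set one.
TERADATA_DEFAULT_PORT = 1025

_TERADATA_URL = re.compile(
    r"^jdbc:teradata://(?P<host>[^/]+)(?:/(?P<params>.*))?$", re.IGNORECASE
)


def teradata_dataset(url: str, table: str) -> tuple[str, str]:
    """Hypothetical helper: return (namespace, name) for a table behind a Teradata JDBC URL."""
    match = _TERADATA_URL.match(url)
    if match is None:
        raise ValueError(f"not a Teradata JDBC URL: {url}")
    # Connection parameters follow the host as KEY=VALUE pairs separated by commas.
    params = {}
    for pair in (match.group("params") or "").split(","):
        if "=" in pair:
            key, value = pair.split("=", 1)
            params[key.strip().upper()] = value.strip()
    host = match.group("host").lower()
    port = params.get("DBS_PORT", str(TERADATA_DEFAULT_PORT))
    database = params.get("DATABASE")
    # Prefix the table with the database when the URL names one.
    name = f"{database}.{table}" if database else table
    return f"teradata://{host}:{port}", name


print(teradata_dataset("jdbc:teradata://host/DBS_PORT=1024,DATABASE=somedb", "table"))
# -> ('teradata://host:1024', 'somedb.table')
```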
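The OracleJdbcExtractor entry names two "simple" thin-driver URL forms. Below is a rough Python sketch of how each form could map to a namespace and name. The regular expressions and the oracle_dataset function are assumptions made for illustration. The optional "jdbc:" prefix is accepted because the entry quotes the URLs without it.

```python
import re

# Hypothetical patterns for the service-name form (@//host:port/service)
# and the SID form (@host:port:sid).
_SERVICE_NAME_FORM = re.compile(
    r"^(?:jdbc:)?oracle:thin:@//(?P<host>[^:/]+):(?P<port>\d+)/(?P<db>[^?/]+)$",
    re.IGNORECASE,
)
_SID_FORM = re.compile(
    r"^(?:jdbc:)?oracle:thin:@(?P<host>[^:/]+):(?P<port>\d+):(?P<db>\w+)$",
    re.IGNORECASE,
)


def oracle_dataset(url: str, schema: str, table: str) -> tuple[str, str]:
    """Hypothetical helper: return (namespace, name) for a table behind a simple Oracle JDBC URL."""
    for pattern in (_SERVICE_NAME_FORM, _SID_FORM):
        match = pattern.match(url)
        if match:
            # Only the host is lowercased; the service name or SID keeps its case.
            namespace = f"oracle://{match.group('host').lower()}:{match.group('port')}"
            return namespace, f"{match.group('db')}.{schema}.{table}"
    raise ValueError(f"unsupported Oracle JDBC URL: {url}")
```

Both forms produce the same oracle://host:port namespace. Only the first component of the dataset name differs (service name or SID).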
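The PostgresJdbcExtractor and MySqlJdbcExtractor entries both produce namespaces of the form scheme://host:port. Below is a minimal Python sketch that fills in a default port when the URL omits one, using the standard library's urllib.parse.urlsplit. The 5432 default comes from the Postgres entry. The MySQL default of 3306 and the DEFAULTS table are assumptions of this sketch. It only handles the plain host[:port] URL form, not the other MySQL URL formats the extractor supports.

```python
from urllib.parse import urlsplit

# Hypothetical mapping: JDBC subprotocol -> (namespace scheme, default port).
# 5432 is taken from the changelog; 3306 for MySQL is an assumption here.
DEFAULTS = {
    "postgresql": ("postgres", 5432),
    "mysql": ("mysql", 3306),
}


def jdbc_namespace(jdbc_url: str) -> str:
    """Return a scheme://host:port namespace, filling in the default port if missing."""
    if not jdbc_url.lower().startswith("jdbc:"):
        raise ValueError(f"not a JDBC URL: {jdbc_url}")
    # Drop the "jdbc:" prefix so urlsplit sees an ordinary scheme://host:port URL.
    parts = urlsplit(jdbc_url[len("jdbc:"):])
    if parts.scheme not in DEFAULTS or parts.hostname is None:
        raise ValueError(f"unsupported JDBC URL: {jdbc_url}")
    scheme, default_port = DEFAULTS[parts.scheme]
    port = parts.port or default_port
    # urlsplit's .hostname is already lowercased.
    return f"{scheme}://{parts.hostname}:{port}"
```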
Changed
- Spark: fall back to spark.sql.warehouse.dir as table namespace #2767 @dolfinus
  When a metastore is not used, falls back to spark.sql.warehouse.dir or hive.metastore.warehouse.dir as the table namespace instead of duplicating the table's location.
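The fallback in the entry above amounts to picking the first configured warehouse directory. Below is a minimal Python sketch, assuming a plain dict of Spark configuration values. It also assumes that spark.sql.warehouse.dir takes precedence over hive.metastore.warehouse.dir, an order this sketch assumes from the order the entry lists them. fallback_table_namespace is a hypothetical helper.

```python
from typing import Mapping, Optional

# Assumed precedence: the Spark setting first, then the Hive one.
WAREHOUSE_KEYS = ("spark.sql.warehouse.dir", "hive.metastore.warehouse.dir")


def fallback_table_namespace(conf: Mapping[str, str]) -> Optional[str]:
    """Hypothetical helper: return the warehouse dir to use as table namespace, or None."""
    for key in WAREHOUSE_KEYS:
        value = conf.get(key)
        if value:
            return value
    # Neither setting is present; the caller must choose another namespace.
    return None
```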
Fixed
- Java: handle dashes in hostname for JdbcExtractors #2830 @dolfinus
  Properly handles dashes in JDBC URL hosts.
- Spark: fix Glue symlinks formatting bug #2807 @Akash2351
  Fixes Glue symlinks with config parsing for Glue catalogid.
- Spark, Flink: fix DBFS namespace format #2800 @dolfinus
  Fixes the DBFS namespace format.
- Spark: fix Glue naming format #2766 @dolfinus
  Changes the AWS Glue namespace to match the Glue ARN documentation.
- Spark: fix Iceberg dataset location #2797 @dolfinus
  Fixes the Iceberg dataset namespace: uses file:/some/path/database/table instead of file:/some/path/database.table. For the dataset TABLE symlink, uses the warehouse location instead of the database location.
- Spark: fix NPE and incorrect comment #2827 @pawel-big-lebowski
  Fixes an error introduced by a recent upgrade of Spark versions that existing tests did not catch.
- Spark: convert scheme and authority to lowercase in JdbcLocation #2831 @dolfinus
  Converts the scheme and authority of a valid JDBC URL to lowercase but leaves the instance/database name intact, because different databases have different default case and case-sensitivity rules.
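The Iceberg location fix replaces the database.table form with nested path segments. Below is a minimal Python sketch of the corrected layout. It assumes a warehouse/database/table directory structure. iceberg_table_location is a hypothetical helper, not OpenLineage code.

```python
def iceberg_table_location(warehouse: str, database: str, table: str) -> str:
    """Hypothetical helper: join database and table as path segments, not as database.table."""
    # Strip a trailing slash so the result never contains "//".
    return f"{warehouse.rstrip('/')}/{database}/{table}"


print(iceberg_table_location("file:/some/path", "database", "table"))
# -> file:/some/path/database/table
```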
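The JdbcLocation and hostname-dash entries both concern splitting a JDBC URL into scheme, authority, and the remainder. Below is a rough Python sketch of the lowercasing rule. Scheme and host:port are lowercased, while any credentials and the database part keep their case. Hosts such as my-db-host pass through unchanged apart from case. The regular expression and normalize_jdbc_url are assumptions for illustration; this is not the Java JdbcLocation implementation. URLs without "://" (such as the Oracle SID form) are returned untouched here.

```python
import re

# Hypothetical split: "<scheme>://" + authority (up to "/", "?" or ";") + rest.
_URL_PARTS = re.compile(
    r"^(?P<scheme>[^/]*?://)(?P<authority>[^/?;]*)(?P<rest>.*)$", re.DOTALL
)


def normalize_jdbc_url(url: str) -> str:
    """Lowercase scheme and host:port; keep credentials and database name as given."""
    match = _URL_PARTS.match(url)
    if match is None:
        # Not a scheme://authority URL; this sketch leaves it unchanged.
        return url
    # Separate optional "user:password@" so credentials are not lowercased.
    userinfo, at, host_port = match.group("authority").rpartition("@")
    return (
        match.group("scheme").lower()
        + userinfo
        + at
        + host_port.lower()
        + match.group("rest")
    )


print(normalize_jdbc_url("JDBC:PostgreSQL://My-DB-Host:5432/MyDatabase"))
# -> jdbc:postgresql://my-db-host:5432/MyDatabase
```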
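The Glue naming fix aligns the namespace with the AWS Glue ARN format, which for tables reads arn:aws:glue:region:account-id:table/database name/table name. Below is a minimal Python sketch that splits such an ARN into a namespace and a name. It assumes the namespace is the arn:aws:glue:region:account-id prefix and the name is the table/database/table resource part; that split is an assumption, not something this entry states. glue_table_dataset is a hypothetical helper.

```python
def glue_table_dataset(region: str, account_id: str, database: str, table: str) -> tuple[str, str]:
    """Hypothetical helper: return (namespace, name) whose ":"-join is a Glue table ARN."""
    namespace = f"arn:aws:glue:{region}:{account_id}"
    name = f"table/{database}/{table}"
    return namespace, name


namespace, name = glue_table_dataset("us-east-1", "123456789012", "sales", "orders")
print(f"{namespace}:{name}")
# -> arn:aws:glue:us-east-1:123456789012:table/sales/orders
```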