1.8.0 - 2024-01-22
Added
- Flink: support Flink 1.18 #2366@HuangZhenQiu
 Adds support for the latest Flink version with 1.17 used for Iceberg Flink runtime and Cassandra Connector as these do not yet support 1.18.
- Spark: add Gradle plugins to simplify the build process to support Scala 2.13 #2376@d-m-h
 *Defines a set of Gradle plugins to configure the modules and reduce duplication.
- Spark: support multiple Scala versions LogicalPlanimplementation#2361@mattiabertorello
 In the LogicalPlanSerializerTest class, the implementation of the LogicalPlan interface is different between Scala 2.12 and Scala 2.13. In detail, the IndexedSeq changes package from the scala.collection to scala.collection.immutable. This implements both of the methods necessary in the two versions.
- Spark: Use ScalaConversionUtils to convert Scala and Java collections #2357@mattiabertorello
 This initial step is to start supporting compilation for Scala 2.13 in the 3.2+ Spark versions. Scala 2.13 changed the default collection to immutable, the methods to create an empty collection, and the conversion between Java and Scala. This causes the code to not compile between 2.12 and 2.13. This replaces the usage of direct Scala collection methods (like creating an empty object) and conversions utils withScalaConversionUtilsmethods that will support cross-compilation.
- Spark: support MERGE INTOqueries on Databricks#2348@pawel-big-lebowski
 Supports custom plan nodes used when runningMERGE INTOqueries on Databricks runtime.
- Spark: Support Glue catalog in iceberg #2283@nataliezeller1
 Adds support for the Glue catalog based on the 'catalog-impl' property (in this case we will not have a 'type' property).
Changed
- Spark: Move Spark 3.1 code from the spark3 project #2365@mattiabertorello
 Moves the Spark 3.1-related code to a specific project, spark31, so the spark3 project can be compiled with any Spark 3.x version.
Fixed
- Airflow: add database information to SnowflakeExtractor #2364@kacpermuda
 Fixes missing database information in SnowflakeExtractor.
- Airflow: add dag_id to task_run_id to avoid duplicates #2358@kacpermuda
 The lack of dag_id in task_run_id can cause duplicates in run_id across different dags.
- Airflow: Add tests for column lineage facet and sql parser #2373@kacpermuda
 Improves naming (database.schema.table) in SQLExtractor's column lineage facet and adds some unit tests.
- Spark: fix removePathPattern behaviour #2350@pawel-big-lebowski
 The removepath pattern feature is not applied all the time. The method is called when constructing DatasetIdentifier through PathUtils which is not the case all the time. This moves removePattern to another place in the codebase that is always run.
- Spark: fix a type incompatibility in RddExecutionContext between Scala 2.12 and 2.13 #2360@mattiabertorello
 The function from the ResultStage.func() object change type in Spark between Scala 2.12 and 2.13 makes the compilation fail. This avoids getting the function with an explicit type; instead, it gets it every time it is needed from the ResultStage object. This PR is part of the effort to support Scala 2.13 in the Spark integration.
- Spark: Fix removePathPatternfeature#2350@pawel-big-lebowski
 Refactors code to make sure that all datasets sent are processed throughremovePathPatternif configured to do so.
- Spark: Clean up the individual build.gradle files in preparation for Scala 2.13 support #2377@d-m-h
 Cleans up the build.gradle files, consolidating the custom plugin and removing unused and unnecessary configuration.
- Spark: refactor the Gradle plugins to make it easier to define Scala variants per module #2383@d-m-h
 The third of several PRs to support producing Scala 2.12 and Scala 2.13 variants of the OpenLineage Spark integration. This PR refactors the custom Gradle plugins in order to make supporting multiple variants per module easier. This is necessary because the shared module fails its tests when consuming the Scala 2.13 variants of Apache Spark.