1.5.0 - 2023-11-02
Added
- Flink: add Flink lineage for Cassandra Connectors #2175@HuangZhenQiu
 Adds Flink Cassandra source and sink visitors and Flink Cassandra Integration test.
- Spark: support rdd and toDF operations available in Spark Scala API #2188@pawel-big-lebowski
 Includes the first Scala integration test, fixes ExternalRddVisitor and adds support for extracting inputs from MapPartitionsRDD and ParallelCollectionRDD plan nodes.
- Spark: support Databricks Runtime 13.3 #2185@pawel-big-lebowski
 Modifies the Spark integration to support the latest Databricks Runtime version.
Changed
- Airflow: loosen attrs and requests versions #2107@JDarDagran
 Lowers the version requirements for attrs and requests and removes an unnecessary dependency.
- dbt: render yaml configs lazily #2221@JDarDagran
 Stops rendering every entry in yaml files at startup; entries are now rendered lazily.
Fixed
- Airflow/Athena: change dataset name to its location #2167@sophiely
 Replaces the dataset and namespace with the data's physical location for more complete lineage across integrations.
- Python client: skip redaction in column lineage facet #2177@JDarDagran
 Redacted fields in ColumnLineageDatasetFacetFieldsAdditionalInputFields are now skipped.
- Spark: unify dataset naming for RDD jobs and Spark SQL #2181@pawel-big-lebowski
 Uses the same mechanism to extract the dataset identifier for RDD jobs as is used for Spark SQL.
- Spark: ensure a single START and a single COMPLETE event are sent #2103@pawel-big-lebowski
 For Spark SQL, at least four events are sent, triggered by different SparkListener methods. Each of them is required and is used to collect facets unavailable elsewhere. However, only one START and one COMPLETE event should be emitted; all other events should be sent as RUNNING. Please keep in mind that the Spark integration remains stateless to limit its memory footprint, and it is the backend's responsibility to merge several OpenLineage events into a meaningful snapshot of metadata changes.
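
As a rough illustration of the backend-side merging described above, the sketch below folds a START event, several RUNNING events and a COMPLETE event for one run into a single snapshot. It is not part of the OpenLineage codebase: the function name merge_run_events, the snapshot layout and the facet names are hypothetical, events are assumed to arrive in emission order, and only the eventType, run.runId and run.facets fields of an event are modelled.

```python
def merge_run_events(events):
    """Fold a sequence of event dicts into one snapshot per run (hypothetical helper).

    Assumes events are ordered by emission time, so the last eventType
    seen for a run is treated as its current state.
    """
    snapshots = {}
    for event in events:
        run = event["run"]
        snap = snapshots.setdefault(
            run["runId"], {"state": None, "facets": {}, "event_count": 0}
        )
        # Later events add to, or overwrite, facets collected by earlier ones.
        snap["facets"].update(run.get("facets", {}))
        snap["state"] = event["eventType"]
        snap["event_count"] += 1
    return snapshots


# Four events for one Spark SQL run; facet names here are made up.
events = [
    {"eventType": "START", "run": {"runId": "r1", "facets": {"facet_a": 1}}},
    {"eventType": "RUNNING", "run": {"runId": "r1", "facets": {"facet_b": 2}}},
    {"eventType": "RUNNING", "run": {"runId": "r1", "facets": {"facet_c": 3}}},
    {"eventType": "COMPLETE", "run": {"runId": "r1", "facets": {"facet_d": 4}}},
]
snapshot = merge_run_events(events)["r1"]
```

Because the integration keeps no state between listener callbacks, each event carries only the facets available at that moment; a consumer along these lines would see the complete set only after the COMPLETE event has been merged.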