Flink: add Flink lineage for Cassandra Connectors #2175 @HuangZhenQiu Adds Flink Cassandra source and sink visitors and a Flink Cassandra integration test.
Spark: support rdd and toDF operations available in Spark Scala API #2188 @pawel-big-lebowski Includes the first Scala integration test, fixes ExternalRddVisitor, and adds support for extracting inputs from MapPartitionsRDD and ParallelCollectionRDD plan nodes.
Spark: support Databricks Runtime 13.3 #2185 @pawel-big-lebowski Modifies the Spark integration to support the latest Databricks Runtime version.
Airflow: loosen attrs and requests versions #2107 @JDarDagran Lowers the version requirements for attrs and requests and removes an unnecessary dependency.
dbt: render yaml configs lazily #2221 @JDarDagran Avoids rendering every entry in the YAML files at startup.
Airflow/Athena: change dataset name to its location #2167 @sophiely Replaces the dataset and namespace with the data's physical location for more complete lineage across integrations.
Python client: skip redaction in column lineage facet #2177 @JDarDagran Redacted fields in ColumnLineageDatasetFacetFieldsAdditionalInputFields are now skipped.
Spark: unify dataset naming for RDD jobs and Spark SQL #2181 @pawel-big-lebowski Uses the same mechanism to extract the dataset identifier for RDD jobs as is used for Spark SQL.
Spark: ensure a single START and a single COMPLETE event are sent #2103 @pawel-big-lebowski For Spark SQL, at least four events are sent, triggered by different SparkListener methods. Each of them is required and is used to collect facets unavailable elsewhere. However, only one START event and one COMPLETE event should be emitted; all other events should be sent as RUNNING. Please keep in mind that the Spark integration remains stateless to limit its memory footprint, and it is the backend's responsibility to merge several OpenLineage events into a meaningful snapshot of metadata changes.
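
As a rough illustration of the backend-side merging mentioned in the last entry, the sketch below folds the events of one run into a single snapshot. It is only a sketch under stated assumptions: the function name `merge_run_events` is hypothetical, and the events are simplified dictionaries (an `eventType`, a flat `facets` mapping, and dataset names as plain strings), not the full OpenLineage event schema or any particular backend's implementation.

```python
from typing import Any, Dict, Iterable


def merge_run_events(events: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold the events of a single run into one snapshot (hypothetical helper)."""
    snapshot: Dict[str, Any] = {
        "state": None,
        "facets": {},
        "inputs": [],
        "outputs": [],
    }
    for event in events:
        # COMPLETE is treated as terminal here; a late RUNNING event
        # must not move the run back to a non-terminal state.
        if snapshot["state"] != "COMPLETE":
            snapshot["state"] = event["eventType"]

        # Later events add facets that earlier ones did not carry;
        # on a key clash the most recent value wins (an assumption).
        snapshot["facets"].update(event.get("facets", {}))

        # Collect datasets in first-seen order without duplicates.
        for key in ("inputs", "outputs"):
            for dataset in event.get(key, []):
                if dataset not in snapshot[key]:
                    snapshot[key].append(dataset)
    return snapshot
```

With one START, several RUNNING, and one COMPLETE event for the same run, the snapshot ends in the COMPLETE state and carries the union of the facets and datasets reported along the way. The emitter can stay stateless because all accumulation happens on the receiving side.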