1.5.0 - 2023-11-02
- Flink: add Flink lineage for Cassandra Connectors
Adds Flink Cassandra source and sink visitors and Flink Cassandra Integration test.
- Spark: support toDF operations available in Spark Scala API
Includes the first Scala integration test, fixes ExternalRddVisitor and adds support for extracting inputs from
- Spark: support Databricks Runtime 13.3
Modifies the Spark integration to support the latest Databricks Runtime version.
- Airflow: loosen attrs and requests versions
Lowers the version requirements for attrs and requests and removes an unnecessary dependency.
- dbt: render yaml configs lazily
Stops rendering every entry in yaml files at start; entries are rendered lazily instead.
- Airflow/Athena: change dataset name to its location
Replaces the dataset and namespace with the data's physical location for more complete lineage across integrations.
- Python client: skip redaction in column lineage facet
Redacted fields in ColumnLineageDatasetFacetFieldsAdditionalInputFields are now skipped.
- Spark: unify dataset naming for RDD jobs and Spark SQL
Uses the same mechanism to extract the dataset identifier for RDD jobs as for Spark SQL.
- Spark: ensure a single START and a single COMPLETE event are sent
For Spark SQL, at least four events are sent, triggered by different SparkListener methods. Each of them is required and is used to collect facets unavailable elsewhere. However, only one COMPLETE event should be emitted; the other events should be sent as RUNNING. Please keep in mind that the Spark integration remains stateless to limit the memory footprint, and it is the backend's responsibility to merge several OpenLineage events into a meaningful snapshot of metadata changes.
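
The backend-side merge mentioned in the last entry can be sketched roughly as follows. This is a minimal illustration, not how any particular OpenLineage backend works: it assumes events arrive as plain dictionaries in emission order, the function name merge_run_events and the facet names (facet_a, facet_b, facet_c) are hypothetical, and the merge strategy (the latest event type wins, facets and datasets accumulate) is only one possible choice.

```python
from collections import defaultdict


def merge_run_events(events):
    """Fold the events of each run into one snapshot, keyed by run ID.

    Hypothetical helper: assumes `events` is ordered by emission time.
    """
    snapshots = defaultdict(
        lambda: {"state": None, "facets": {}, "inputs": set(), "outputs": set()}
    )
    for event in events:
        snap = snapshots[event["run"]["runId"]]
        # The most recent event type becomes the run state.
        snap["state"] = event["eventType"]
        # Facets accumulate; a later facet with the same key overwrites.
        snap["facets"].update(event["run"].get("facets", {}))
        # Datasets are collected as (namespace, name) pairs.
        for side in ("inputs", "outputs"):
            for dataset in event.get(side, []):
                snap[side].add((dataset["namespace"], dataset["name"]))
    return dict(snapshots)


# One START, two RUNNING and one COMPLETE event for the same run
# (facet and dataset names are made up for illustration).
events = [
    {"eventType": "START",
     "run": {"runId": "run-1", "facets": {"facet_a": {"v": 1}}}},
    {"eventType": "RUNNING",
     "run": {"runId": "run-1", "facets": {"facet_b": {"v": 2}}},
     "inputs": [{"namespace": "s3://bucket", "name": "raw/orders"}]},
    {"eventType": "RUNNING",
     "run": {"runId": "run-1", "facets": {}},
     "outputs": [{"namespace": "s3://bucket", "name": "clean/orders"}]},
    {"eventType": "COMPLETE",
     "run": {"runId": "run-1", "facets": {"facet_c": {"v": 3}}}},
]

snapshot = merge_run_events(events)["run-1"]
```

Here the resulting snapshot carries the final run state together with the facets and datasets contributed by every event, which is why each intermediate RUNNING event still matters even though only one COMPLETE event is emitted.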