1.5.0 - 2023-11-02

Added

  • Flink: add Flink lineage for Cassandra Connectors #2175 @HuangZhenQiu
    Adds Flink Cassandra source and sink visitors and Flink Cassandra Integration test.
  • Spark: support rdd and toDF operations available in Spark Scala API #2188 @pawel-big-lebowski
    Includes the first Scala integration test, fixes ExternalRddVisitor and adds support for extracting inputs from MapPartitionsRDD and ParallelCollectionRDD plan nodes.
  • Spark: support Databricks Runtime 13.3 #2185 @pawel-big-lebowski
    Modifies the Spark integration to support the latest Databricks Runtime version.

Changed

  • Airflow: loosen attrs and requests versions #2107 @JDarDagran
    Lowers the version requirements for attrs and requests and removes an unnecessary dependency.
  • dbt: render yaml configs lazily #2221 @JDarDagran
    Renders entries in yaml config files lazily instead of rendering every entry at startup.

Fixed

  • Airflow/Athena: change dataset name to its location #2167 @sophiely
    Replaces the dataset and namespace with the data's physical location for more complete lineage across integrations.
  • Python client: skip redaction in column lineage facet #2177 @JDarDagran
    Redacted fields in ColumnLineageDatasetFacetFieldsAdditionalInputFields are now skipped.
  • Spark: unify dataset naming for RDD jobs and Spark SQL #2181 @pawel-big-lebowski
    Uses the same mechanism to extract dataset identifiers for RDD jobs as is used for Spark SQL.
  • Spark: ensure a single START and a single COMPLETE event are sent #2103 @pawel-big-lebowski
    For Spark SQL, at least four events are sent, triggered by different SparkListener methods. Each of them is required and is used to collect facets unavailable elsewhere. However, only one START event and one COMPLETE event should be emitted; the other events should be sent as RUNNING. Please keep in mind that the Spark integration remains stateless to limit its memory footprint, and it is the backend's responsibility to merge several OpenLineage events into a meaningful snapshot of metadata changes.
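
To illustrate the last point, here is a minimal sketch of how a backend might fold the events of one run into a single snapshot. It is not taken from any OpenLineage backend: the function name `merge_run_events`, the snapshot layout, and the facet names in the sample data are all hypothetical, and the events are plain dictionaries that only loosely follow the OpenLineage event shape (`eventType`, `eventTime`, `run.runId`, `run.facets`).

```python
from collections import defaultdict


def merge_run_events(events):
    """Fold OpenLineage-style events into one snapshot per run ID.

    Hypothetical helper: later events override facets with the same
    name, and the snapshot state is the type of the latest event.
    """
    snapshots = defaultdict(
        lambda: {"state": None, "facets": {}, "event_types": []}
    )
    # Sort by event time so that later facets win over earlier ones.
    # ISO 8601 timestamps in the same format sort correctly as strings.
    for event in sorted(events, key=lambda e: e["eventTime"]):
        snapshot = snapshots[event["run"]["runId"]]
        snapshot["event_types"].append(event["eventType"])
        snapshot["facets"].update(event["run"].get("facets", {}))
        snapshot["state"] = event["eventType"]
    return dict(snapshots)


# Sample data: one START, two RUNNING, one COMPLETE for the same run.
# The facet names ("plan", "metrics", "stats") are made up for the example.
events = [
    {"eventType": "START", "eventTime": "2023-11-02T10:00:00Z",
     "run": {"runId": "run-1", "facets": {"plan": "v1"}}},
    {"eventType": "RUNNING", "eventTime": "2023-11-02T10:00:01Z",
     "run": {"runId": "run-1", "facets": {"metrics": "partial"}}},
    {"eventType": "RUNNING", "eventTime": "2023-11-02T10:00:02Z",
     "run": {"runId": "run-1", "facets": {"plan": "v2"}}},
    {"eventType": "COMPLETE", "eventTime": "2023-11-02T10:00:03Z",
     "run": {"runId": "run-1", "facets": {"stats": "final"}}},
]

merged = merge_run_events(events)
```

Keeping the merge on the backend side matches the design described above: the integration can emit each event as soon as its facets are available without holding per-run state in memory.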