Version: 1.22.0

1.19.0 - 2024-07-22

Added

  • Airflow: add log_url to AirflowRunFacet #2852 @dolfinus
    Adds the task instance's log_url field to AirflowRunFacet.
  • Spark: add handling for Generate #2856 @tnazarew
    Adds handling for Generate-type nodes of a logical plan (e.g., explode operations).
  • Java: add DerbyJdbcExtractor #2869 @dolfinus
    Adds a JdbcExtractor implementation for the Derby database. Because Derby is a file-based DBMS, its Dataset namespace is file and its name is the absolute path to the database file.
  • Spark: verify bytecode version of the built jar. #2859 @pawel-big-lebowski
    Extends the JarVerifier plugin to ensure all compiled classes have a bytecode version of Java 8 or lower.
  • Spark: add Kafka streaming source support #2851 @d-m-h @imbruced
    Adds support for streaming jobs that read from Kafka streaming sources and write to Kafka streaming sinks. Inputs and outputs are now included in lineage events.
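
To illustrate the bytecode check mentioned in the JarVerifier entry above: every compiled Java class starts with the magic number 0xCAFEBABE followed by a minor and a major version, and Java 8 corresponds to major version 52. The following is a minimal Python sketch of that idea, not the JarVerifier plugin's actual code; the function names and the JAVA_8_MAJOR constant are hypothetical and chosen only for this example.

```python
import struct

# Class-file major version produced by a Java 8 compiler target.
JAVA_8_MAJOR = 52


def class_major_version(class_bytes: bytes) -> int:
    """Return the major version stored in a .class file header (hypothetical helper)."""
    if len(class_bytes) < 8:
        raise ValueError("too short to be a Java class file")
    # Header layout: u4 magic, u2 minor_version, u2 major_version (big-endian).
    magic, _minor, major = struct.unpack(">IHH", class_bytes[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major


def is_java8_compatible(class_bytes: bytes) -> bool:
    """True when the class targets Java 8 or an earlier release."""
    return class_major_version(class_bytes) <= JAVA_8_MAJOR
```

A verifier along these lines would apply is_java8_compatible to each .class entry of the built jar and fail the build on the first class that returns False.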

Fixed

  • Airflow: replace datetime.now with airflow.utils.timezone.utcnow #2865 @kacpermuda
    Fixes missing timezone information in task FAIL events.
  • Spark: remove shaded dependency in ColumnLevelLineageBuilder #2850 @tnazarew
    Removes the shaded Streams dependency in ColumnLevelLineageBuilder that caused a ClassNotFoundException.
  • Spark: make Delta dataset symlink consistent with non-Delta tables #2863 @dolfinus
    Makes dataset symlinks for Delta and non-Delta tables consistent.
  • Spark: use Table's properties during column-level lineage construction #2855 @ddebowczyk92
    Fixes PlanUtils3 so Dataset identifier information based on a Table's properties is also retrieved during the construction of column-level lineage.
  • Spark: extract job name creation to providers #2861 @arturowczarek
    The integration now detects whether spark.app.name was autogenerated by Glue and, if so, uses the Glue job name instead. Each job name provisioning strategy is also extracted into a separate provider.
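
The timezone fix in the first Airflow entry of this list comes down to the difference between naive and timezone-aware timestamps. The sketch below uses only the Python standard library as a stand-in for airflow.utils.timezone.utcnow, on the assumption that it returns a timezone-aware UTC datetime; the function names are hypothetical and this is not the integration's code.

```python
from datetime import datetime, timezone


def event_time_naive() -> str:
    """Timestamp from datetime.now(): no offset, so consumers cannot tell the zone."""
    return datetime.now().isoformat()


def event_time_utc() -> str:
    """Timezone-aware UTC timestamp; the ISO string carries an explicit +00:00 offset."""
    return datetime.now(timezone.utc).isoformat()
```

Only the second form serializes with an offset, which is the information that was missing from task FAIL events.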