1.19.0 - 2024-07-22
Added
- Airflow: add log_url to AirflowRunFacet #2852 @dolfinus
  Adds task instance's log_url field to AirflowRunFacet.
- Spark: add handling for Generate #2856 @tnazarew
  Adds handling for Generate-type nodes of a logical plan (e.g., explode operations).
- Java: add DerbyJdbcExtractor #2869 @dolfinus
  Adds JdbcExtractor implementation for Derby database. As this is a file-based DBMS, its Dataset namespace is file and name is an absolute path to a database file.
- Spark: verify bytecode version of the built jar #2859 @pawel-big-lebowski
  Extends the JarVerifier plugin to ensure all compiled classes have a bytecode version of Java 8 or lower.
- Spark: add Kafka streaming source support #2851 @d-m-h @imbruced
  Adds support for Kafka streaming sources to Kafka streaming sinks. Inputs and outputs are now included in lineage events.
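
The Derby naming rule above (namespace file, name set to the absolute path of the database file) can be illustrated with a small sketch. This is not the DerbyJdbcExtractor code: the function name derby_dataset_identifier and the base_dir parameter are hypothetical, and resolving relative database paths against base_dir is an assumption made only for illustration. Sub-subprotocols such as memory: are not handled.

```python
import posixpath


def derby_dataset_identifier(jdbc_url: str, base_dir: str = "/") -> tuple[str, str]:
    """Hypothetical helper: map a Derby JDBC URL to (namespace, name)."""
    prefix = "jdbc:derby:"
    if not jdbc_url.startswith(prefix):
        raise ValueError(f"not a Derby JDBC URL: {jdbc_url!r}")
    # Drop ';attribute=value' connection attributes such as ';create=true'.
    db_path = jdbc_url[len(prefix):].split(";", 1)[0]
    # Assumption: relative paths are resolved against base_dir.
    name = posixpath.normpath(posixpath.join(base_dir, db_path))
    # A file-based DBMS gets the fixed namespace "file".
    return "file", name


print(derby_dataset_identifier("jdbc:derby:/tmp/mydb;create=true"))
```

With an absolute path in the URL, base_dir has no effect; it only matters for URLs such as jdbc:derby:mydb.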
Fixed
- Airflow: replace datetime.now with airflow.utils.timezone.utcnow #2865 @kacpermuda
  Fixes missing timezone information in task FAIL events.
- Spark: remove shaded dependency in ColumnLevelLineageBuilder #2850 @tnazarew
  Removes the shaded Streams dependency in ColumnLevelLineageBuilder that caused a ClassNotFoundException.
- Spark: make Delta dataset symlink consistent with non-Delta tables #2863 @dolfinus
  Makes dataset symlinks for Delta and non-Delta tables consistent.
- Spark: use Table's properties during column-level lineage construction #2855 @ddebowczyk92
  Fixes PlanUtils3 so Dataset identifier information based on a Table's properties is also retrieved during the construction of column-level lineage.
- Spark: extract job name creation to providers #2861 @arturowczarek
  The integration now detects if the spark.app.name was autogenerated by Glue and uses the Glue job name in such cases. Also, each job name provisioning strategy is now extracted to a separate provider.
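
The Airflow timezone fix in this list comes down to the difference between naive and timezone-aware datetimes. The sketch below uses only the standard library so that it runs without Airflow. datetime.now(timezone.utc) stands in for airflow.utils.timezone.utcnow, which likewise returns an aware UTC datetime. The variable names are illustrative and not taken from the integration.

```python
from datetime import datetime, timezone

# Naive timestamp: no tzinfo, so the serialized value carries no UTC offset.
naive = datetime.now()

# Aware timestamp: tzinfo is UTC, so the serialized value ends with "+00:00".
aware = datetime.now(timezone.utc)

print(naive.tzinfo)       # None
print(aware.isoformat())  # ISO 8601 string with an explicit +00:00 offset
```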