1.19.0 - 2024-07-22
Added
- Airflow: add `log_url` to `AirflowRunFacet` #2852 @dolfinus
  Adds taskinstance's `log_url` field to `AirflowRunFacet`.
- Spark: add handling for `Generate` #2856 @tnazarew
  Adds handling for `Generate`-type nodes of a logical plan (e.g., explode operations).
- Java: add `DerbyJdbcExtractor` #2869 @dolfinus
  Adds `JdbcExtractor` implementation for Derby database. As this is a file-based DBMS, its Dataset namespace is `file` and name is an absolute path to a database file.
- Spark: verify bytecode version of the built jar. #2859 @pawel-big-lebowski
  Extends the `JarVerifier` plugin to ensure all compiled classes have a bytecode version of Java 8 or lower.
- Spark: add Kafka streaming source support #2851 @d-m-h @imbruced
  Adds support for Kafka streaming sources to Kafka streaming sinks. Inputs and outputs are now included in lineage events.
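
As a rough illustration of what a bytecode version check like the one in the jar-verification entry above involves, here is a minimal Python sketch. It is not the plugin's code; the function names are hypothetical. It relies only on the standard class-file header layout: a 4-byte magic number `0xCAFEBABE`, then a 2-byte minor and a 2-byte major version, where major version 52 corresponds to Java 8.

```python
import struct

JAVA_8_MAJOR = 52  # class-file major version emitted for Java 8


def class_major_version(data: bytes) -> int:
    """Return the major version stored in a compiled .class file header."""
    if len(data) < 8:
        raise ValueError("too short to be a class file")
    # Header is big-endian: u4 magic, u2 minor_version, u2 major_version.
    magic, _minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a class file (bad magic number)")
    return major


def is_java8_compatible(data: bytes) -> bool:
    """True when the class was compiled for Java 8 or an earlier release."""
    return class_major_version(data) <= JAVA_8_MAJOR
```

A verifier along these lines would read the first eight bytes of every `.class` entry in the built jar and fail the build on the first entry for which `is_java8_compatible` returns `False`.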
Fixed
- Airflow: replace datetime.now with airflow.utils.timezone.utcnow #2865 @kacpermuda
  Fixes missing timezone information in task FAIL events.
- Spark: remove shaded dependency in `ColumnLevelLineageBuilder` #2850 @tnazarew
  Removes the shaded `Streams` dependency in `ColumnLevelLineageBuilder` causing a `ClassNotFoundException`.
- Spark: make Delta dataset symlink consistent with non-Delta tables #2863 @dolfinus
  Makes dataset symlinks for Delta and non-Delta tables consistent.
- Spark: use Table's properties during column-level lineage construction #2855 @ddebowczyk92
  Fixes `PlanUtils3` so Dataset identifier information based on a Table's properties is also retrieved during the construction of column-level lineage.
- Spark: extract job name creation to providers #2861 @arturowczarek
  The integration now detects if the `spark.app.name` was autogenerated by Glue and uses the Glue job name in such cases. Also, each job name provisioning strategy is now extracted to a separate provider.