1.25.0 - 2024-12-03
Added
- Dbt: Add support for Column-Level Lineage in dbt integration.
#3264
@mayurmadnani
The dbt integration now uses the SQL parser to add information about collected column-level lineage.
- Spark: Add input and output statistics about datasets read and written.
#3240
#3263
@pawel-big-lebowski
Fixes issues in the existing output statistics collection mechanism and adds fetching of input statistics. Output statistics now contain the number of files written, the size in bytes, and the number of records written. Input statistics contain the size in bytes and the number of files read, while the record count is collected only for DataSourceV2 sources.
- Introduced InputStatisticsInputDatasetFacet
#3238
@pawel-big-lebowski
Extends the spec with a new facet, InputStatisticsInputDatasetFacet, modelled after the similar OutputStatisticsOutputDatasetFacet, to contain statistics about the input dataset read by a job.
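As a rough illustration of what an input statistics payload could look like, here is a minimal Python sketch. It is not the spec definition: the class name InputStatistics, the helper to_facet, and the field names size, fileCount and rowCount are assumptions chosen to mirror the statistics described above (bytes read, files read, records read). Check the published facet schema for the actual names.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class InputStatistics:
    """Hypothetical stand-in for the facet; field names are assumptions."""
    size: Optional[int] = None       # bytes read
    fileCount: Optional[int] = None  # number of files read
    rowCount: Optional[int] = None   # records read; not always collected


def to_facet(stats: InputStatistics) -> dict:
    """Keep only the statistics that were actually collected."""
    return {k: v for k, v in asdict(stats).items() if v is not None}


# A source without a record count: only byte size and file count are present.
facet = to_facet(InputStatistics(size=1_048_576, fileCount=4))
```

Leaving uncollected fields out, rather than sending zeros, keeps "not collected" distinguishable from "nothing was read", which matters here because the record count is only available for DataSourceV2 sources.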
Changed
- Spark: Exclude META-INF/*TransportBuilder from Spark Extension Interfaces
#3244
@tnazarew
Excludes META-INF/*TransportBuilder to avoid version conflicts.
- Spark: Enables building input/output facets through DatasetFactory
#3207
@pawel-big-lebowski
Adds extra capabilities to the DatasetFactory class and marks some public developer API methods as deprecated.