Class DatasetReducer
Two datasets can be reduced if they have the same trimmed name and the same facets. The logic used to trim a dataset name is defined in DatasetConfig via a collection of dataset trimmers. The default collection of trimmers can be altered with the extraTrimmers or disabledTrimmers settings.
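To make the trimming idea concrete, here is a minimal sketch of how a default trimmer collection might be combined with extra and disabled trimmers. Everything in it is an assumption made for illustration: the `TrimmerSketch` class, the modelling of a trimmer as a `UnaryOperator<String>`, the trimmer names `keyValue` and `date`, and the rule that a trimmer drops the last path segment when it matches a pattern. It does not reproduce the library's actual trimmer classes or configuration parsing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

class TrimmerSketch {
    // Hypothetical default trimmers, keyed by name so each can be disabled individually.
    static Map<String, UnaryOperator<String>> defaultTrimmers() {
        Map<String, UnaryOperator<String>> trimmers = new LinkedHashMap<>();
        trimmers.put("keyValue", dropLastSegmentIf(Pattern.compile("[^/=]+=[^/=]+")));
        trimmers.put("date", dropLastSegmentIf(Pattern.compile("\\d{4}-\\d{2}-\\d{2}")));
        return trimmers;
    }

    // Builds a trimmer that removes the last path segment when it fully matches the pattern.
    static UnaryOperator<String> dropLastSegmentIf(Pattern segment) {
        return name -> {
            int slash = name.lastIndexOf('/');
            if (slash <= 0) {
                return name; // nothing left to trim
            }
            boolean matches = segment.matcher(name.substring(slash + 1)).matches();
            return matches ? name.substring(0, slash) : name;
        };
    }

    // Defaults plus extra trimmers, minus the disabled ones (an assumed merge order).
    static List<UnaryOperator<String>> activeTrimmers(
            Map<String, UnaryOperator<String>> extra, Set<String> disabled) {
        Map<String, UnaryOperator<String>> all = defaultTrimmers();
        all.putAll(extra);
        all.keySet().removeAll(disabled);
        return new ArrayList<>(all.values());
    }

    // Applies the trimmers repeatedly until no trimmer changes the name any further.
    static String trim(String name, List<UnaryOperator<String>> trimmers) {
        String current = name;
        boolean changed = true;
        while (changed) {
            changed = false;
            for (UnaryOperator<String> trimmer : trimmers) {
                String next = trimmer.apply(current);
                if (!next.equals(current)) {
                    current = next;
                    changed = true;
                }
            }
        }
        return current;
    }
}
```

In this sketch, `trim("/warehouse/orders/dt=2024-01-01", activeTrimmers(Map.of(), Set.of()))` strips the partition segment, while passing `Set.of("keyValue")` as the disabled set leaves the same name untouched.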
A reduce operation returns a single dataset with the trimmed name of the reduced datasets and all of their facets. The returned dataset is additionally enriched with a subset definition facet containing the non-trimmed names of all the datasets that were reduced.
Reducing a single dataset whose name cannot be trimmed returns the dataset unmodified. Reducing a single dataset whose name can be trimmed returns a dataset with the trimmed name and a location-based subset definition facet containing the dataset's non-trimmed name.
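The rules above can be sketched as a grouping step followed by a merge. The sketch below is a simplified model, not the class's actual implementation: `ReduceSketch`, its nested `Dataset` type (a name, a string-to-string facet map, and a list standing in for the subset definition facet), and the single `trimmer` function are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

class ReduceSketch {
    // Hypothetical stand-in for a dataset; subsetNames models the subset definition facet.
    static final class Dataset {
        final String name;
        final Map<String, String> facets;
        final List<String> subsetNames; // empty when no subset definition is attached

        Dataset(String name, Map<String, String> facets, List<String> subsetNames) {
            this.name = name;
            this.facets = facets;
            this.subsetNames = subsetNames;
        }
    }

    static List<Dataset> reduce(List<Dataset> datasets, UnaryOperator<String> trimmer) {
        // Datasets are reducible together when trimmed name and facets are both equal.
        Map<List<Object>, List<Dataset>> groups = new LinkedHashMap<>();
        for (Dataset d : datasets) {
            List<Object> key = List.of(trimmer.apply(d.name), d.facets);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(d);
        }

        List<Dataset> reduced = new ArrayList<>();
        for (Map.Entry<List<Object>, List<Dataset>> entry : groups.entrySet()) {
            String trimmedName = (String) entry.getKey().get(0);
            List<Dataset> members = entry.getValue();
            Dataset first = members.get(0);

            // A single dataset whose name cannot be trimmed is returned unmodified.
            if (members.size() == 1 && trimmedName.equals(first.name)) {
                reduced.add(first);
                continue;
            }

            // Otherwise emit one dataset with the trimmed name, the shared facets,
            // and the non-trimmed names of every dataset that was reduced.
            List<String> originalNames = new ArrayList<>();
            for (Dataset member : members) {
                originalNames.add(member.name);
            }
            reduced.add(new Dataset(trimmedName, first.facets, originalNames));
        }
        return reduced;
    }
}
```

Because the facets are part of the grouping key, two datasets with the same trimmed name but different facets stay separate in this model, which mirrors the "same facets" condition stated above.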
-
Constructor Summary
Constructors -
Method Summary
Method and Description
reduceInputs(List<OpenLineage.InputDataset> datasets)
Given a list of input datasets, returns a new list of input datasets after applying partition detection and merging rules.
reduceOutputs(List<OpenLineage.OutputDataset> datasets)
Given a list of output datasets, returns a new list of output datasets after applying partition detection and merging rules.
-
Constructor Details
-
DatasetReducer
-
-
Method Details
-
reduceInputs
Given a list of input datasets, returns a new list of input datasets after applying partition detection and merging rules.
Parameters:
datasets - list of input datasets
Returns:
list of reduced input datasets
-
reduceOutputs
Given a list of output datasets, returns a new list of output datasets after applying partition detection and merging rules.
Parameters:
datasets - list of output datasets
Returns:
list of reduced output datasets
-