Class DatasetReducer

java.lang.Object
io.openlineage.client.dataset.partition.DatasetReducer

public class DatasetReducer extends Object
Class responsible for reducing datasets.

Two datasets can be reduced if they share a common trimmed name and have the same facets. The logic used to trim a dataset name is defined in DatasetConfig through a collection of dataset trimmers. The default collection of trimmers can be altered with the extraTrimmers or disabledTrimmers settings.

A reduce operation returns a single dataset that carries the trimmed name and all the facets of the reduced datasets. The returned dataset is additionally enriched with a subset definition facet containing the non-trimmed names of all the datasets that were reduced.

Reducing a single dataset whose name cannot be trimmed returns the dataset unmodified. Reducing a single dataset whose name can be trimmed returns a dataset with the trimmed name and a location-based subset definition facet that holds the dataset's non-trimmed name.
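The reduce rules described above can be illustrated with a small, self-contained sketch. It does not use the OpenLineage classes: ReduceSketch, its nested Dataset type, the string-valued facet map, and the trim method (which drops trailing key=value path segments) are all hypothetical stand-ins chosen for illustration, and the real trimmers and facet types may behave differently. The subset list here merely plays the role of the subset definition facet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical stand-ins, not OpenLineage classes.
final class ReduceSketch {

    // Simplified dataset: a name, a facet map, and the original
    // (non-trimmed) names it was reduced from.
    static final class Dataset {
        final String name;
        final Map<String, String> facets;
        final List<String> subset = new ArrayList<>();

        Dataset(String name, Map<String, String> facets) {
            this.name = name;
            this.facets = facets;
        }
    }

    // Hypothetical trimmer: drops trailing "key=value" path segments.
    static String trim(String name) {
        String trimmed = name;
        while (true) {
            int slash = trimmed.lastIndexOf('/');
            if (slash <= 0 || !trimmed.substring(slash + 1).contains("=")) {
                return trimmed;
            }
            trimmed = trimmed.substring(0, slash);
        }
    }

    static List<Dataset> reduce(List<Dataset> datasets) {
        // Group by (trimmed name, facets); LinkedHashMap keeps first-seen order.
        Map<List<Object>, Dataset> groups = new LinkedHashMap<>();
        for (Dataset d : datasets) {
            String trimmed = trim(d.name);
            List<Object> key = Arrays.<Object>asList(trimmed, d.facets);
            Dataset merged =
                groups.computeIfAbsent(key, k -> new Dataset(trimmed, d.facets));
            merged.subset.add(d.name);
        }
        List<Dataset> result = new ArrayList<>();
        for (Dataset merged : groups.values()) {
            // A single dataset whose name could not be trimmed stays unmodified.
            if (merged.subset.size() == 1 && merged.subset.get(0).equals(merged.name)) {
                merged.subset.clear();
            }
            result.add(merged);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> facets = Collections.singletonMap("schema", "v1");
        List<Dataset> reduced = reduce(Arrays.asList(
            new Dataset("/warehouse/orders/day=2024-01-01", facets),
            new Dataset("/warehouse/orders/day=2024-01-02", facets),
            new Dataset("/warehouse/customers", facets)));
        for (Dataset d : reduced) {
            System.out.println(d.name + " <- " + d.subset);
        }
    }
}
```

In this sketch the two partitioned paths collapse into one dataset because they share both the trimmed name and the facets, while the third dataset passes through unchanged. Datasets with the same trimmed name but different facets would land in separate groups.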

  • Constructor Details

  • Method Details

    • reduceInputs

      public List<OpenLineage.InputDataset> reduceInputs(List<OpenLineage.InputDataset> datasets)
      Given a list of input datasets, returns a new list of input datasets after applying partition detection and merging rules.
      Parameters:
      datasets - list of input datasets
      Returns:
      list of reduced input datasets
    • reduceOutputs

      public List<OpenLineage.OutputDataset> reduceOutputs(List<OpenLineage.OutputDataset> datasets)
      Given a list of output datasets, returns a new list of output datasets after applying partition detection and merging rules.
      Parameters:
      datasets - list of output datasets
      Returns:
      list of reduced output datasets