Structure
Producer
Contains files and directories related to a specific producer. Each producer should contain:
- `runner` directory containing files necessary to run tests
- `scenarios` directory containing scenario directories
- `maintainers.json` file with the list of people to notify in case of component failures
- `versions.json` file with supported OpenLineage and component versions
producer catalog structure
producer
└── example_producer
    ├── maintainers.json
    ├── versions.json
    ├── runner
    │   └── ...
    └── scenarios
        ├── ...
        └── example_scenario
            ├── config.json
            ├── events
            │   ├── ...
            │   └── expected_event_structure_1.json
            ├── maintainers.json
            ├── scenario.md
            └── test
                └── scenario_test_script
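The required layout can be checked mechanically before tests run. The following is a minimal sketch, not part of the repository: the function name `missing_producer_entries` and the constants are hypothetical, and only the four entries listed above are taken from this document.

```python
from pathlib import Path

# Entries each producer directory should contain, per the list above.
REQUIRED_DIRS = ("runner", "scenarios")
REQUIRED_FILES = ("maintainers.json", "versions.json")


def missing_producer_entries(producer_dir: Path) -> list[str]:
    """Return the names of required entries absent from producer_dir."""
    # Directories are checked first, then files, so the result order is stable.
    missing = [name for name in REQUIRED_DIRS if not (producer_dir / name).is_dir()]
    missing += [name for name in REQUIRED_FILES if not (producer_dir / name).is_file()]
    return missing
```

Calling `missing_producer_entries(Path("producer/example_producer"))` on a complete producer directory would return an empty list; any returned names point at entries that still need to be added.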
Runner
Contains any scripts or resources necessary to run the producer tests.
Scenarios
The scenarios directory contains one or more subdirectories, each containing files related to a particular test scenario:
- `config.json` file with the scenario configuration
- `scenario.md` file with description of the scenario
- `maintainers.json` file with the list of people responsible for the scenario
Config
Each config file contains metadata for the tests in the scenario. There are three types of metadata:
- Scenario scope config
  - Scenario version filters: We may want to test many versions of the producer against many versions of OpenLineage, but not every test scenario needs to run for every version. These filters define the minimum and maximum versions of OpenLineage or the producer for which the scenario runs.
- Test scope configs
  - name: Name of the test
  - path: Path to the expected event this test uses
  - test version filters: Define minimum and maximum versions of OpenLineage or the producer. Semantic tests are skipped for tests that are filtered out.
- Test tags: They are present in the report and reflected in the compatibility tables
  - facets: List of facets that the test checks
  - lineage level: Indicates dataset lineage level
    - `dataset` → No column level lineage available
    - `column` → Column level lineage available
    - `transformation` → Transformation info available
Example config
{
  "component_versions": {
    "min": "0.0.1",
    "max": "9.99.9"
  },
  "openlineage_versions": {
    "min": "0.0.1",
    "max": "9.99.9"
  },
  "tests": [
    {
      "name": "name",
      "path": "path/to/file.json",
      "component_versions": {
        "min": "0.0.1",
        "max": "9.99.9"
      },
      "openlineage_versions": {
        "min": "0.0.1",
        "max": "9.99.9"
      },
      "tags": {
        "facets": [
          "list",
          "of",
          "supported",
          "facets"
        ],
        "lineage_level": {
          "bigquery": [
            "dataset",
            "column",
            "transformation"
          ]
        }
      }
    }
  ]
}
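To illustrate how the version filters could narrow down which tests run, here is a minimal sketch. It is not the framework's actual implementation. It assumes that `min` and `max` are inclusive, that a missing filter or bound means no restriction, that the scenario-level filter is applied before the test-level one, and that versions are plain dot-separated integers (a suffix such as `-SNAPSHOT` would need extra handling). The helper names and the sample event paths are hypothetical.

```python
import json
from typing import Optional


def parse_version(version: str) -> tuple:
    """Turn '1.23.0' into (1, 23, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, bounds: Optional[dict]) -> bool:
    """Check version against optional 'min'/'max' bounds (assumed inclusive)."""
    if not bounds:
        return True  # assumption: no filter means no restriction
    v = parse_version(version)
    if "min" in bounds and v < parse_version(bounds["min"]):
        return False
    if "max" in bounds and v > parse_version(bounds["max"]):
        return False
    return True


def select_tests(config: dict, component_version: str, openlineage_version: str) -> list:
    """Return names of tests whose scenario and test filters both match."""
    def matches(scope: dict) -> bool:
        return (in_range(component_version, scope.get("component_versions"))
                and in_range(openlineage_version, scope.get("openlineage_versions")))

    if not matches(config):
        return []  # the scenario-level filter excludes every test
    return [test["name"] for test in config.get("tests", []) if matches(test)]


# Hypothetical config with a scenario-level and a test-level filter.
config = json.loads("""
{
  "openlineage_versions": {"min": "1.0.0"},
  "tests": [
    {"name": "always", "path": "events/a.json"},
    {"name": "newer_only", "path": "events/b.json",
     "openlineage_versions": {"min": "1.20.0"}}
  ]
}
""")

print(select_tests(config, "3.5.0", "1.10.0"))  # ['always']
```

Comparing versions as integer tuples avoids the string-comparison pitfall where `"1.9.0"` would sort after `"1.10.0"`.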
Events
This directory contains expected events as JSON files. More information on setting up the events for validation can be found in Event validation.
Consumer
The consumer directory contains two subdirectories:
- `consumers` - list of consumers and their test scenarios
- `scenarios` - scenario input events used in tests; this directory is kept separate from the consumer definitions so that multiple consumers can use the same events for testing
Each directory in `scenarios` has the following content:
- `events` - directory containing OpenLineage events to use in consumer tests
- `maintainers.json` - file with the list of people responsible for the scenario events
- `scenario.md` - human-readable description of the scenario events (producer type, inputs, outputs, facets, executed operations)
Each directory in `consumers` represents a consumer and contains:
- `validator` - directory with the validation logic (unlike producers, whose OpenLineage events can be validated by a generic component)
- `mapping.json` - file with the mapping between OpenLineage events and consumer API entities
- `maintainers.json` - file with the list of people responsible for the component
- `scenarios` - directory containing scenario directories with the following structure:
  - `config.json` - file with the scenario configuration
  - `scenario.md` - human-readable description of the scenario (expected change in consumer state)
  - `maintainers.json` - file with the list of people responsible for the scenario
  - `validation` - directory with the expected state of the consumer API to validate against
consumer catalog structure
consumer
├── consumers
│   └── <consumer name>
│       ├── README.md
│       ├── maintainers.json
│       ├── mapping.json
│       ├── run_dataplex_tests.sh
│       ├── scenarios
│       │   ├── ...
│       │   └── <scenario name>
│       │       ├── api_state
│       │       ├── config.json
│       │       ├── maintainers.json
│       │       ├── scenario.md
│       │       └── validation
│       │           ├── ...
│       │           └── validation_file
│       └── validator