Reusable actions and common scripts
Reusable actions
Run Event Validation
The `run_event_validation` action is a custom GitHub action that handles validation logic for OpenLineage events. Because OpenLineage events have a standardized structure, we provide a generic action that validates events against OpenLineage specifications.
The action:
- Retrieves the OpenLineage specification for all releases defined in `release_tags`
- Runs syntax validation (checks if events conform to the OpenLineage JSON schema)
- Runs semantic validation (compares actual event content with expected values using Event Comparison)
- Creates a comprehensive report using Report
Inputs:
Name | Description | Required | Default |
---|---|---|---|
release_tags | List of the spec versions to check against | false | "" |
ol_release | Release to run the validation with | false | "" |
component_release | Release of the component producing events | false | "" |
target-path | Path to save the report to | true | - |
event-directory | Directory containing the events to validate | true | - |
producer-dir | Directory with producer definitions | true | - |
component | Component name to use | true | - |
Outputs:
Name | Description |
---|---|
report_path | Path to generated report |
Structure
The action requires a specific directory structure for validation to work properly:
Event Directory Structure:
- Root event directory - Top-level directory containing scenario subdirectories
  - Scenario subdirectories - One directory per test scenario
    - Generated event files - Actual OpenLineage events produced by the component being tested
    - File naming - Events should be named descriptively (e.g., `job_start.json`, `job_complete.json`)
    - Format - All files must be valid JSON containing OpenLineage events
Producer Directory Structure:
- Producer root - Main directory for the producer component
  - Scenarios directory - Contains expected event definitions
    - Scenario subdirectories - Mirror the structure of the event directory
      - `config.json` - Configuration file with test specifications and version constraints
      - `events/` - Directory containing expected OpenLineage event templates
        - Expected event files - Template events using Jinja functions for flexible validation
      - `maintainers.json` - File listing scenario maintainers
      - `scenario.md` - Documentation describing the test scenario
Example Directory Layout:
event-directory/
├── scenario1/
│   ├── job_start.json          # Generated events
│   └── job_complete.json
└── scenario2/
    ├── spark_read.json
    └── spark_write.json
producer-dir/
├── scenarios/
│   ├── scenario1/
│   │   ├── config.json         # Test configuration
│   │   ├── events/
│   │   │   ├── job_start.json  # Expected event template
│   │   │   └── job_complete.json
│   │   ├── maintainers.json
│   │   └── scenario.md
│   └── scenario2/
│       ├── config.json
│       ├── events/
│       │   ├── spark_read.json
│       │   └── spark_write.json
│       ├── maintainers.json
│       └── scenario.md
Validation Process:
- Discovery - Action scans event directory for scenario subdirectories
- Matching - For each scenario, finds corresponding producer scenario definition
- Configuration Loading - Reads scenario config.json for version constraints and test specifications
- Event Pairing - Matches generated events with expected event templates by filename
- Validation Execution - Runs comparison between generated and expected events
- Report Generation - Compiles results into comprehensive compatibility report
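The discovery, matching, configuration loading, and pairing steps above can be sketched in a few lines of Python. This is only an illustration of the directory conventions described in this section, not the action's actual code: the function name `pair_events`, the shape of the returned dictionary, and the choice to skip scenarios that have no producer definition are all assumptions.

```python
import json
from pathlib import Path


def pair_events(event_dir: Path, producer_dir: Path) -> dict:
    """Pair generated event files with expected templates by filename.

    Hypothetical helper: returns, per scenario, the loaded config, the
    (generated, expected) file pairs, and expected files with no match.
    """
    result = {}
    # Discovery: every subdirectory of the event directory is a scenario.
    for scenario in sorted(p for p in event_dir.iterdir() if p.is_dir()):
        # Matching: look up the producer scenario with the same name.
        expected_root = producer_dir / "scenarios" / scenario.name
        if not expected_root.is_dir():
            continue  # assumption: scenarios without a definition are skipped
        # Configuration loading: config.json holds the test specifications.
        config = json.loads((expected_root / "config.json").read_text())
        pairs, missing = [], []
        # Event pairing: a generated event and its template share a filename.
        for expected in sorted((expected_root / "events").glob("*.json")):
            generated = scenario / expected.name
            if generated.is_file():
                pairs.append((generated, expected))
            else:
                missing.append(expected.name)
        result[scenario.name] = {"config": config, "pairs": pairs, "missing": missing}
    return result
```

Each pair would then be handed to the comparison step, and the `missing` list is one possible way to surface expected events that the component never produced.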
Get OpenLineage Artifacts
An action that downloads OpenLineage artifacts from either the latest OpenLineage builds or the Maven repository. If `get-latest-snapshots` is true, the action attempts to get each non-skipped artifact from the latest build. If that fails, it falls back to getting the artifact from Maven Central using the version specified in `version`. If `get-latest-snapshots` is false, the version specified in `version` is used directly.
Inputs:
Name | Description | Required | Default |
---|---|---|---|
get-latest-snapshots | First try to download artifacts from OpenLineage builds, rather than Maven repository | false | false |
version | OpenLineage artifact version to use if get-latest-snapshots is false or artifact is unavailable in latest build artifacts | true | |
skip-spark | Skip Spark integration download | false | false |
skip-java | Skip Java client download | false | false |
skip-flink | Skip Flink integration download | false | false |
skip-sql | Skip SQL interface download | false | false |
skip-extensions | Skip extensions download | false | false |
skip-gcp-lineage | Skip GCP-lineage transport download | false | false |
skip-gcs | Skip GCS transport download | false | false |
skip-s3 | Skip S3 transport download | false | false |
Outputs:
Name | Description |
---|---|
spark | File path of the downloaded openlineage-spark jar |
java | File path of the downloaded openlineage-java jar |
flink | File path of the downloaded openlineage-flink jar |
sql | File path of the downloaded openlineage-sql-java jar |
extensions | File path of the downloaded openlineage-extensions jar |
gcp-lineage | File path of the downloaded transports-gcp-lineage jar |
gcs | File path of the downloaded transports-gcs jar |
s3 | File path of the downloaded transports-s3 jar |
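The snapshot-then-release fallback described for this action can be modelled as a small decision function. The sketch below is not how the action is implemented; it only mirrors the documented behavior. The function name `resolve_artifacts` and the two injected fetchers are hypothetical: each takes an artifact name and returns a file path, or `None` when the artifact is unavailable.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical fetcher type: artifact name -> downloaded file path, or None.
Fetcher = Callable[[str], Optional[str]]

# Artifact names taken from the action's outputs table.
ARTIFACTS = ("spark", "java", "flink", "sql", "extensions", "gcp-lineage", "gcs", "s3")


def resolve_artifacts(
    get_latest_snapshots: bool,
    skipped: Iterable[str],
    fetch_snapshot: Fetcher,
    fetch_release: Fetcher,
) -> Dict[str, str]:
    """Return {artifact name: file path} for every non-skipped artifact."""
    skipped = set(skipped)
    paths = {}
    for name in ARTIFACTS:
        if name in skipped:
            continue
        # Try the latest build first, but only when snapshots are requested.
        path = fetch_snapshot(name) if get_latest_snapshots else None
        if path is None:
            # Snapshots disabled or artifact unavailable: use the released version.
            path = fetch_release(name)
        if path is not None:
            paths[name] = path
    return paths
```

Passing the fetchers in as arguments keeps the decision logic separate from the download mechanics, so the fallback can be exercised without network access.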
Common scripts
Event Comparison
Events are compared using the `compare_events.py` script, which iterates through the expected JSON and, for each defined field, checks whether a corresponding field exists in the result file. Helper Jinja functions are defined to improve test coverage.
Value functions are used in example events to substitute exact values:
- `any` - If the key has any value defined
- `is_datetime` - Field value is a parsable datetime
- `is_uuid` - Field value is a UUID
- `contains` - Field value contains the exact string
- `match` - Field value matches the given regex
- `not_match` - Field value doesn't match the given regex
- `one_of` - Field value is one of the given values
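To make the semantics of the value functions concrete, here is a minimal sketch of how such checks could be written in plain Python. It is not the code in `compare_events.py`, and several details are assumptions: `any` is renamed `any_value` to avoid shadowing the Python builtin, `is_datetime` accepts ISO 8601 strings only, and `match` uses `re.search` (the source does not say whether the regex must match the whole value).

```python
import re
import uuid
from datetime import datetime


def any_value(result) -> bool:
    """Stand-in for `any`: the key has some value defined."""
    return result is not None


def is_datetime(result) -> bool:
    """Assumption: 'parsable' means ISO 8601, with 'Z' accepted as UTC."""
    try:
        datetime.fromisoformat(str(result).replace("Z", "+00:00"))
        return True
    except ValueError:
        return False


def is_uuid(result) -> bool:
    """True if the value parses as a UUID (uuid.UUID is fairly lenient)."""
    try:
        uuid.UUID(str(result))
        return True
    except ValueError:
        return False


def contains(result, text: str) -> bool:
    return text in str(result)


def match(result, pattern: str) -> bool:
    # Assumption: a match anywhere in the value is enough.
    return re.search(pattern, str(result)) is not None


def not_match(result, pattern: str) -> bool:
    return not match(result, pattern)


def one_of(result, *values) -> bool:
    return result in values
```

Each function receives the actual value from the result event as `result`, which mirrors how the templates call, for example, `one_of(result, 'RUNNING', 'COMPLETE')`.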
Key functions:
- `key_not_defined` - Key is not defined
- `unordered_list` - For every element of the expected array, checks whether any element of the result array matches, instead of comparing elements at the same indexes
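The difference between index-based and unordered list comparison can be shown with a short sketch. The helper names `matches` and `unordered_list_matches` are hypothetical, and the sketch assumes that extra fields or extra trailing elements in the result are allowed, since the comparison is driven by the fields defined in the expected event. It ignores the Jinja value functions and compares plain values only.

```python
def matches(expected, result) -> bool:
    """True if every field defined in `expected` has an equal counterpart in `result`."""
    if isinstance(expected, dict):
        return isinstance(result, dict) and all(
            key in result and matches(value, result[key])
            for key, value in expected.items()
        )
    if isinstance(expected, list):
        # Default list handling: compare elements at the same index.
        return (
            isinstance(result, list)
            and len(expected) <= len(result)
            and all(matches(e, r) for e, r in zip(expected, result))
        )
    return expected == result


def unordered_list_matches(expected: list, result: list) -> bool:
    """True if each expected element matches some result element, at any index."""
    return all(any(matches(e, r) for r in result) for e in expected)


expected = [
    {"type": "DIRECT", "subtype": "TRANSFORMATION"},
    {"type": "INDIRECT", "subtype": "CONDITIONAL"},
]
# Same elements as `expected`, but in the opposite order.
result = list(reversed(expected))
```

With these values, `matches(expected, result)` fails because the elements sit at different indexes, while `unordered_list_matches(expected, result)` succeeds.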
Event structure
Example structure of an expected JSON event:
{
  "eventTime": "{{ is_datetime(result) }}",
  "eventType": "{{ one_of(result, 'RUNNING', 'COMPLETE') }}",
  "run": {
    "runId": "{{ is_uuid(result) }}",
    "facets": {
      "{{ key_not_defined(result, 'parent') }}": {}
    }
  },
  "job": {
    "namespace": "Example Namespace",
    "name": "Example Name"
  },
  "outputs": [
    {
      "namespace": "hdfs://dataproc-producer-test-m",
      "name": "/user/hive/warehouse/t2",
      "facets": {
        "columnLineage": {
          "fields": {
            "a": {
              "inputFields": [
                {
                  "namespace": "hdfs://dataproc-producer-test-m",
                  "name": "/user/hive/warehouse/t1",
                  "field": "a",
                  "{{ unordered_list(result, 'transformations') }}": [
                    {
                      "type": "DIRECT",
                      "subtype": "TRANSFORMATION"
                    },
                    {
                      "type": "INDIRECT",
                      "subtype": "CONDITIONAL"
                    }
                  ]
                },
                {
                  "namespace": "hdfs://dataproc-producer-test-m",
                  "name": "/user/hive/warehouse/t1",
                  "field": "a"
                }
              ]
            }
          }
        }
      }
    }
  ]
}
Report
The `scripts/report.py` script provides a structured representation of the test report using Python classes.
The classes provide an API to:
- add components, scenarios and tests to the report
- serialize/deserialize the report to json
- create summaries for both producer and consumer
- update the report with values from new report
- create a new-failures report by searching for failures that are present in the new report but absent from the old report
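The new-failures operation is essentially a set difference between two reports. The sketch below illustrates the idea with a single hypothetical `Report` class; the class layout, method names, and the `"SUCCESS"` / `"FAILURE"` status strings are assumptions made for the example and do not reflect the actual classes in `scripts/report.py`.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Report:
    # Hypothetical layout: component -> scenario -> test name -> status.
    results: dict = field(default_factory=dict)

    def add_test(self, component: str, scenario: str, test: str, status: str) -> None:
        self.results.setdefault(component, {}).setdefault(scenario, {})[test] = status

    def to_json(self) -> str:
        return json.dumps(self.results, indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Report":
        return cls(json.loads(text))

    def failures(self) -> set:
        """All (component, scenario, test) triples whose status is FAILURE."""
        return {
            (c, s, t)
            for c, scenarios in self.results.items()
            for s, tests in scenarios.items()
            for t, status in tests.items()
            if status == "FAILURE"
        }

    def new_failures(self, old: "Report") -> "Report":
        """Report containing only failures present here but absent from `old`."""
        report = Report()
        for c, s, t in sorted(self.failures() - old.failures()):
            report.add_test(c, s, t, "FAILURE")
        return report
```

Keying failures by the full (component, scenario, test) triple means a test that was already failing in the old report is not reported again, which is the behavior the last bullet describes.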