We are pleased to announce the initial release of OpenLineage. This is the culmination of a broad community effort, and establishes a common framework for data lineage collection and analysis.
We want to thank all the contributors, as well as all the projects and companies involved in the design (in alphabetical order): Airflow, Astronomer, Datakin, Data Mesh, dbt, Egeria, GetInData, Great Expectations, Iceberg (and others that we are probably forgetting).
This release includes:
- The initial 1-0-0 release of the OpenLineage specification
- A core lineage model of Jobs, Runs and Datasets
- Core facets
  - Data Quality Metrics and statistics
  - Dataset schema
  - Source code location
- Clients that send OpenLineage events to an HTTP backend
- Integrations that collect lineage metadata as OpenLineage events
  - Apache Airflow with support for BigQuery, Great Expectations, Postgres, Redshift, Snowflake
  - Apache Spark
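
To make the core model of Jobs, Runs and Datasets a little more concrete, here is a minimal sketch of what a lineage event and its HTTP delivery might look like. It uses only the Python standard library rather than the released clients, and the field names reflect our reading of the specification, so please check them against the published spec. The helper functions, the namespaces, job and dataset names, the producer URL, and the backend URL are all hypothetical and chosen purely for illustration.

```python
import json
import uuid
from datetime import datetime, timezone
from urllib import request


def build_run_event(event_type, run_id, job_namespace, job_name, inputs, outputs):
    """Assemble a dict shaped like an OpenLineage run event.

    Field names are an assumption based on the spec; verify before relying on them.
    """
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # A Run is one execution of a Job, identified by a unique run ID.
        "run": {"runId": run_id},
        # A Job is identified by a namespace and a name.
        "job": {"namespace": job_namespace, "name": job_name},
        # Datasets consumed and produced by this run.
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
        # Hypothetical producer identifier.
        "producer": "https://example.com/hypothetical-producer",
    }


def build_post_request(backend_url, event):
    """Prepare (but do not send) an HTTP POST carrying the event as JSON."""
    body = json.dumps(event).encode("utf-8")
    return request.Request(
        backend_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


run_id = str(uuid.uuid4())
event = build_run_event(
    "COMPLETE",
    run_id,
    "example_namespace",
    "daily_orders_load",
    inputs=[("postgres://example-host", "public.orders")],
    outputs=[("bigquery", "analytics.daily_orders")],
)

# The backend URL below is a placeholder, not a documented endpoint.
req = build_post_request("http://localhost:5000/api/v1/lineage", event)
print(json.dumps(event, indent=2))
```

Calling `request.urlopen(req)` would actually deliver the event to a backend listening at that address. In practice the released clients and integrations take care of building and sending these events for you.
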
This is only the beginning. We invite everyone interested to consult and contribute to the roadmap. The roadmap currently includes, among other things, support for Kafka, BI dashboards, and column-level lineage, but you can influence it by participating!