Developing With OpenLineage
There are hundreds, possibly thousands, of databases, query engines, and other tools you could use to process, create, and move data, so there's a good chance that the existing OpenLineage integrations won't cover your needs.
However, the OpenLineage project also provides libraries you can use to write your own integration.
Clients
For Python and Java, we've created clients that you can use to properly create and emit OpenLineage events to HTTP, Kafka, and other consumers.
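To give a sense of what the clients assemble for you, here is a minimal sketch that builds a START run event by hand using only the Python standard library. It does not use the official client. The `build_run_event` helper, the `my-namespace` and `my-job` names, and the producer URI are hypothetical, and the `schemaURL` version is an assumption to check against the spec release you target.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(event_type, namespace, job_name, run_id=None):
    """Return a dict shaped like an OpenLineage run event (hand-rolled sketch)."""
    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # A fresh UUID is generated when the caller does not supply a run ID.
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [],
        "outputs": [],
        # Hypothetical producer URI identifying the code that emitted the event.
        "producer": "https://example.com/my-integration",
        # Assumed spec version; verify against the spec release you target.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
    }


if __name__ == "__main__":
    event = build_run_event("START", "my-namespace", "my-job")
    print(json.dumps(event, indent=2))
```

In a real integration you would normally let the Python or Java client construct and send this payload, so that transport, serialization, and configuration are handled for you.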
API Documentation
Common Library (Python)
Getting lineage from systems like BigQuery or Redshift isn't necessarily tied to the orchestrator or processing engine you're using. For this reason, we've extracted that functionality from our Airflow library and packaged it for separate use.
Environment Variables
The list of available environment variables for Python can be found here. The list of available environment variables for Java can be found here.
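As an illustration, a custom integration might read its own settings from the environment in a similar way. The sketch below assumes the `OPENLINEAGE_URL`, `OPENLINEAGE_API_KEY`, and `OPENLINEAGE_NAMESPACE` variables; the linked lists remain the authoritative source for names and semantics. The `load_settings` helper and the `"default"` namespace fallback are hypothetical.

```python
import os


def load_settings(environ=os.environ):
    """Collect OpenLineage-related settings from environment variables."""
    return {
        "url": environ.get("OPENLINEAGE_URL"),          # where events are sent
        "api_key": environ.get("OPENLINEAGE_API_KEY"),  # optional credential
        # Falling back to "default" is an assumption of this sketch.
        "namespace": environ.get("OPENLINEAGE_NAMESPACE", "default"),
    }


if __name__ == "__main__":
    print(load_settings())
```

Passing the mapping in as a parameter keeps the helper easy to test without touching the real process environment.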
SQL parser
We've created a SQL parser that allows you to extract lineage from SQL statements. The parser is implemented in Rust; however, it's also available as a Python library. You can take a look at its code on GitHub.
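To make "lineage from SQL" concrete, here is a deliberately naive, regex-based toy that pulls input and output table names out of a single statement. It is not the OpenLineage parser and does not reflect its API; `naive_lineage` is a hypothetical helper that ignores dialects, subqueries, CTEs, quoting, and clauses such as `IF NOT EXISTS`, which is exactly why a real parser is needed.

```python
import re

# Unquoted, optionally dotted identifier such as "schema.table".
_IDENT = r"([A-Za-z_][\w.]*)"
_INPUT_RE = re.compile(r"\b(?:FROM|JOIN)\s+" + _IDENT, re.IGNORECASE)
_OUTPUT_RE = re.compile(
    r"\b(?:INSERT\s+INTO|CREATE\s+TABLE)\s+" + _IDENT, re.IGNORECASE
)


def naive_lineage(sql):
    """Return (inputs, outputs) as sorted lists of table names found in sql."""
    outputs = sorted(set(_OUTPUT_RE.findall(sql)))
    # A table that is written to is not also reported as an input.
    inputs = sorted(set(_INPUT_RE.findall(sql)) - set(outputs))
    return inputs, outputs


if __name__ == "__main__":
    statement = (
        "INSERT INTO sales.daily "
        "SELECT * FROM sales.orders o JOIN sales.customers c ON o.cid = c.id"
    )
    print(naive_lineage(statement))
```

For the statement above, the toy reports `sales.customers` and `sales.orders` as inputs and `sales.daily` as the output.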
Contributing
If contributing changes, additions or fixes, please include the following header in any new files:
/*
/* Copyright 2018-2024 contributors to the OpenLineage project
/* SPDX-License-Identifier: Apache-2.0
*/
When a pull request is opened, a pre-commit step checks that new files include this license header.
Thanks for your contributions to the project!