
Developing With OpenLineage

Because there are hundreds, possibly thousands, of databases, query engines, and other tools you could use to process, create, and move data, there's a good chance that the existing OpenLineage integrations won't cover your needs.

However, the OpenLineage project also provides libraries you can use to write your own integration.

Clients

For Python and Java, we've created clients that you can use to create well-formed OpenLineage events and emit them to HTTP endpoints, Kafka, and other consumers.
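In practice the Python and Java clients build and send events for you. The standard-library sketch below only shows roughly what goes over the wire: a minimal RunEvent with eventType, eventTime, run, job, inputs, outputs, producer, and schemaURL, POSTed as JSON. Several details are assumptions and should be checked against the spec and your backend rather than copied: the helper names `make_run_event` and `build_http_request`, the producer URI, the spec version in `SCHEMA_URL`, and the `/api/v1/lineage` endpoint path. The request is prepared but not sent, so the example runs without a server.

```python
import json
import urllib.request
import uuid
from datetime import datetime, timezone

# Assumption: spec version in the schema URL; check the current OpenLineage spec.
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"
# Hypothetical producer URI identifying this custom integration.
PRODUCER = "https://example.com/my-custom-integration"


def make_run_event(event_type, run_id, job_namespace, job_name, inputs=(), outputs=()):
    """Build a minimal OpenLineage RunEvent as a plain dict (hypothetical helper)."""
    return {
        "eventType": event_type,  # e.g. "START", "COMPLETE", "FAIL"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},  # the same UUID ties START and COMPLETE together
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


def build_http_request(base_url, event, api_key=None):
    """Prepare (but do not send) a JSON POST of the event (hypothetical helper)."""
    # Assumption: endpoint path; configure it to match your lineage backend.
    url = base_url.rstrip("/") + "/api/v1/lineage"
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Mirrors how an API key is passed as a Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers=headers,
        method="POST",
    )


run_id = str(uuid.uuid4())
start = make_run_event(
    "START", run_id, "my-namespace", "daily_orders_job",
    inputs=[("postgres://db:5432", "raw.orders")],
    outputs=[("postgres://db:5432", "analytics.daily_orders")],
)
request = build_http_request("http://localhost:8080", start, api_key="secret")
# To actually send it: urllib.request.urlopen(request)
```

A matching COMPLETE event would reuse the same `run_id` so a backend can pair it with the START event.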

API Documentation

Common Library (Python)

Getting lineage from systems like BigQuery or Redshift isn't necessarily tied to the orchestrator or processing engine you're using. For this reason, we've extracted that functionality from our Airflow library and packaged it for separate use.

Environment Variables

The following environment variables are supported by both the Java and Python clients.

| Name | Description | Since |
|---|---|---|
| OPENLINEAGE_API_KEY | The optional API key to be set on each lineage request. It is sent as a Bearer token when authentication is required. | |
| OPENLINEAGE_CONFIG | The optional path to the configuration file, which is in YAML format. Example: openlineage.yml | |
| OPENLINEAGE_DISABLED | When set to true, prevents OpenLineage from emitting events to the receiving backend. | 0.9.0 |
| OPENLINEAGE_URL | The URL to which the HTTP transport emits lineage events. If not set, no lineage data is emitted, and event data (JSON) is written to standard output instead. Example: http://localhost:8080 | |
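To make the table concrete, here is a hypothetical resolver that maps these variables to settings following the descriptions above. If OPENLINEAGE_DISABLED is true, emission is off. If OPENLINEAGE_URL is set, an HTTP transport is used; otherwise output goes to the console. An API key becomes a Bearer Authorization header. `LineageSettings` and `resolve_settings` are illustrative names, not client APIs. The case-insensitive comparison with "true" is an assumption. The real clients also read the YAML file from OPENLINEAGE_CONFIG, and their precedence rules may differ.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional


@dataclass
class LineageSettings:
    """Hypothetical settings derived from OpenLineage environment variables."""
    disabled: bool
    transport: str  # "http" when OPENLINEAGE_URL is set, otherwise "console"
    url: Optional[str] = None
    config_path: Optional[str] = None
    headers: Dict[str, str] = field(default_factory=dict)


def resolve_settings(env: Mapping[str, str] = os.environ) -> LineageSettings:
    """Interpret the environment variables as described in the table (a sketch)."""
    # Assumption: any casing of "true" disables emission.
    disabled = env.get("OPENLINEAGE_DISABLED", "").strip().lower() == "true"
    url = env.get("OPENLINEAGE_URL") or None
    headers: Dict[str, str] = {}
    api_key = env.get("OPENLINEAGE_API_KEY")
    if api_key:
        # The API key is sent as a Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return LineageSettings(
        disabled=disabled,
        # Without a URL, events are written to standard output instead.
        transport="http" if url else "console",
        url=url,
        config_path=env.get("OPENLINEAGE_CONFIG") or None,
        headers=headers,
    )


http_settings = resolve_settings(
    {"OPENLINEAGE_URL": "http://localhost:8080", "OPENLINEAGE_API_KEY": "secret"}
)
console_settings = resolve_settings({})
disabled_settings = resolve_settings({"OPENLINEAGE_DISABLED": "true"})
```

Passing a plain dict instead of `os.environ` keeps the resolver easy to test without touching the process environment.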

SQL parser

We've created a SQL parser that extracts lineage from SQL statements. The parser is implemented in Rust; however, it's also available as a Python library. You can take a look at its documentation here or its code on GitHub.
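The toy sketch below does not use the OpenLineage SQL parser's API. It uses regular expressions only to show the kind of result lineage extraction produces: the tables a statement reads (inputs) and the tables it writes (outputs). The name `extract_lineage` is hypothetical. The patterns miss many cases that a real parser handles, including CTEs, quoted identifiers, `CREATE TABLE IF NOT EXISTS`, `DELETE FROM`, and dialect differences.

```python
import re
from typing import List, Tuple

# Toy patterns: a table name directly after these keywords is treated as an
# output (written) or an input (read). This is NOT how a real SQL parser works.
OUTPUT_PATTERN = re.compile(
    r"\b(?:INSERT\s+INTO|CREATE\s+TABLE|UPDATE)\s+([\w.]+)", re.IGNORECASE
)
INPUT_PATTERN = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def extract_lineage(sql: str) -> Tuple[List[str], List[str]]:
    """Return (input_tables, output_tables) for one simple SQL statement."""
    inputs = sorted(set(INPUT_PATTERN.findall(sql)))
    outputs = sorted(set(OUTPUT_PATTERN.findall(sql)))
    return inputs, outputs


ins, outs = extract_lineage(
    "INSERT INTO analytics.daily_orders "
    "SELECT o.id, c.name FROM raw.orders o "
    "JOIN raw.customers c ON o.customer_id = c.id"
)
# Subqueries are skipped because "FROM (" is not followed by a table name.
ctas_ins, ctas_outs = extract_lineage(
    "CREATE TABLE summary AS SELECT * FROM (SELECT id FROM events) e"
)
```

The input and output table lists correspond to the inputs and outputs of a lineage event. The real parser produces them from a proper SQL grammar rather than keyword matching.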