This page covers the external Airflow integration, which is intended mainly for Airflow versions earlier than 2.7.
If you're using Airflow 2.7+, see the native Airflow OpenLineage provider documentation.
Ongoing development and enhancements will focus on the native Airflow OpenLineage provider, while openlineage-airflow will primarily be updated for bug fixes. See all Airflow versions supported by this integration.
Before installing, check the supported Airflow versions.
To download and install the latest openlineage-airflow library, run:
$ pip3 install openlineage-airflow
You can also add openlineage-airflow to your requirements.txt for Airflow.
To install from source, run:
$ python3 setup.py install
Next, specify where you want OpenLineage to send events.
We recommend configuring the client with an openlineage.yml file that tells the client how to connect to an OpenLineage backend.
See how to do it.
The simplest option, limited to the HTTP client, is to use environment variables. For example, to send OpenLineage events to a local instance of Marquez, use:
OPENLINEAGE_ENDPOINT=api/v1/lineage # Default value when this variable is not set; it can be omitted in this example
OPENLINEAGE_API_KEY=secret_token # Only needed if authentication headers are required; it can be omitted in this example
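To make the role of these variables concrete, here is a minimal Python sketch of how an HTTP target could be derived from them. It is an illustration under stated assumptions, not the client's actual code: `resolve_lineage_target` is a hypothetical helper, `OPENLINEAGE_URL` is assumed to hold the backend's base address, `http://localhost:5000` is assumed to be the address of a local Marquez instance, and sending the API key as a Bearer token is likewise an assumption.

```python
import os


def resolve_lineage_target(env=None, default_base="http://localhost:5000"):
    """Return (url, headers) for posting OpenLineage events over HTTP.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    # Assumption: OPENLINEAGE_URL carries the base address of the backend.
    base = env.get("OPENLINEAGE_URL", default_base)
    # api/v1/lineage is the documented default endpoint.
    endpoint = env.get("OPENLINEAGE_ENDPOINT", "api/v1/lineage")
    url = base.rstrip("/") + "/" + endpoint.lstrip("/")

    headers = {"Content-Type": "application/json"}
    api_key = env.get("OPENLINEAGE_API_KEY")
    if api_key:
        # Assumption: the key is sent as a Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return url, headers
```

Passing a plain dict instead of relying on `os.environ` keeps the sketch easy to try out without touching the real environment.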
To set up additional configuration, or to send events to targets other than an HTTP server (e.g., a Kafka topic), configure a client.
NOTE: If you use a version of Airflow older than 2.3.0, additional configuration is required.
The following environment variables are available specifically for the Airflow integration, in addition to Python client variables.
- False if you want the source code of callables provided in PythonOperator or BashOperator NOT to be included in OpenLineage events.
- The optional list of extractor classes (as a semicolon-separated string), in case you need to use custom extractors.
- The optional namespace that the lineage data belongs to. If not specified, defaults to
- The logging level of the OpenLineage client in Airflow (the OPENLINEAGE_CLIENT_LOGGING variable from the Python client has no effect here).
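As an illustration of the semicolon-separated format used for custom extractors, the following sketch splits such a string into individual class paths. It takes the raw string directly rather than reading a specific variable; `parse_extractor_list` and the class paths in the usage note are hypothetical, and the integration's own parsing may differ.

```python
def parse_extractor_list(raw):
    """Split a semicolon-separated string of class paths into a clean list.

    Hypothetical helper: empty segments and surrounding whitespace are dropped.
    """
    if not raw:
        return []
    return [part.strip() for part in raw.split(";") if part.strip()]
```

For example, `parse_extractor_list("my_pkg.extractors.FirstExtractor; my_pkg.extractors.SecondExtractor")` yields a list with the two class paths, which could then be imported one by one.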
For backwards compatibility, openlineage-airflow also supports configuration via MARQUEZ_-prefixed variables, such as MARQUEZ_API_KEY, instead of the standard OPENLINEAGE_-prefixed ones.
Variables with different prefixes should not be mixed together.
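A simple pre-flight check can catch accidental mixing of the two prefixes before Airflow starts. The sketch below is not part of the integration; `find_prefix_conflict` is a hypothetical helper that only inspects variable names.

```python
import os


def find_prefix_conflict(env=None):
    """Return (marquez_vars, openlineage_vars) if both prefixes are set, else None.

    Hypothetical check: it looks only at variable names, not their values.
    """
    env = os.environ if env is None else env
    marquez = sorted(k for k in env if k.startswith("MARQUEZ_"))
    openlineage = sorted(k for k in env if k.startswith("OPENLINEAGE_"))
    if marquez and openlineage:
        return marquez, openlineage
    return None
```

A deployment script could call `find_prefix_conflict()` and fail early with a readable message when it returns a non-`None` result.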
When enabled, the integration will:
- On TaskInstance start, collect metadata for each task.
- Collect task input / output metadata (source, schema, etc.).
- Collect task run-level metadata (execution time, state, parameters, etc.).
- On TaskInstance complete, also mark the task as complete in Marquez.
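The lifecycle above can be pictured as a pair of events that share one run identifier. The sketch below builds simplified event dictionaries for a task start and completion. It is an illustration only, not the integration's code: the helper names, the namespace, and the job and dataset names are hypothetical, and real OpenLineage events carry additional required fields (such as the producer) that are omitted here.

```python
import uuid
from datetime import datetime, timezone


def make_run_event(event_type, run_id, namespace, job_name, inputs=(), outputs=()):
    """Build a simplified, OpenLineage-style run event as a plain dict."""
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": name} for name in inputs],
        "outputs": [{"namespace": namespace, "name": name} for name in outputs],
    }


def task_lifecycle_events(namespace, job_name, inputs, outputs):
    """Return the START and COMPLETE events for one task run.

    Both events reuse the same run ID, which lets a backend
    correlate the start of a task with its completion.
    """
    run_id = str(uuid.uuid4())
    start = make_run_event("START", run_id, namespace, job_name, inputs, outputs)
    complete = make_run_event("COMPLETE", run_id, namespace, job_name, inputs, outputs)
    return [start, complete]
```

Calling `task_lifecycle_events("my_namespace", "example_dag.load_orders", ["raw.orders"], ["clean.orders"])` returns two dictionaries whose `run.runId` values match.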