Configuration
Intro
The OpenLineage Python client supports four main configuration sections that control how events are emitted and what metadata is included:
- Transports - Configures how events are sent to OpenLineage backends (HTTP, Kafka, File, Console, etc.)
- Facets - Configures some facets (e.g., which environment variables are attached to events as a facet)
- Filters - Defines rules to selectively exclude certain events from being emitted
- Tags - Configures custom tags added to job and run entities as a custom facet
Configuration can be provided in several ways:
- Environment Variables (Recommended) - See the Environment Variables section below.
- YAML Configuration File - Use an openlineage.yml file that contains all configuration details. The file can be located in three ways:
  - Set the OPENLINEAGE_CONFIG environment variable to the file path: OPENLINEAGE_CONFIG=path/to/my_config.yml
  - Place an openlineage.yml file in the current working directory
  - Place an openlineage.yml file under .openlineage/ in the user's home directory (~/.openlineage/openlineage.yml)
- Python Code - Pass configuration directly to the OpenLineageClient constructor using the config parameter
Configuration is read only at client creation time; any changes to configuration environment variables or the configuration file made after a client has been created will have no effect.
The configuration precedence is as follows:
- Configuration passed to the client constructor
- YAML config file (if found)
- Environment variables with the OPENLINEAGE__ prefix
- Legacy environment variables for HTTP transport
If no configuration is found, ConsoleTransport is used by default, and events are printed to the console.
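The three ways of locating the YAML file can be pictured as a simple lookup. The following is a minimal sketch, not the client's actual implementation: the function name find_config_file is hypothetical, and the order in which the three locations are tried is an assumption taken from the order in which they are listed above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_config_file(environ: Mapping[str, str], cwd: Path, home: Path) -> Optional[Path]:
    """Return the first openlineage.yml candidate that exists, or None.

    Hypothetical helper; the lookup order is assumed, not confirmed.
    """
    candidates = []
    # 1. Explicit path from the OPENLINEAGE_CONFIG meta variable.
    if environ.get("OPENLINEAGE_CONFIG"):
        candidates.append(Path(environ["OPENLINEAGE_CONFIG"]))
    # 2. openlineage.yml in the current working directory.
    candidates.append(cwd / "openlineage.yml")
    # 3. ~/.openlineage/openlineage.yml in the user's home directory.
    candidates.append(home / ".openlineage" / "openlineage.yml")
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


# Prints the path that would be picked up in this environment, or None.
print(find_config_file(os.environ, Path.cwd(), Path.home()))
```

When the function returns None and no other configuration source is present, the behaviour described above applies: events go to the console.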
Environment Variables
All variables (apart from Meta Variables) that affect the client configuration start with the prefix OPENLINEAGE__,
followed by nested keys separated by double underscores (__).
- Prefix Requirement: All environment variables must begin with OPENLINEAGE__.
- Sections Separation: Configuration sections are separated using double underscores (__) to form the hierarchy.
- Lowercase Conversion: Environment variable names are automatically converted to lowercase when they are mapped to configuration keys.
- JSON String Support: You can pass a JSON string at any level of the configuration hierarchy, which will be merged into the final configuration structure.
- Hyphen Restriction: Since environment variable names cannot contain a hyphen (-), if a name strictly requires a hyphen, use a JSON string as the value of the environment variable.
- Precedence Rules:
  - Top-level keys have precedence and will not be overwritten by more nested entries.
  - For example, OPENLINEAGE__TRANSPORT='{..}' will not have its keys overwritten by OPENLINEAGE__TRANSPORT__AUTH__KEY='key'.
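To make these rules concrete, here is a minimal sketch of how prefixed variables could be folded into a nested dictionary. It is an illustration under stated assumptions, not the client's parser: the names env_to_config and parse_value are hypothetical, only JSON objects and arrays are decoded (scalar values stay strings), and the key path taken from the variable name is lowercased while values are left untouched, which matches the examples in this section.

```python
import json
from typing import Any, Dict, Mapping

PREFIX = "OPENLINEAGE__"


def parse_value(raw: str) -> Any:
    """Decode JSON objects/arrays; return anything else unchanged (a simplification)."""
    stripped = raw.strip()
    if stripped.startswith(("{", "[")):
        try:
            return json.loads(stripped)
        except json.JSONDecodeError:
            pass
    return raw


def env_to_config(environ: Mapping[str, str]) -> Dict[str, Any]:
    """Fold OPENLINEAGE__* variables into a nested dict (hypothetical helper)."""
    config: Dict[str, Any] = {}
    # Shallow names first, so top-level values are in place before nested ones.
    names = sorted(
        (name for name in environ if name.startswith(PREFIX)),
        key=lambda name: (len(name[len(PREFIX):].split("__")), name),
    )
    for name in names:
        keys = name[len(PREFIX):].lower().split("__")
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {})
            if not isinstance(node, dict):
                break  # a shallower scalar already occupies this path
        else:
            # setdefault never overwrites a key that is already present,
            # so values set by a top-level JSON string win.
            node.setdefault(keys[-1], parse_value(environ[name]))
    return config


precedence = {
    "OPENLINEAGE__TRANSPORT": '{"type":"console"}',
    "OPENLINEAGE__TRANSPORT__TYPE": "http",
}
print(env_to_config(precedence))  # {'transport': {'type': 'console'}}
```

Processing shallow names before deep ones is one simple way to get the "top-level keys win" behaviour; the real client may reach the same result differently.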
Examples
- Basic Example
- Composite Example
- Precedence Example
- Kafka Transport Example
- File Transport with Remote Storage
Basic Example
Setting the following environment variables:
OPENLINEAGE__TRANSPORT__TYPE=http
OPENLINEAGE__TRANSPORT__URL=http://localhost:5050
OPENLINEAGE__TRANSPORT__ENDPOINT=/api/v1/lineage
OPENLINEAGE__TRANSPORT__AUTH='{"type":"api_key", "apiKey":"random_token"}'
OPENLINEAGE__TRANSPORT__COMPRESSION=gzip
is equivalent to passing the following YAML configuration:
transport:
  type: http
  url: http://localhost:5050
  endpoint: api/v1/lineage
  auth:
    type: api_key
    apiKey: random_token
  compression: gzip
Composite Example
Setting the following environment variables:
OPENLINEAGE__TRANSPORT__TYPE=composite
OPENLINEAGE__TRANSPORT__TRANSPORTS__FIRST__TYPE=http
OPENLINEAGE__TRANSPORT__TRANSPORTS__FIRST__URL=http://localhost:5050
OPENLINEAGE__TRANSPORT__TRANSPORTS__FIRST__ENDPOINT=/api/v1/lineage
OPENLINEAGE__TRANSPORT__TRANSPORTS__FIRST__AUTH='{"type":"api_key", "apiKey":"random_token"}'
OPENLINEAGE__TRANSPORT__TRANSPORTS__FIRST__COMPRESSION=gzip
OPENLINEAGE__TRANSPORT__TRANSPORTS__SECOND__TYPE=console
is equivalent to passing the following YAML configuration:
transport:
  type: composite
  transports:
    first:
      type: http
      url: http://localhost:5050
      endpoint: api/v1/lineage
      auth:
        type: api_key
        apiKey: random_token
      compression: gzip
    second:
      type: console
Precedence Example
Setting the following environment variables:
OPENLINEAGE__TRANSPORT='{"type":"console"}'
OPENLINEAGE__TRANSPORT__TYPE=http
is equivalent to passing the following YAML configuration:
transport:
  type: console
Kafka Transport Example
Setting the following environment variables:
OPENLINEAGE__TRANSPORT__TYPE=kafka
OPENLINEAGE__TRANSPORT__TOPIC=my_topic
OPENLINEAGE__TRANSPORT__CONFIG='{"bootstrap.servers": "localhost:9092,another.host:9092", "acks": "all", "retries": 3}'
OPENLINEAGE__TRANSPORT__FLUSH=true
OPENLINEAGE__TRANSPORT__MESSAGE_KEY=some-value
is equivalent to passing the following YAML configuration:
transport:
  type: kafka
  topic: my_topic
  config:
    bootstrap.servers: localhost:9092,another.host:9092
    acks: all
    retries: 3
  flush: true
  message_key: some-value # this has been aliased to messageKey
File Transport with Remote Storage
Setting the following environment variables:
OPENLINEAGE__TRANSPORT__TYPE=file
OPENLINEAGE__TRANSPORT__LOG_FILE_PATH=s3://my-bucket/lineage/events.jsonl
OPENLINEAGE__TRANSPORT__APPEND=true
OPENLINEAGE__TRANSPORT__STORAGE_OPTIONS='{"key": "AKIAIOSFODNN7EXAMPLE", "secret": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY", "endpoint_url": "https://s3.amazonaws.com"}'
is equivalent to passing the following YAML configuration:
transport:
  type: file
  log_file_path: s3://my-bucket/lineage/events.jsonl
  append: true
  storage_options:
    key: AKIAIOSFODNN7EXAMPLE
    secret: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    endpoint_url: https://s3.amazonaws.com
Meta variables
There are a few variables that do not follow the above pattern (mostly for legacy reasons):
| Name | Description | Example | Since |
|---|---|---|---|
| OPENLINEAGE_CONFIG | The path to the YAML configuration file | path/to/openlineage.yml | |
| OPENLINEAGE_CLIENT_LOGGING | Logging level of OpenLineage client and its child modules | DEBUG | |
| OPENLINEAGE_DISABLED | When true, OpenLineage will not emit events (default: false) | false | 0.9.0 |
Legacy syntax
HTTP Transport
For backwards compatibility, the simplest HTTP transport configuration, covering only a subset of its options, can be done with environment variables (all other transport types are only configurable with full config). This setup uses the following environment variables:
- OPENLINEAGE_URL (required, the URL to send lineage events to, example: https://myapp.com)
- OPENLINEAGE_ENDPOINT (optional, endpoint to which events are sent, default: api/v1/lineage, example: api/v2/events)
- OPENLINEAGE_API_KEY (optional, token included in the Authorization HTTP header as a Bearer token, example: secret_token_123)
To ease the switch to the modern environment variables, aliases are dynamically created for certain legacy variables such as OPENLINEAGE_URL.
If OPENLINEAGE_URL is set, it is automatically translated into transport configuration entries
that can be used with the Composite transport, with default_http as the name of the HTTP transport.
The alias rules are as follows:
- If the environment variable OPENLINEAGE_URL="http://example.com" is set, the following environment variables are inserted:
  OPENLINEAGE__TRANSPORT__TRANSPORTS__DEFAULT_HTTP__TYPE="http"
  OPENLINEAGE__TRANSPORT__TRANSPORTS__DEFAULT_HTTP__URL="http://example.com"
- Similarly, if the environment variable OPENLINEAGE_API_KEY="random_key" is set, it is translated to:
  OPENLINEAGE__TRANSPORT__TRANSPORTS__DEFAULT_HTTP__AUTH='{"type": "api_key", "apiKey": "random_key"}'
- Equally, the environment variable OPENLINEAGE_ENDPOINT="api/v1/lineage" is translated to:
  OPENLINEAGE__TRANSPORT__TRANSPORTS__DEFAULT_HTTP__ENDPOINT="api/v1/lineage"
- To opt out of the aliased HTTP transport in the Composite transport, set OPENLINEAGE__TRANSPORT__TRANSPORTS__DEFAULT_HTTP to {}.
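The alias rules can be sketched as a small translation function. This is an illustration only: legacy_http_aliases is a hypothetical name, and the choice to produce no aliases at all when OPENLINEAGE_URL is missing is an assumption based on that variable being documented as required.

```python
import json
from typing import Dict, Mapping

ALIAS_PREFIX = "OPENLINEAGE__TRANSPORT__TRANSPORTS__DEFAULT_HTTP__"


def legacy_http_aliases(environ: Mapping[str, str]) -> Dict[str, str]:
    """Translate legacy HTTP variables into their modern aliases (hypothetical helper)."""
    url = environ.get("OPENLINEAGE_URL")
    if not url:
        # Assumption: without the required URL no alias is created.
        return {}
    aliases = {
        ALIAS_PREFIX + "TYPE": "http",
        ALIAS_PREFIX + "URL": url,
    }
    endpoint = environ.get("OPENLINEAGE_ENDPOINT")
    if endpoint:
        aliases[ALIAS_PREFIX + "ENDPOINT"] = endpoint
    api_key = environ.get("OPENLINEAGE_API_KEY")
    if api_key:
        # The auth section is passed as a JSON string, as in the rule above.
        aliases[ALIAS_PREFIX + "AUTH"] = json.dumps({"type": "api_key", "apiKey": api_key})
    return aliases


legacy = {"OPENLINEAGE_URL": "http://example.com", "OPENLINEAGE_API_KEY": "random_key"}
for name, value in legacy_http_aliases(legacy).items():
    print(f"{name}={value}")
```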
Transports
HTTP Transport
The HTTP transport provides synchronous, blocking event emission. This is the default transport implementation suitable for most use cases where immediate event delivery and error handling are preferred.
Configuration
- type - string, must be "http". Required.
- url - string, base URL for HTTP requests. Required.
- endpoint - string specifying the endpoint to which events are sent, appended to url. Optional, default: api/v1/lineage.
- timeout - float specifying the timeout (in seconds) used while connecting to the server. Optional, default: 5.
- verify - boolean specifying whether the client should verify TLS certificates from the backend. Optional, default: true.
- auth - dictionary specifying authentication options. Optional, by default no authorization is used. If set, requires the type property.
  - type - string specifying one of the out-of-the-box authentication methods (api_key or jwt), or the fully qualified class name of your TokenProvider. Required if auth is provided.
  - Configuration options for api_key authentication:
    - apiKey - string sent in the Authorization HTTP header as a Bearer token. Required if type is api_key.
  - Configuration options for jwt authentication are documented in the JWT Token Provider section.
- compression - string, name of the algorithm used by the HTTP client to compress the request body. Optional, default value null, allowed values: gzip. Added in v1.13.0.
- custom_headers - dictionary of additional headers to be sent with each request. Optional, default: {}.
- retry - dictionary of additional configuration options for HTTP retries. Added in v1.33.0. The options below are not exhaustive; they are the ones that are set by default.
  - total - total number of retries to be attempted. Default is 5.
  - read - number of retries to be attempted on read errors. Default is 5.
  - connect - number of retries to be attempted on connection errors. Default is 5.
  - backoff_factor - a backoff factor to apply between attempts after the second try. Default is 0.3.
  - status_forcelist - a set of integer HTTP status codes that force a retry. Default is [500, 502, 503, 504].
  - allowed_methods - a set of HTTP methods that are retried. Default is ["HEAD", "POST"].
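To illustrate how the default retry options interact, here is a minimal sketch of a retry decision. RetryPolicy is a hypothetical class, not part of the client. The delay formula (backoff_factor * 2 ** (n - 1) for the n-th retry) is an assumption in the style of common exponential backoff; the client's actual schedule may differ.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical model of the documented retry defaults."""

    total: int = 5
    backoff_factor: float = 0.3
    status_forcelist: FrozenSet[int] = frozenset({500, 502, 503, 504})
    allowed_methods: FrozenSet[str] = frozenset({"HEAD", "POST"})

    def should_retry(self, method: str, status: int, retries_so_far: int) -> bool:
        """Retry only allowed methods, on listed status codes, within the budget."""
        return (
            retries_so_far < self.total
            and method.upper() in self.allowed_methods
            and status in self.status_forcelist
        )

    def delay(self, retry_number: int) -> float:
        """Seconds to wait before the n-th retry (1-based); assumed exponential."""
        return self.backoff_factor * 2 ** (retry_number - 1)


policy = RetryPolicy()
print(policy.should_retry("POST", 503, retries_so_far=0))  # True
print(policy.should_retry("POST", 404, retries_so_far=0))  # False
```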
Behavior
Events are serialized to JSON and sent as an HTTP POST request with Content-Type: application/json. Events are sent immediately and the call blocks until completion. The transport uses httpx with built-in retry support and raises exceptions on failure.
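As a rough picture of what such a request looks like, the sketch below builds (but does not send) a POST request using only the Python standard library. It is not the transport's code: build_request is a hypothetical helper, joining url and endpoint with urljoin is an assumption, and so is the Content-Encoding: gzip header for compressed bodies.

```python
import gzip
import json
from typing import Any, Mapping, Optional
from urllib.parse import urljoin
from urllib.request import Request


def build_request(
    url: str,
    endpoint: str,
    event: Mapping[str, Any],
    api_key: Optional[str] = None,
    compression: Optional[str] = None,
) -> Request:
    """Build a lineage POST request without sending it (hypothetical helper)."""
    body = json.dumps(event).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # api_key authentication: the key travels as a Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    if compression == "gzip":
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"  # assumed header for compressed bodies
    return Request(urljoin(url, endpoint), data=body, headers=headers, method="POST")


request = build_request(
    "https://backend:5000", "api/v1/lineage", {"eventType": "START"}, api_key="secret"
)
print(request.get_method(), request.full_url)  # POST https://backend:5000/api/v1/lineage
```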
Examples
- Environment Variables
- Single Environment Variable
- Yaml Config
- Python Code
Environment Variables
OPENLINEAGE__TRANSPORT__TYPE=http
OPENLINEAGE__TRANSPORT__URL=https://backend:5000
OPENLINEAGE__TRANSPORT__ENDPOINT=api/v1/lineage
OPENLINEAGE__TRANSPORT__TIMEOUT=5
OPENLINEAGE__TRANSPORT__AUTH__TYPE=api_key
OPENLINEAGE__TRANSPORT__AUTH__APIKEY=f048521b-dfe8-47cd-9c65-0cb07d57591e
OPENLINEAGE__TRANSPORT__COMPRESSION=gzip
OPENLINEAGE__TRANSPORT__RETRY='{"total": 5, "read": 5, "connect": 5, "backoff_factor": 0.3, "status_forcelist": [500, 502, 503, 504], "allowed_methods": ["HEAD", "POST"]}'
Single Environment Variable
OPENLINEAGE__TRANSPORT='{"type": "http", "url": "https://backend:5000", "endpoint": "api/v1/lineage", "timeout": 5, "auth": {"type": "api_key", "apiKey": "f048521b-dfe8-47cd-9c65-0cb07d57591e"}, "compression": "gzip", "retry": {"total": 5, "read": 5, "connect": 5, "backoff_factor": 0.3, "status_forcelist": [500, 502, 503, 504], "allowed_methods": ["HEAD", "POST"]}}'
YAML Config
transport:
  type: http
  url: https://backend:5000
  endpoint: api/v1/lineage
  timeout: 5
  verify: false
  auth:
    type: api_key
    apiKey: f048521b-dfe8-47cd-9c65-0cb07d57591e
  compression: gzip
  retry:
    total: 5
    read: 5
    connect: 5
    backoff_factor: 0.3
    status_forcelist: [500, 502, 503, 504]
    allowed_methods: ["HEAD", "POST"]
Python Code
from openlineage.client import OpenLineageClient
from openlineage.client.transport.http import ApiKeyTokenProvider, HttpConfig, HttpCompression, HttpTransport

http_config = HttpConfig(
    url="https://backend:5000",
    endpoint="api/v1/lineage",
    timeout=5,
    verify=False,
    auth=ApiKeyTokenProvider({"apiKey": "f048521b-dfe8-47cd-9c65-0cb07d57591e"}),
    compression=HttpCompression.GZIP,
)

client = OpenLineageClient(transport=HttpTransport(http_config))
JWT Token Provider
The JwtTokenProvider is an authentication provider that exchanges an API key for a JWT token via a POST endpoint. This is useful for services that require OAuth-style authentication where you need to obtain a token before making API requests.
Configuration
When using JWT authentication with HTTP transport, configure the auth section as follows:
- type - string, must be "jwt". Required.
- apiKey - string, the API key used to obtain the JWT token. Required.
- tokenEndpoint - string, the URL endpoint for token generation. Required.
- tokenFields - list of strings, JSON field names to search for the token in the response. The provider tries each field in order. Optional, default: ["token", "access_token"].
- expiresInField - string, JSON field name containing the token expiration time in seconds. Optional, default: "expires_in".
- grantType - string, OAuth grant type parameter sent in the token request. Optional, default: "urn:ietf:params:oauth:grant-type:jwt-bearer".
- responseType - string, OAuth response type parameter sent in the token request. Optional, default: "token".
- tokenRefreshBuffer - integer, number of seconds before token expiry to trigger a refresh. Optional, default: 120.
Behavior
- The provider sends a POST request with URL-encoded form data containing the API key and OAuth parameters.
- The response is expected to be JSON containing the JWT token and optionally an expiration time.
- Tokens are cached and automatically refreshed before expiration (default: 120 seconds before expiry, configurable via tokenRefreshBuffer).
- If no expiration is provided in the response, the provider attempts to extract it from the JWT payload's exp claim.
- The provider supports multiple JSON field names for the token, trying each in order until a match is found.
- Field matching is case-insensitive and handles both snake_case and camelCase variations (e.g., expires_in matches expiresIn).
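The response handling described above can be sketched with the standard library alone. This is an illustration, not the JwtTokenProvider source: every function name below is hypothetical, field matching is approximated by lowercasing and dropping underscores, and the JWT payload is decoded without any signature verification, purely to read the exp claim.

```python
import base64
import json
from typing import Any, Mapping, Optional, Sequence


def _normalize(name: str) -> str:
    """Lowercase and drop underscores so expires_in and expiresIn compare equal."""
    return name.replace("_", "").lower()


def find_field(payload: Mapping[str, Any], name: str) -> Optional[Any]:
    """Case-insensitive lookup that tolerates snake_case and camelCase."""
    wanted = _normalize(name)
    for key, value in payload.items():
        if _normalize(key) == wanted:
            return value
    return None


def extract_token(payload: Mapping[str, Any],
                  token_fields: Sequence[str] = ("token", "access_token")) -> str:
    """Try each configured field name in order until one holds a token."""
    for name in token_fields:
        value = find_field(payload, name)
        if value:
            return str(value)
    raise ValueError("no token field found in response")


def jwt_exp_claim(token: str) -> Optional[float]:
    """Read the exp claim from the (unverified) JWT payload, if present."""
    try:
        segment = token.split(".")[1]
        padded = segment + "=" * (-len(segment) % 4)  # restore base64 padding
        claims = json.loads(base64.urlsafe_b64decode(padded))
    except (IndexError, ValueError):
        return None
    exp = claims.get("exp") if isinstance(claims, dict) else None
    return float(exp) if isinstance(exp, (int, float)) else None


def expiry_time(payload: Mapping[str, Any], token: str, now: float,
                expires_in_field: str = "expires_in") -> Optional[float]:
    """Prefer the response's expires-in value; fall back to the JWT exp claim."""
    expires_in = find_field(payload, expires_in_field)
    if expires_in is not None:
        return now + float(expires_in)
    return jwt_exp_claim(token)


def needs_refresh(expires_at: float, now: float, refresh_buffer: int = 120) -> bool:
    """True once the token is within refresh_buffer seconds of expiring."""
    return now >= expires_at - refresh_buffer
```

With the default buffer, a token that expires at t = 2000 would be refreshed from t = 1880 onwards.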
Examples
- Environment Variables
- Yaml Config
- Python Code
Environment Variables
Standard OAuth configuration:
OPENLINEAGE__TRANSPORT__TYPE=http
OPENLINEAGE__TRANSPORT__URL=https://backend:5000
OPENLINEAGE__TRANSPORT__AUTH__TYPE=jwt
OPENLINEAGE__TRANSPORT__AUTH__API_KEY=your-api-key
OPENLINEAGE__TRANSPORT__AUTH__TOKEN_ENDPOINT=https://auth.example.com/token
IBM Cloud IAM configuration:
OPENLINEAGE__TRANSPORT__TYPE=http
OPENLINEAGE__TRANSPORT__URL=https://backend:5000
OPENLINEAGE__TRANSPORT__AUTH__TYPE=jwt
OPENLINEAGE__TRANSPORT__AUTH__API_KEY=your-ibm-api-key
OPENLINEAGE__TRANSPORT__AUTH__TOKEN_ENDPOINT=https://iam.cloud.ibm.com/identity/token
OPENLINEAGE__TRANSPORT__AUTH__GRANT_TYPE=urn:ibm:params:oauth:grant-type:apikey
OPENLINEAGE__TRANSPORT__AUTH__RESPONSE_TYPE=cloud_iam
YAML Config
Standard OAuth configuration:
transport:
  type: http
  url: https://backend:5000
  auth:
    type: jwt
    apiKey: your-api-key
    tokenEndpoint: https://auth.example.com/token
With custom field names:
transport:
  type: http
  url: https://backend:5000
  auth:
    type: jwt
    apiKey: your-api-key
    tokenEndpoint: https://auth.example.com/token
    tokenFields: ["access_token", "token"]
    expiresInField: expires_in
IBM Cloud IAM configuration:
transport:
  type: http
  url: https://backend:5000
  auth:
    type: jwt
    apiKey: your-ibm-api-key
    tokenEndpoint: https://iam.cloud.ibm.com/identity/token
    grantType: urn:ibm:params:oauth:grant-type:apikey
    responseType: cloud_iam
Python Code
Standard OAuth configuration:
from openlineage.client import OpenLineageClient
from openlineage.client.transport.http import HttpConfig, HttpTransport, JwtTokenProvider

http_config = HttpConfig(
    url="https://backend:5000",
    auth=JwtTokenProvider({
        "apiKey": "your-api-key",
        "tokenEndpoint": "https://auth.example.com/token"
    })
)

client = OpenLineageClient(transport=HttpTransport(http_config))
IBM Cloud IAM configuration:
from openlineage.client import OpenLineageClient
from openlineage.client.transport.http import HttpConfig, HttpTransport, JwtTokenProvider

http_config = HttpConfig(
    url="https://backend:5000",
    auth=JwtTokenProvider({
        "apiKey": "your-ibm-api-key",
        "tokenEndpoint": "https://iam.cloud.ibm.com/identity/token",
        "grantType": "urn:ibm:params:oauth:grant-type:apikey",
        "responseType": "cloud_iam"
    })
)

client = OpenLineageClient(transport=HttpTransport(http_config))
Async HTTP Transport
The Async HTTP transport provides high-performance, non-blocking event emission with advanced queuing and ordering guarantees. Use this transport when you need high throughput or want to avoid blocking your application on lineage event delivery.
The Async transport API is experimental and may change over the next few releases.
Configuration
- type - string, must be "async_http" or use direct instantiation. Required.
- url - string, base URL for HTTP requests. Required.
- endpoint - string specifying the endpoint to which events are sent, appended to url. Optional, default: api/v1/lineage.
- timeout - float specifying the timeout (in seconds) used while connecting to the server. Optional, default: 5.
- verify - boolean specifying whether the client should verify TLS certificates from the backend. Optional, default: true.
- auth - dictionary specifying authentication options. Optional, by default no authorization is used. If set, requires the type property.
  - type - string specifying one of the out-of-the-box authentication methods (api_key or jwt), or the fully qualified class name of your TokenProvider. Required if auth is provided.
  - Configuration options for api_key authentication:
    - apiKey - string sent in the Authorization HTTP header as a Bearer token. Required if type is api_key.
  - Configuration options for jwt authentication are documented in the JWT Token Provider section.
- compression - string, name of the algorithm used by the HTTP client to compress the request body. Optional, default value null, allowed values: gzip.
- custom_headers - dictionary of additional headers to be sent with each request. Optional, default: {}.
- max_queue_size - integer specifying the maximum number of events in the processing queue. Optional, default: 10000.
- max_concurrent_requests - integer specifying the maximum number of parallel HTTP requests. Optional, default: 100.
- retry - dictionary of additional configuration options for HTTP retries. Added in v1.33.0. The options below are not exhaustive; they are the ones that are set by default.
  - total - total number of retries to be attempted. Default is 5.
  - read - number of retries to be attempted on read errors. Default is 5.
  - connect - number of retries to be attempted on connection errors. Default is 5.
  - backoff_factor - a backoff factor to apply between attempts after the second try. Default is 0.3.
  - status_forcelist - a set of integer HTTP status codes that force a retry. Default is [500, 502, 503, 504].
  - allowed_methods - a set of HTTP methods that are retried. Default is ["HEAD", "POST"].
Behavior
Events are processed asynchronously with the following features:
- Event Ordering Guarantees: START events are sent before their corresponding COMPLETE, FAIL, or ABORT events
- High Throughput: Non-blocking event emission with configurable concurrent processing
- Queue Management: Bounded queue prevents memory exhaustion with configurable size
- Advanced Error Handling: Retry logic with exponential backoff for network and server errors
- Event Tracking: Real-time statistics on pending, successful, and failed events
Event Flow
- Events are queued for processing (START events immediately; other events wait until the corresponding START event has been sent)
- A worker thread processes events using configurable parallelism
- Successful START events trigger the release of pending completion events
- Event statistics are tracked and available via get_stats()
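The ordering guarantee in this flow can be modelled with a small, single-threaded buffer. This is a simplified sketch, not the transport's implementation: OrderedEventBuffer and its methods are hypothetical, events are reduced to (run_id, event_type) pairs, and no networking, worker thread, or queue bound is involved.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Set, Tuple

Event = Tuple[str, str]  # (run_id, event_type)


class OrderedEventBuffer:
    """Hypothetical model: hold non-START events until their START was sent."""

    def __init__(self) -> None:
        self.ready: Deque[Event] = deque()  # events that may be sent now
        self.pending: Dict[str, List[Event]] = defaultdict(list)  # held per run
        self.started: Set[str] = set()  # runs whose START was sent successfully

    def submit(self, event: Event) -> None:
        run_id, event_type = event
        if event_type == "START" or run_id in self.started:
            self.ready.append(event)
        else:
            # COMPLETE, FAIL, ABORT, ... wait for the run's START to go out.
            self.pending[run_id].append(event)

    def mark_sent(self, event: Event) -> None:
        """Call after a successful send; a sent START releases held events."""
        run_id, event_type = event
        if event_type == "START":
            self.started.add(run_id)
            self.ready.extend(self.pending.pop(run_id, []))


buffer = OrderedEventBuffer()
buffer.submit(("run-1", "START"))
buffer.submit(("run-1", "COMPLETE"))  # held: START has not been sent yet
start = buffer.ready.popleft()
buffer.mark_sent(start)               # releases the COMPLETE event
print(list(buffer.ready))             # [('run-1', 'COMPLETE')]
```

In this model, an event whose START is never submitted stays pending indefinitely; how the real transport treats that case is not covered here.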