Spark Config Parameters
The following parameters can be specified:
Parameter | Definition | Example |
---|---|---|
spark.openlineage.transport.type | The transport type used for emitting events; the default type is console | http
spark.openlineage.namespace | The default namespace to be applied to any submitted jobs | MyNamespace
spark.openlineage.parentJobNamespace | The job namespace to be used for the parent job facet | ParentJobNamespace |
spark.openlineage.parentJobName | The job name to be used for the parent job facet | ParentJobName |
spark.openlineage.parentRunId | The RunId of the parent job that initiated this Spark job | xxxx-xxxx-xxxx-xxxx |
spark.openlineage.rootParentJobNamespace | The namespace of the root parent job | ParentJobNamespace |
spark.openlineage.rootParentJobName | The name of the root parent job | ParentJobName |
spark.openlineage.rootParentRunId | The RunId of the root parent job | xxxx-xxxx-xxxx-xxxx |
spark.openlineage.appName | Custom value overwriting Spark app name in events | AppName |
spark.openlineage.facets.disabled | Deprecated: Use the property spark.openlineage.facets.<facet name>.disabled instead. List of facets to filter out of the events, enclosed in [] (required from 0.21.x) and separated by ; . The default is [] | [columnLineage;]
spark.openlineage.facets.<facet name>.disabled | If set to true, disables the given facet. The default value is false. The facet name can be hierarchical. The facets disabled by default are debug, spark.logicalPlan and spark_unknown; you have to switch the flag to false to enable them. | true
spark.openlineage.facets.variables | List of environment variables (read via System.getenv()) to be captured in the emitted events, enclosed in [] and separated by ; | [MY_ENV_VAR;]
spark.openlineage.capturedProperties | Comma-separated list of properties to be captured in the Spark properties facet (default: spark.master, spark.app.name) | "spark.example1,spark.example2"
spark.openlineage.dataset.removePath.pattern | Java regular expression that removes the ?<remove> named group from the dataset path. Can be used to remove the last path subdirectories from paths like s3://my-whatever-path/year=2023/month=04 | (.*)(?<remove>\/.*\/.*)
spark.openlineage.jobName.appendDatasetName | Decides whether the output dataset name should be appended to the job name. Defaults to true. | false
spark.openlineage.jobName.replaceDotWithUnderscore | Replaces dots in the job name with underscores. Can be used to mimic legacy behaviour on the Databricks platform. Defaults to false. | false
spark.openlineage.job.owners.<ownership-type> | Specifies ownership of the job. Multiple entries with different types are allowed. Config key name and value are used to create job ownership type and name (available since 1.13). | spark.openlineage.job.owners.team="Some Team" |
spark.openlineage.job.tags | List of job-level tags. Tags are passed as a single string, with key and value separated by a colon : and tags separated by a semicolon ; | "key:value;label;another:tag"
spark.openlineage.run.tags | List of run-level tags. Tags are passed as a single string, with key and value separated by a colon : and tags separated by a semicolon ; | "key:value;label;another:tag"
spark.openlineage.columnLineage.datasetLineageEnabled | Includes dataset dependencies in their own dataset property within the column lineage facet. If this flag is set to false, the dataset dependencies are merged into the fields property. The default value is false. Setting it to true is recommended. | true
spark.openlineage.vendors.iceberg.metricsReporterDisabled | Disables the Iceberg metrics reporter, which turns off the mechanism for collecting scan and commit reports. | false
spark.openlineage.filter.allowedSparkNodes | List of Spark plan node names, separated with ; and enclosed in []. Some Spark nodes are filtered out by default so that they do not trigger OpenLineage events. This setting overrides the default behaviour and removes the filtering for the specified nodes. Example usage: [org.apache.spark.sql.catalyst.plans.logical.Aggregate] enables events for Aggregate nodes | empty list
spark.openlineage.filter.deniedSparkNodes | List of Spark plan node names, separated with ; and enclosed in []. Some Spark nodes are filtered out by default so that they do not trigger OpenLineage events. This setting overrides the default behaviour and adds more nodes to filter. | empty list
spark.openlineage.timeout.buildDatasetsTimePercentage | If a timeout is set within a circuit breaker, this configures the percentage of that timeout that can be spent on building datasets. | 90
spark.openlineage.timeout.facetsBuildingTimePercentage | If a timeout is set within a circuit breaker, this configures the percentage of that timeout that can be spent on building facets, which includes job facets, run facets, and dataset facets. This limit effectively applies to everything besides event serialization and transport. | 90
spark.openlineage.disabled | Turns off the OpenLineage integration, similarly to the OPENLINEAGE_DISABLED environment variable. Can be used when setting the environment variable is not possible. This setting works only within the Spark Conf, so that OpenLineage can be disabled before its config parsing mechanism runs. | false
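
The snippet below is a minimal sketch of how several of the parameters above might be set programmatically when building a Spark session. It assumes the OpenLineage Spark listener jar is already on the classpath; the listener class and the spark.openlineage.transport.url key are standard OpenLineage settings not listed in this table, and all concrete values (namespace, URL, owner, tags) are placeholders.

```python
from pyspark.sql import SparkSession

# Sketch only: assumes the openlineage-spark jar is on the classpath;
# the namespace, URL, owner, and tag values below are placeholders.
spark = (
    SparkSession.builder
    .appName("openlineage_config_example")
    # Register the OpenLineage listener so events are emitted at all.
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    # Emit events over HTTP instead of the default console transport.
    .config("spark.openlineage.transport.type", "http")
    .config("spark.openlineage.transport.url", "http://localhost:5000")
    # Default namespace applied to all jobs submitted by this application.
    .config("spark.openlineage.namespace", "MyNamespace")
    # Enable the logical plan facet, which is disabled by default.
    .config("spark.openlineage.facets.spark.logicalPlan.disabled", "false")
    # Ownership and tagging.
    .config("spark.openlineage.job.owners.team", "Some Team")
    .config("spark.openlineage.job.tags", "environment:prod;critical")
    .getOrCreate()
)
```

The same keys can equally be supplied on the command line via spark-submit --conf, for example --conf spark.openlineage.transport.type=http.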