Version: 1.25.0

Spark Config Parameters

The following parameters can be specified:

| Parameter | Definition | Example |
| --- | --- | --- |
| `spark.openlineage.transport.type` | The transport type used to emit events. The default type is `console`. | `http` |
| `spark.openlineage.namespace` | The default namespace applied to any submitted jobs. | `MyNamespace` |
| `spark.openlineage.parentJobNamespace` | The job namespace to be used for the parent job facet. | `ParentJobNamespace` |
| `spark.openlineage.parentJobName` | The job name to be used for the parent job facet. | `ParentJobName` |
| `spark.openlineage.parentRunId` | The RunId of the parent job that initiated this Spark job. | `xxxx-xxxx-xxxx-xxxx` |
| `spark.openlineage.appName` | Custom value that overwrites the Spark app name in events. | `AppName` |
| `spark.openlineage.facets.disabled` | **Deprecated**: use the property `spark.openlineage.facets.<facet name>.disabled` instead. List of facets to filter out from the events, enclosed in `[]` (required from 0.21.x) and separated by `;`. The default is `[]`. | `[columnLineage;]` |
| `spark.openlineage.facets.<facet name>.disabled` | If set to `true`, disables the specific facet. The default value is `false`. The name of the facet can be hierarchical. The facets `debug`, `spark.logicalPlan` and `spark_unknown` are disabled by default; set the flag to `false` to enable them. | `true` |
| `spark.openlineage.facets.variables` | List of environment variables (System.getenv() | `[columnLineage;]` |
| `spark.openlineage.capturedProperties` | Comma-separated list of properties to be captured in the Spark properties facet (default: `spark.master`, `spark.app.name`). | `"spark.example1,spark.example2"` |
| `spark.openlineage.dataset.removePath.pattern` | Java regular expression that removes the `?<remove>` named group from the dataset path. Can be used to remove the last path subdirectories from paths like `s3://my-whatever-path/year=2023/month=04`. | `(.*)(?<remove>\/.*\/.*)` |
| `spark.openlineage.jobName.appendDatasetName` | Decides whether the output dataset name should be appended to the job name. The default is `true`. | `false` |
| `spark.openlineage.jobName.replaceDotWithUnderscore` | Replaces dots in the job name with underscores. Can be used to mimic legacy behaviour on the Databricks platform. The default is `false`. | `false` |
| `spark.openlineage.debugFacet` | Determines whether the debug facet is generated and included in the event. Set to `enabled` to turn it on. By default, the facet is disabled. | `enabled` |
| `spark.openlineage.job.owners.<ownership-type>` | Specifies ownership of the job. Multiple entries with different types are allowed. The config key name and value are used to create the job ownership type and name (available since 1.13). | `spark.openlineage.job.owners.team="Some Team"` |
| `spark.openlineage.columnLineage.datasetLineageEnabled` | Includes the dataset dependencies in their own `dataset` property in the column lineage pattern. If this flag is set to `false`, the dataset dependencies are merged into the `fields` property. The default value is `false`. Setting it to `true` is recommended. | `true` |
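
As an illustration of how the parameters listed above might be passed to a job, the following minimal sketch turns a mapping of properties into `spark-submit --conf` arguments. The helper `to_spark_submit_args` and all values are hypothetical and chosen for illustration only; the key `spark.openlineage.facets.debug.disabled` simply follows the `spark.openlineage.facets.<facet name>.disabled` pattern. The sketch does not cover how the integration itself is enabled.

```python
import shlex
from typing import Dict, List


def to_spark_submit_args(conf: Dict[str, str]) -> List[str]:
    """Hypothetical helper: turn Spark properties into spark-submit --conf arguments."""
    args: List[str] = []
    for key, value in conf.items():
        # Each property becomes a separate "--conf key=value" pair.
        args.extend(["--conf", f"{key}={value}"])
    return args


# Parameter names are taken from the list above; the values are illustrative.
openlineage_conf = {
    "spark.openlineage.transport.type": "console",
    "spark.openlineage.namespace": "MyNamespace",
    "spark.openlineage.parentJobNamespace": "ParentJobNamespace",
    "spark.openlineage.parentJobName": "ParentJobName",
    "spark.openlineage.appName": "AppName",
    # Assumed key following the facets.<facet name>.disabled pattern.
    "spark.openlineage.facets.debug.disabled": "false",
    "spark.openlineage.capturedProperties": "spark.example1,spark.example2",
    "spark.openlineage.job.owners.team": "Some Team",
    "spark.openlineage.jobName.appendDatasetName": "false",
}

submit_args = to_spark_submit_args(openlineage_conf)

# shlex.join quotes values that contain spaces, such as "Some Team".
print(shlex.join(submit_args))
```

Keeping the properties in a plain mapping makes it easy to reuse the same settings elsewhere, for example when building a Spark session programmatically.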
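
To get a feel for what `spark.openlineage.dataset.removePath.pattern` does, the sketch below approximates the effect of the example pattern in Python. It is not the integration's actual implementation: it assumes the pattern is searched for in the path and that the text captured by the `remove` group is cut out. Because Python's `re` module spells named groups `(?P<name>...)` where Java uses `(?<name>...)`, the hypothetical helper `to_python_pattern` translates the syntax first.

```python
import re

# Example pattern from the parameter list, written in Java regex syntax.
JAVA_PATTERN = r"(.*)(?<remove>\/.*\/.*)"


def to_python_pattern(java_pattern: str) -> str:
    """Rewrite Java named groups (?<name>...) as Python's (?P<name>...).

    The lookahead for a letter leaves lookbehinds such as (?<=...) untouched.
    """
    return re.sub(r"\(\?<(?=[A-Za-z])", "(?P<", java_pattern)


def remove_named_group(path: str, java_pattern: str, group: str = "remove") -> str:
    """Assumed behaviour: drop the text captured by the named group from the path."""
    compiled = re.compile(to_python_pattern(java_pattern))
    match = compiled.search(path)
    if match is None or group not in compiled.groupindex:
        return path  # pattern does not apply, keep the path unchanged
    start, end = match.span(group)
    if start == -1:
        return path  # the group did not participate in the match
    return path[:start] + path[end:]


print(remove_named_group("s3://my-whatever-path/year=2023/month=04", JAVA_PATTERN))
# -> s3://my-whatever-path
```

With the greedy `(.*)` prefix, the `remove` group captures the last two path segments, so partition directories such as `year=2023/month=04` collapse into a single dataset name.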