Spark Config Parameters
The following parameters can be specified:
| Parameter | Definition | Example |
|---|---|---|
| spark.openlineage.transport.type | The transport type used to emit events. The default type is console. | http |
| spark.openlineage.namespace | The default namespace to be applied for any jobs submitted | MyNamespace |
| spark.openlineage.parentJobNamespace | The job namespace to be used for the parent job facet | ParentJobNamespace |
| spark.openlineage.parentJobName | The job name to be used for the parent job facet | ParentJobName |
| spark.openlineage.parentRunId | The RunId of the parent job that initiated this Spark job | xxxx-xxxx-xxxx-xxxx |
| spark.openlineage.appName | Custom value overwriting Spark app name in events | AppName |
| spark.openlineage.facets.disabled | Deprecated: use the property spark.openlineage.facets.<facet name>.disabled instead. List of facets to filter out from the events, enclosed in [] (required from 0.21.x) and separated by ;. The default is []. | [columnLineage;] |
| spark.openlineage.facets.<facet name>.disabled | If set to true, it disables the specific facet. The default value is false. The name of the facet can be hierarchical. The facets disabled by default are debug, spark.logicalPlan and spark_unknown. You have to switch the flag to false to enable them. | true |
| spark.openlineage.facets.variables | List of environment variables (System.getenv() | [columnLineage;] |
| spark.openlineage.capturedProperties | Comma-separated list of properties to be captured in the Spark properties facet (default: spark.master, spark.app.name) | "spark.example1,spark.example2" |
| spark.openlineage.dataset.removePath.pattern | Java regular expression that removes the ?<remove> named group from the dataset path. Can be used to remove the last path subdirectories from paths like s3://my-whatever-path/year=2023/month=04 | (.*)(?<remove>\/.*\/.*) |
| spark.openlineage.jobName.appendDatasetName | Decides whether output dataset name should be appended to job name. By default true. | false |
| spark.openlineage.jobName.replaceDotWithUnderscore | Replaces dots in job name with underscore. Can be used to mimic legacy behaviour on Databricks platform. By default false. | false |
| spark.openlineage.job.owners.<ownership-type> | Specifies ownership of the job. Multiple entries with different types are allowed. Config key name and value are used to create job ownership type and name (available since 1.13). | spark.openlineage.job.owners.team="Some Team" |
| spark.openlineage.job.tags | List of job-level tags. Tags are passed in a string, with key:value information separated by colon :, and tags being separated by semicolon ; | "key:value;label;another:tag" |
| spark.openlineage.run.tags | List of run-level tags. Tags are passed in a string, with key:value information separated by colon :, and tags being separated by semicolon ; | "key:value;label;another:tag" |
| spark.openlineage.debugFacet | Determines whether the debug facet is generated and included in the event. Set to enabled to turn it on. By default, the facet is disabled. | enabled |
| spark.openlineage.columnLineage.datasetLineageEnabled | When set to true, dataset dependencies are included in their own dataset property of the column lineage facet. When set to false, dataset dependencies are merged into the fields property. The default value is false. Setting it to true is recommended. | true |
| spark.openlineage.vendors.iceberg.metricsReporterDisabled | Disables the metrics reporter for Iceberg, which turns off the mechanism that collects scan and commit reports. | false |
| spark.openlineage.filter.allowedSparkNodes | List of Spark plan node names, separated by ; and enclosed in []. Some Spark nodes are filtered by default so that they do not trigger OpenLineage events. This setting overrides the default behaviour and removes filtering for the specified nodes. Example usage: [org.apache.spark.sql.catalyst.plans.logical.Aggregate] enables events for Aggregate nodes. | empty list |
| spark.openlineage.filter.deniedSparkNodes | List of Spark plan node names, separated by ; and enclosed in []. Some Spark nodes are filtered by default so that they do not trigger OpenLineage events. This setting overrides the default behaviour and adds more nodes to the filter. | empty list |
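
Several of the values above use small formats of their own: tags are joined with ; and :, node lists are wrapped in [], and the removePath pattern relies on a named group. The sketch below is one way to assemble such values in Python and turn them into `--conf key=value` arguments for `spark-submit`. The helper names (`format_tags`, `format_bracket_list`, `remove_named_group`, `build_openlineage_conf`, `to_spark_submit_args`) are hypothetical and not part of OpenLineage. `remove_named_group` only approximates the described behaviour using Python's `re` module, which spells named groups `(?P<remove>...)` rather than Java's `(?<remove>...)`. The facet key for `spark.logicalPlan` is an assumption, built by substituting that facet name into the `<facet name>` placeholder.

```python
import re
import shlex


def format_tags(tags):
    """Join tags into the 'key:value;label;another:tag' form used by job.tags and run.tags."""
    parts = []
    for tag in tags:
        if isinstance(tag, tuple):
            key, value = tag
            parts.append(f"{key}:{value}")
        else:
            parts.append(tag)
    return ";".join(parts)


def format_bracket_list(items):
    """Format a list as '[a;b]', the form described for the Spark node filters."""
    return "[" + ";".join(items) + "]"


def remove_named_group(path, pattern, group="remove"):
    """Illustration only: cut the text matched by the named group out of the path."""
    match = re.search(pattern, path)
    if match is None or match.group(group) is None:
        return path
    start, end = match.span(group)
    return path[:start] + path[end:]


def build_openlineage_conf():
    """Collect a few of the parameters from the table, using its example values."""
    return {
        "spark.openlineage.transport.type": "http",
        "spark.openlineage.namespace": "MyNamespace",
        "spark.openlineage.job.owners.team": "Some Team",
        "spark.openlineage.job.tags": format_tags(
            [("key", "value"), "label", ("another", "tag")]
        ),
        # Assumed key: enables a facet that the table lists as disabled by default.
        "spark.openlineage.facets.spark.logicalPlan.disabled": "false",
        "spark.openlineage.filter.allowedSparkNodes": format_bracket_list(
            ["org.apache.spark.sql.catalyst.plans.logical.Aggregate"]
        ),
    }


def to_spark_submit_args(conf):
    """Turn a config dict into a flat list of '--conf key=value' arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print a shell-quoted argument string that could be appended to spark-submit.
    print(shlex.join(to_spark_submit_args(build_openlineage_conf())))
    # Python spelling of the example pattern from the table.
    print(remove_named_group(
        "s3://my-whatever-path/year=2023/month=04",
        r"(.*)(?P<remove>/.*/.*)",
    ))
```

Building the values in one place keeps the separators consistent. `shlex.join` quotes values that contain spaces, such as "Some Team", so the printed string can be pasted into a shell command.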