1.10.2 - 2024-03-15
Added
- Dagster: add new provider for version 1.6.10
#2518
@JDarDagran
Adds the new provider required by the latest version of Dagster. - Flink: support lineage for a hybrid source
#2491
@HuangZhenQiu
Adds support for hybrid source lineage for users of Kafka and Iceberg sources in backfill usecases. - Flink: improve Cassandra lineage metadata
#2479
@HuangZhenQiu
Cassandra cluster info to be used as the dataset namespace, and the keyspace to be combined with the table name as the dataset name. - Flink: bump Flink JDBC connector version
#2472
@HuangZhenQiu
Bumps the Flink JDBC connector version to 3.1.2-1.18 for Flink 1.18. - Java: add a
OpenLineageClientUtils#loadOpenLineageJson(InputStream)
and changeOpenLineageClientUtils#loadOpenLineageYaml(InputStream)
methods#2490
@d-m-h
This improves the explicitness of the methods. Previously,loadOpenLineageYaml(InputStream)
wanted theInputStream
to contain bytes that represented JSON. - Java: add info from the HTTP response to the client exception
#2486
@davidjgoss
Adds the status code and body as properties on the thrown exception when a non-success response is encountered in the HTTP transport. - Python: add support for MSK IAM authentication with a new transport
#2478
@mattiabertorello
Eases publication of events to MSK with IAM authentication.
Removed
- Airflow: remove redundant information from facets
#2524
@kacpermuda
Refines the operator's attribute inclusion logic in facets to include only those known to be important or compact, ensuring that custom operator attributes with substantial data do not inflate the event size.
Fixed
- Airflow: proceed without rendering templates if
task_instance
copy fails#2492
@kacpermuda
Airflow will now proceed without rendering templates iftask_instance
copy fails inlistener.on_task_instance_running
. - Spark: fix the
HttpTransport
timeout#2475
@pawel-big-lebowski
The existingtimeout
config parameter is ambiguous: implementation treats the value as double in seconds, although the documentation claims it's milliseconds. A new config paramtimeoutInMillis
has been added. the Existingtimeout
has been removed from docs and will be deprecated in 1.13. - Spark: prevent NPE if the context is null
#2515
@pawel-big-lebowski
Adds a check for a null context before executingend(jobEnd)
. - Flink: fix class not found issue for Cassandra
#2507
@pawel-big-lebowski
Fixes the class not found issue when checking for Cassandra classes. Also fixes the Maven POM dependency on subprojects. - Flink: refine the JDBC table name
#2512
@HuangZhenQiu
Enables the JDBC table name with a schema prefix. - Flink: fix JDBC dataset naming
#2508
@pawel-big-lebowski
For JDBC, the Flink integration is not adjusted to the Openlineage naming convention. There is code that extracts the dataset namespace/name from the JDBC connection url, but it's in the Spark integration. As a solution, this code has to be extracted into the Java client and reused by the Spark and Flink integrations. - Flink: fix failure due to missing Cassandra classes
#2507
@pawel-big-lebowski
Flink is failing when no Cassandra classes are present on the class path. This is happening because ofCassandraUtils
class which has a statichasClasses
method, but it imports Cassandra-related classes in the header. Also, the Flink subproject contains an unnecessarymaven-publish
plugin. - Flink: fix release runtime dependencies
#2504
@HuangZhenQiu
The shadow jar of Flink is not minimized, so some internal jars are listed as runtime dependences. This removes them from the final pom.xml file in the Flink module. - Spec: improve Cassandra lineage metadata
#2479
@HuangZhenQiu
Following the namespace definition, we should usecassandra://host:port
.