
1.10.2 - 2024-03-15

Added

  • Dagster: add new provider for version 1.6.10 #2518 @JDarDagran
    Adds the new provider required by the latest version of Dagster.
  • Flink: support lineage for a hybrid source #2491 @HuangZhenQiu
    Adds support for hybrid source lineage for users of Kafka and Iceberg sources in backfill use cases.
  • Flink: improve Cassandra lineage metadata #2479 @HuangZhenQiu
    Uses the Cassandra cluster info as the dataset namespace and combines the keyspace with the table name to form the dataset name.
  • Flink: bump Flink JDBC connector version #2472 @HuangZhenQiu
    Bumps the Flink JDBC connector version to 3.1.2-1.18 for Flink 1.18.
  • Java: add an OpenLineageClientUtils#loadOpenLineageJson(InputStream) method and change the OpenLineageClientUtils#loadOpenLineageYaml(InputStream) method #2490 @d-m-h
    This makes the methods more explicit. Previously, loadOpenLineageYaml(InputStream) expected the InputStream to contain bytes that represented JSON.
  • Java: add info from the HTTP response to the client exception #2486 @davidjgoss
    Adds the status code and body as properties on the thrown exception when a non-success response is encountered in the HTTP transport.
  • Python: add support for MSK IAM authentication with a new transport #2478 @mattiabertorello
    Eases publication of events to MSK with IAM authentication.
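
The Cassandra naming change listed above can be illustrated with a minimal sketch. It assumes the namespace takes the cassandra://host:port form given in the Spec entry of this release and that the keyspace and table name are joined with a dot; the function name and the separator are assumptions made for illustration and are not part of the Flink integration's API.

```python
from typing import Tuple


def cassandra_dataset_id(host: str, port: int, keyspace: str, table: str) -> Tuple[str, str]:
    """Return a (namespace, name) pair for a Cassandra table.

    Hypothetical helper: the namespace comes from the cluster contact point,
    and the name combines the keyspace with the table name.
    """
    namespace = f"cassandra://{host}:{port}"
    # Assumption: keyspace and table are joined with a dot.
    name = f"{keyspace}.{table}"
    return namespace, name


if __name__ == "__main__":
    print(cassandra_dataset_id("localhost", 9042, "sales", "orders"))
```

Deriving the namespace from the cluster rather than from the keyspace means that tables with the same keyspace and name on different clusters stay distinguishable.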

Removed

  • Airflow: remove redundant information from facets #2524 @kacpermuda
    Refines the operator's attribute inclusion logic in facets to include only those known to be important or compact, ensuring that custom operator attributes with substantial data do not inflate the event size.
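
The idea behind this change can be sketched as an allowlist filter, assuming a fixed set of attribute names is considered safe to include. The attribute names and the function below are hypothetical and do not reflect the actual list used by the Airflow integration.

```python
from typing import Any, Dict

# Hypothetical allowlist of attributes known to be important or compact.
INCLUDED_ATTRS = frozenset({"task_id", "owner", "retries"})


def facet_attrs(operator_attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only allowlisted operator attributes for inclusion in a facet."""
    return {key: value for key, value in operator_attrs.items() if key in INCLUDED_ATTRS}


if __name__ == "__main__":
    attrs = {"task_id": "load", "owner": "data", "payload": "x" * 10_000}
    # The large custom "payload" attribute is dropped from the result.
    print(facet_attrs(attrs))
```

An allowlist keeps the event size bounded regardless of what a custom operator stores on itself, at the cost of omitting attributes nobody has explicitly approved.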

Fixed

  • Airflow: proceed without rendering templates if task_instance copy fails #2492 @kacpermuda
    Airflow will now proceed without rendering templates if task_instance copy fails in listener.on_task_instance_running.
  • Spark: fix the HttpTransport timeout #2475 @pawel-big-lebowski
    The existing timeout config parameter is ambiguous: the implementation treats the value as a double in seconds, although the documentation claims it is in milliseconds. A new config parameter, timeoutInMillis, has been added. The existing timeout parameter has been removed from the docs and will be deprecated in 1.13.
  • Spark: prevent NPE if the context is null #2515 @pawel-big-lebowski
    Adds a check for a null context before executing end(jobEnd).
  • Flink: fix class not found issue for Cassandra #2507 @pawel-big-lebowski
    Fixes the class not found issue when checking for Cassandra classes. Also fixes the Maven POM dependency on subprojects.
  • Flink: refine the JDBC table name #2512 @HuangZhenQiu
    Enables the JDBC table name with a schema prefix.
  • Flink: fix JDBC dataset naming #2508 @pawel-big-lebowski
    For JDBC, the Flink integration is not aligned with the OpenLineage naming convention. Code that extracts the dataset namespace and name from the JDBC connection URL exists, but only in the Spark integration. As a solution, this code has to be extracted into the Java client and reused by the Spark and Flink integrations.
  • Flink: fix failure due to missing Cassandra classes #2507 @pawel-big-lebowski
    Flink fails when no Cassandra classes are present on the classpath. This happens because the CassandraUtils class has a static hasClasses method but imports Cassandra-related classes in the header. Also, the Flink subproject contains an unnecessary maven-publish plugin.
  • Flink: fix release runtime dependencies #2504 @HuangZhenQiu
    The shadow jar of Flink is not minimized, so some internal jars are listed as runtime dependencies. This removes them from the final pom.xml file in the Flink module.
  • Spec: improve Cassandra lineage metadata #2479 @HuangZhenQiu
    Following the namespace definition, we should use cassandra://host:port.
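
The unit ambiguity described in the Spark HttpTransport timeout fix above can be made concrete with a small sketch. It assumes a plain mapping of config values, with the legacy timeout key read as seconds and timeoutInMillis read as milliseconds. The precedence rule (the new key wins when both are set) and the function name are assumptions for illustration, not the integration's documented behavior.

```python
from typing import Mapping, Optional


def effective_timeout_ms(config: Mapping[str, float]) -> Optional[int]:
    """Resolve a timeout in milliseconds from a config mapping.

    Hypothetical helper: 'timeoutInMillis' is read as milliseconds, while the
    legacy 'timeout' value is read as a floating-point number of seconds.
    """
    if "timeoutInMillis" in config:
        # Assumption: the explicit millisecond key takes precedence.
        return int(config["timeoutInMillis"])
    if "timeout" in config:
        # Legacy key: seconds, converted to milliseconds.
        return int(round(float(config["timeout"]) * 1000))
    return None


if __name__ == "__main__":
    print(effective_timeout_ms({"timeout": 2.5}))
    print(effective_timeout_ms({"timeoutInMillis": 5000}))
```

Under the old documentation, a user who set timeout to 5000 expecting five seconds would have had the value interpreted as 5000 seconds. Putting the unit in the parameter name removes that ambiguity.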