
1.15.0 - 2024-05-24

Added

  • Flink: handle Iceberg tables with nested and complex field types #2706 @dolfinus
    Creates SchemaDatasetFacet with nested fields for Iceberg tables with list, map and struct columns.
  • Flink: handle Avro schema with nested and complex field types #2711 @dolfinus
    Creates SchemaDatasetFacet with nested fields for Avro schemas with complex types (union, record, map, array, fixed).
  • Spark: add facets to Spark application events #2677 @dolfinus
    Adds support for Spark application start and stop events in the ExecutionContext interface.
  • Spark: add nested fields to SchemaDatasetFieldsFacet #2689 @dolfinus
    Adds support for nested Spark DataFrame fields to SchemaDatasetFieldsFacet. Also includes the field comment as the description.
  • Spark: add SparkApplicationDetailsFacet #2688 @dolfinus
    Adds SparkApplicationDetailsFacet to runEvents emitted on Spark application start.
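
As an illustration of what the nested-field entries above mean for consumers, here is a minimal sketch that walks a schema facet whose fields carry their own `fields` list. The payload is a hypothetical example written as a plain Python dict (the column names and types are invented), and `iter_field_paths` is a helper made up for this sketch, not part of any OpenLineage client.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical payload: column names, types and description are invented.
# Only the shape (a "fields" list whose items may nest "fields") is the point.
schema_facet: Dict[str, Any] = {
    "fields": [
        {"name": "id", "type": "long", "description": "primary key"},
        {
            "name": "address",
            "type": "struct",
            "fields": [
                {"name": "city", "type": "string"},
                {"name": "zip", "type": "string"},
            ],
        },
    ]
}


def iter_field_paths(fields: List[Dict[str, Any]], prefix: str = "") -> Iterator[str]:
    """Yield dotted paths for every field, parents before their children."""
    for field in fields:
        path = f"{prefix}{field['name']}"
        yield path
        # Fields without a nested "fields" key are treated as leaves.
        yield from iter_field_paths(field.get("fields", []), prefix=path + ".")


paths = list(iter_field_paths(schema_facet["fields"]))
print(paths)
```

A consumer that previously assumed a flat field list would see only `id` and `address` here; recursing into `fields` is what exposes `address.city` and `address.zip`.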

Removed

  • Airflow: remove Airflow < 2.3.0 support #2710 @kacpermuda
    Removes Airflow < 2.3.0 support.
  • Integration: use v2 Python facets #2693 @JDarDagran
    Migrates integrations from removed v1 facets to v2 Python facets.

Fixed

  • Spark: improve job suffix assigning mechanism #2665 @pawel-big-lebowski
    For some catalog handlers, the mechanism was creating different dataset identifiers on START and COMPLETE depending on whether a dataset was created or not. This improves the mechanism to assign a deterministic job suffix based on the output dataset at the moment of a start event. Note: this may change job names in some scenarios.
  • Airflow: fix empty dataset name for AthenaExtractor #2700 @kacpermuda
    The dataset name should not be empty when passing only a bucket as S3 output in Athena.
  • Flink: fix SchemaDatasetFacet for Protobuf repeated primitive types #2685 @dolfinus
    Fixes issues with the Protobuf schema converter.
  • Python: clean up Python client code, add logging. #2653 @kacpermuda
    Cleans up client code and refactors logging in all Python modules.
  • SQL: catch TokenizerErrors, PanicException #2703 @mobuchowski
    The SQL parser now catches and handles these errors.
  • Python: suppress warning on importing v1 module in __init__.py. #2713 @JDarDagran
    Suppresses the deprecation warning when v1 facets are used.
  • Integration/Java/Python: use UUIDv7 instead of UUIDv4 #2686 #2687 @dolfinus
    Uses UUIDv7 instead of UUIDv4 for runEvents. UUIDv7 values are time-ordered and monotonically increasing, which makes queries on the OL consumer side more performant. Note: the UUID version is an implementation detail and may change in the future.
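
To make the UUIDv7 ordering property concrete, the following is a minimal sketch of the UUIDv7 bit layout (48-bit Unix millisecond timestamp, version, random bits, variant) built with the Python standard library only. It is not the code used by the OpenLineage clients; `uuid7_sketch` and its `unix_ms` parameter are hypothetical names for this example, and the sketch does not guarantee ordering between two values generated within the same millisecond.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUID with the version-7 layout (illustrative sketch only)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000

    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    rand_a = rand >> 68                            # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80      # timestamp in the top 48 bits
    value |= 0x7 << 76                             # version field = 7
    value |= rand_a << 64
    value |= 0b10 << 62                            # variant bits "10"
    value |= rand_b
    return uuid.UUID(int=value)


# Fixed timestamps are passed in so the comparison is deterministic.
earlier = uuid7_sketch(unix_ms=1_000)
later = uuid7_sketch(unix_ms=2_000)
print(earlier < later, str(earlier) < str(later))
```

Because the timestamp occupies the most significant bits, values created in later milliseconds sort after earlier ones both as integers and as strings, which is the property that helps index locality on the consumer side. A UUIDv4, being fully random apart from the version and variant bits, has no such ordering.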