1.15.0 - 2024-05-24
Added
- Flink: handle Iceberg tables with nested and complex field types #2706 @dolfinus
  Creates `SchemaDatasetFacet` with nested fields for Iceberg tables with list, map and struct columns.
- Flink: handle Avro schema with nested and complex field types #2711 @dolfinus
  Creates `SchemaDatasetFacet` with nested fields for Avro schemas with complex types (union, record, map, array, fixed).
- Spark: add facets to Spark application events #2677 @dolfinus
  Adds support for Spark application start and stop events in the `ExecutionContext` interface.
- Spark: add nested fields to `SchemaDatasetFieldsFacet` #2689 @dolfinus
  Adds nested Spark Dataframe fields support to `SchemaDatasetFieldsFacet`. Also includes the field comment as `description`.
- Spark: add `SparkApplicationDetailsFacet` #2688 @dolfinus
  Adds `SparkApplicationDetailsFacet` to `runEvent`s emitted on Spark application start.
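
To illustrate what nested fields mean for a consumer, here is a minimal sketch that walks a schema facet and flattens it into dotted column paths. The payload is hypothetical: it assumes each field carries a `name`, a `type`, an optional `description`, and an optional nested `fields` list, and the helper `flatten_fields` is not part of any OpenLineage client.

```python
def flatten_fields(fields, prefix=""):
    """Return (dotted_path, type, description) tuples for a nested field list."""
    paths = []
    for field in fields:
        path = f"{prefix}{field['name']}"
        paths.append((path, field.get("type"), field.get("description")))
        # Recurse into child fields when the column is a complex type.
        paths.extend(flatten_fields(field.get("fields", []), prefix=path + "."))
    return paths


# Hypothetical facet payload, for illustration only.
schema_facet = {
    "fields": [
        {"name": "id", "type": "long", "description": "Primary key"},
        {
            "name": "address",
            "type": "struct",
            "fields": [
                {"name": "city", "type": "string"},
                {"name": "zip", "type": "string"},
            ],
        },
    ]
}

flat = flatten_fields(schema_facet["fields"])
for path, type_name, description in flat:
    print(path, type_name, description)
```

A consumer that previously read only top-level columns could use a walk like this to index nested columns as well.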
Removed
- Airflow: remove Airflow < 2.3.0 support #2710 @kacpermuda
  Removes Airflow < 2.3.0 support.
- Integration: use v2 Python facets #2693 @JDarDagran
  Migrates integrations from removed v1 facets to v2 Python facets.
Fixed
- Spark: improve job suffix assigning mechanism #2665 @pawel-big-lebowski
  For some catalog handlers, the mechanism was creating different dataset identifiers on START and COMPLETE depending on whether a dataset was created or not. This improves the mechanism to assign a deterministic job suffix based on the output dataset at the moment of a start event. Note: this may change job names in some scenarios.
- Airflow: fix empty dataset name for `AthenaExtractor` #2700 @kacpermuda
  The dataset name should not be empty when passing only a bucket as S3 output in Athena.
- Flink: fix `SchemaDatasetFacet` for Protobuf repeated primitive types #2685 @dolfinus
  Fixes issues with the Protobuf schema converter.
- Python: clean up Python client code, add logging. #2653 @kacpermuda
  Cleans up client code and refactors logging in all Python modules.
- SQL: catch `TokenizerError`s, `PanicException` #2703 @mobuchowski
  The SQL parser now catches and handles these errors.
- Python: suppress warning on importing v1 module in `__init__.py`. #2713 @JDarDagran
  Suppresses the deprecation warning when v1 facets are used.
- Integration/Java/Python: use UUIDv7 instead of UUIDv4 #2686 #2687 @dolfinus
  Uses UUIDv7 instead of UUIDv4 for `runEvent`s. The new UUID version produces monotonically increasing values, which leads to more performant queries on the OL consumer side. Note: the UUID version is an implementation detail and can be changed in the future.
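
The ordering property of UUIDv7 comes from its layout: the top 48 bits hold a Unix timestamp in milliseconds, followed by the version and variant bits and random data. The sketch below builds such a value by hand to show why later IDs sort after earlier ones. It is an illustration only, not the generator used by the OpenLineage clients; the function name `uuid7_sketch` is made up, and this simple variant guarantees ordering only across different milliseconds, not within the same one.

```python
import os
import time
import uuid


def uuid7_sketch(timestamp_ms=None):
    """Build a UUIDv7-style value: 48-bit Unix ms timestamp, then random bits."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    # Timestamp occupies the 48 most significant bits (bits 127..80).
    value = (timestamp_ms & ((1 << 48) - 1)) << 80
    # Fill the remaining 80 bits with randomness.
    value |= int.from_bytes(os.urandom(10), "big")
    # Overwrite bits 79..76 with the version number 7.
    value = (value & ~(0xF << 76)) | (0x7 << 76)
    # Overwrite bits 63..62 with the RFC variant marker (binary 10).
    value = (value & ~(0x3 << 62)) | (0x2 << 62)
    return uuid.UUID(int=value)


earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)
print(earlier, later, earlier < later)
```

Because the timestamp leads the value, sorting run IDs as strings or as integers also sorts them by creation time, which is what lets a consumer-side index append new rows instead of inserting them at random positions.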