Spec: Add subset dataset facets to spec #4008 @pawel-big-lebowski
  Add subset dataset facets to the OpenLineage specification for representing dataset relationships.
Spec: Add DatasetQualityMetricsDatasetFacet #3978 @heron--
  Allow attaching dataset quality information outside of InputDatasetFacet.
Spark: Add support for microbatch source write #4018 @tnazarew
  Add support for Spark structured streaming microbatch source write operations.
Spark: Add catalog properties to catalog facet #4016 @ddebowczyk92
  Add catalog properties support to the Spark integration for better catalog metadata tracking.
Spark: Add GCP project ID and location to BigQuery Metastore catalog properties #4039 @ddebowczyk92
  Enhance the BigQuery integration with GCP project ID and location in catalog properties.
Spark: Add support for COALESCE transformation #3972 @kchledowski
  Add support for tracking COALESCE transformations in Spark jobs.
Spark: Add catalog facet when using vanilla Hive tables #3982 @ddebowczyk92
  Add catalog facet support for vanilla Hive table operations.
Spark: Make output statistics available within complete event #4013 @pawel-big-lebowski
  Output statistics are now available in complete events for better observability.
Spark: Add output stats for RDD jobs #3977 @pawel-big-lebowski
  Add output statistics tracking for Spark RDD-based jobs.
Java: Add equals and hashcode methods into generated classes #4050 @pawel-big-lebowski
  Improve generated model classes with proper equals and hashcode implementations.
dbt: Capture dbt tags #4022 @mobuchowski
  Add support for capturing dbt tags in OpenLineage events.
dbt: Add dbt Cloud account ID to DbtRunRunFacet #4017 @mobuchowski
  Add dbt Cloud account ID tracking to dbt run facets.
dbt: Update DbtRunRunFacet to add more useful information #3987 @mobuchowski
  Enhance DbtRunRunFacet with additional metadata for better observability.
Python: Add GCP Lineage transport #4006 @ddebowczyk92
  Add a native Google Cloud Platform Lineage transport for the Python client.
Python: Add fsspec support for FileTransport #3983 @JDarDagran
  Add fsspec filesystem support to FileTransport for broader filesystem compatibility.
Python: Add default tags with OL client version #3980 @kacpermuda
  Automatically add the OpenLineage client version as a default tag in events.
Spark: Improve logging in IcebergInputStatisticsInputDatasetFacetBuilder #3994 @JDarDagran
  Enhance logging for Iceberg input statistics collection.
Spark: Limit external getFileStatus calls when dealing with lots of S3 objects #3985 @pawel-big-lebowski
  Optimize S3 operations by limiting external getFileStatus calls for large object sets.
Java/Spark/Hive: Move TransformationInfo to Java client to reuse across integrations #3964 @kchledowski
  Refactor TransformationInfo into the shared Java client for cross-integration reuse.
Python: Improve logging in AsyncHttpTransport #4026 @dolfinus
  Enhance logging capabilities in the asynchronous HTTP transport.
Python: Allow type aliases #4000 @JDarDagran
  Support Python type aliases in client code generation.
Python: Fix classes generation for almost identical classes #3997 @JDarDagran
  Improve code generation to properly handle nearly identical class definitions.
Python: Raise errors if custom token provider cannot be loaded #4014 @dolfinus
  Fail fast with clear errors when custom token providers fail to load.
Python: Don't silence import errors in DefaultTransportFactory #4015 @dolfinus
  Improve error visibility by not silencing import errors in the transport factory.
Python: Import from facet_v2 and event_v2 instead of generated modules #3968 @kacpermuda
  Update import paths to use versioned facet and event modules.
Java: Refactor ExecutorService management in OpenLineageClientUtils #4012 @JDarDagran
  Improve thread pool management in Java client utilities.
CI: Replace pre-commit with prek across CI and documentation #3965 @JDarDagran
  Migrate from pre-commit to prek for pre-commit hook management.
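
The two Python entries above about token providers and DefaultTransportFactory (#4014, #4015) both describe surfacing load failures instead of hiding them. The general pattern can be sketched as below. This is only an illustration under stated assumptions: `load_class` is a hypothetical helper written for this note, not a function from the OpenLineage client, and the actual changes in those PRs may be structured differently.

```python
import importlib


def load_class(dotted_path: str):
    """Resolve 'package.module.ClassName' to a class object, or raise.

    Hypothetical helper for illustration; not part of the OpenLineage client.
    """
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.ClassName', got {dotted_path!r}")
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with context rather than swallowing the error and
        # silently falling back to some default implementation.
        raise ImportError(
            f"Cannot import module {module_name!r} needed for {dotted_path!r}"
        ) from exc
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(
            f"Module {module_name!r} has no attribute {class_name!r}"
        ) from exc
```

With this shape, a misspelled provider path fails at load time with a message that names the offending path, rather than showing up later as an unexplained authentication or transport problem.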
Spark: Fix false Hive Glue detection #4053 @jsjasonseba
  Fix incorrect Glue catalog detection due to always attempting ARN resolution.
Spark: Fix CLL on hiveless runtimes #4052 @kchledowski
  Fix column-level lineage failures on Spark runtimes without spark-hive package.
Spark: Fix missing inputs and CLL on some table creation commands #4031 @kchledowski
  Fix missing input datasets and column-level lineage for CreateDataSourceTableAsSelect and CreateHiveTableAsSelect commands.
Spark: Rely on BQ bucket info inside BigQueryIntermediateJobFilter #4044 @EugeneYushin
  Fix BigQuery intermediate job filtering by using bucket configuration.
Spark: Fix for TypeNotPresentException/RefreshTableCommand errors in Spark 3.0.2 #4002 @MaciejGajewski
  Add additional exception handling for TypeNotPresentException in Spark 3.0.2.
Python: Fix license field in pyproject.toml when using build module #4034 @JDarDagran
  Correct license field specification in Python package metadata.
Python: Accept both apikey and api_key in token provider #4045 @kacpermuda
  Support both naming conventions for API key configuration parameter.
Java: Fix empty sources jar generation #4037 @EugeneYushin
  Fix build issue causing empty sources JAR files to be generated.
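
As an illustration of the `apikey` / `api_key` entry (#4045), the following sketch shows one way a configuration reader can accept both spellings. It rests on assumptions: `resolve_api_key` and `bearer_header` are hypothetical names written for this note, and the precedence shown (`api_key` first, then `apikey`) is a choice made here, not something the release notes state about the client.

```python
from typing import Dict, Mapping


def resolve_api_key(config: Mapping[str, str]) -> str:
    """Return the API key from a config mapping using either spelling.

    Hypothetical helper; the precedence order is an assumption.
    """
    for name in ("api_key", "apikey"):
        value = config.get(name)
        if value:
            return value
    raise KeyError("token provider config needs 'api_key' or 'apikey'")


def bearer_header(config: Mapping[str, str]) -> Dict[str, str]:
    """Build an Authorization header from whichever key name is present."""
    return {"Authorization": f"Bearer {resolve_api_key(config)}"}
```

Raising when neither key is present keeps a misconfigured provider from sending requests with an empty token.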