Version: Next

Hive OpenLineage Integration

This project provides an Apache Hive integration for OpenLineage, enabling automated data lineage capture for your Hive workloads. The core of the integration is a Hive execution hook (HiveOpenLineageHook) that intercepts query execution.

The hook analyzes the Hive query plan generated by the SemanticAnalyzer. It traverses the plan's Abstract Syntax Tree (AST) to identify the input and output datasets and the transformations applied to the data. For the more advanced column-level lineage analysis, it uses a custom parser that is separate from Hive's own parser.
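The traversal idea can be illustrated with a small sketch. This is not the integration's actual code and does not use Hive's real AST classes: the `Node` type, the node kinds (`QUERY`, `FROM`, `JOIN`, `INSERT_INTO`, `TABLE`), and the table names are all hypothetical, chosen only to show how a depth-first walk can separate the tables a query reads from the tables it writes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy AST node (hypothetical; not Hive's real AST node class)."""
    kind: str                                   # e.g. "QUERY", "FROM", "INSERT_INTO", "TABLE"
    value: str = ""                             # table name, used only by "TABLE" nodes
    children: list[Node] = field(default_factory=list)


def collect_datasets(node, inputs=None, outputs=None, writing=False):
    """Depth-first walk: tables under INSERT_INTO are outputs, all others are inputs."""
    inputs = set() if inputs is None else inputs
    outputs = set() if outputs is None else outputs
    # Once we descend into a write clause, every table below it is a target.
    writing = writing or node.kind == "INSERT_INTO"
    if node.kind == "TABLE":
        (outputs if writing else inputs).add(node.value)
    for child in node.children:
        collect_datasets(child, inputs, outputs, writing)
    return inputs, outputs


# Toy tree for: INSERT INTO sales_summary SELECT ... FROM orders JOIN customers ...
tree = Node("QUERY", children=[
    Node("FROM", children=[
        Node("JOIN", children=[Node("TABLE", "orders"), Node("TABLE", "customers")]),
    ]),
    Node("INSERT_INTO", children=[Node("TABLE", "sales_summary")]),
])

inputs, outputs = collect_datasets(tree)
print(sorted(inputs), sorted(outputs))
# prints: ['customers', 'orders'] ['sales_summary']
```

A real implementation would also have to handle subqueries, views, CTEs, and partitioned targets, which this sketch ignores.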

Based on this analysis, the hook constructs OpenLineage events that capture the lineage information. Each event includes details about the job, its input and output datasets, and the relationships between them. The event type is COMPLETE for successful queries and FAIL for failed queries.
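To make the shape of such an event concrete, here is a minimal sketch that assembles one as a plain dictionary. The top-level field names (`eventType`, `eventTime`, `producer`, `run`, `job`, `inputs`, `outputs`) follow the OpenLineage run event model, but the helper `build_run_event`, the namespaces, the producer URL, and the job and table names are assumptions made up for illustration. The hook's real events may carry additional fields and facets.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(succeeded, job_name, inputs, outputs,
                    dataset_namespace="hive://example-metastore:9083"):
    """Assemble an OpenLineage-style run event as a dict (illustrative only)."""
    return {
        # COMPLETE for successful queries, FAIL for failed ones.
        "eventType": "COMPLETE" if succeeded else "FAIL",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": "https://example.com/hypothetical-hive-hook",  # placeholder URL
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": "example-namespace", "name": job_name},
        "inputs": [{"namespace": dataset_namespace, "name": n} for n in inputs],
        "outputs": [{"namespace": dataset_namespace, "name": n} for n in outputs],
    }


event = build_run_event(
    succeeded=True,
    job_name="insert_sales_summary",            # hypothetical job name
    inputs=["default.orders", "default.customers"],
    outputs=["default.sales_summary"],
)
failed = build_run_event(False, "insert_sales_summary", ["default.orders"], [])

print(json.dumps(event, indent=2))
```

Here `event` carries `"eventType": "COMPLETE"` and `failed` carries `"eventType": "FAIL"`, mirroring the two outcomes described above.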