Hive OpenLineage Integration
This project provides an Apache Hive integration for
OpenLineage, enabling automated data lineage capture for your Hive workloads.
The core of the integration is a Hive execution hook (HiveOpenLineageHook) that intercepts query execution.
The hook analyzes the Hive query plan generated by the SemanticAnalyzer. It traverses the plan's Abstract Syntax Tree (AST) to identify input and output datasets, as well as the transformations performed on the data. It also uses a custom parser, separate from Hive's own, for more advanced column-level lineage analysis.
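To make the traversal idea concrete, here is a minimal sketch in Python of walking a tree and splitting table references into inputs and outputs. It is not the hook's actual implementation: the `Node` class, the node labels (`QUERY`, `FROM`, `JOIN`, `INSERT`, `DESTINATION`, `TABLE`), and the table names are all hypothetical and only loosely modeled on how a query AST is shaped.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy AST node. The labels used below are illustrative, not Hive's real token set."""
    label: str
    value: str = ""
    children: list[Node] = field(default_factory=list)


def collect_datasets(root: Node) -> tuple[set[str], set[str]]:
    """Walk the tree depth-first and split table references into inputs and outputs."""
    inputs: set[str] = set()
    outputs: set[str] = set()
    stack = [(root, False)]  # (node, whether we are inside a write destination)
    while stack:
        node, writing = stack.pop()
        # Everything below a DESTINATION node is treated as a write target.
        writing = writing or node.label == "DESTINATION"
        if node.label == "TABLE":
            (outputs if writing else inputs).add(node.value)
        stack.extend((child, writing) for child in node.children)
    return inputs, outputs


# Roughly: INSERT INTO sales_summary SELECT ... FROM orders JOIN customers ...
plan = Node("QUERY", children=[
    Node("FROM", children=[
        Node("JOIN", children=[
            Node("TABLE", "default.orders"),
            Node("TABLE", "default.customers"),
        ]),
    ]),
    Node("INSERT", children=[
        Node("DESTINATION", children=[Node("TABLE", "default.sales_summary")]),
    ]),
])

inputs, outputs = collect_datasets(plan)
```

In this sketch the only signal that a table is written to is its position under a `DESTINATION` node; a real plan analysis would have to handle many more statement shapes.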
Based on the query plan analysis, the hook constructs OpenLineage events, capturing the data lineage information. Events include details about the job, datasets (inputs and outputs), and the relationships between them. The resulting OpenLineage event will be of type COMPLETE for successful queries and FAIL for failed queries.
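As a rough illustration of the event shape described above, the following Python sketch assembles a dictionary with the top-level fields of an OpenLineage run event and picks the event type from the query outcome. The function name `build_run_event`, the `hive-example` namespace, the producer URL, and the table names are assumptions made for this example; facets and the schema URL are omitted, and the events the hook actually emits will carry more detail.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(job_name, inputs, outputs, succeeded, namespace="hive-example"):
    """Assemble a minimal run event; COMPLETE on success, FAIL otherwise."""
    return {
        "eventType": "COMPLETE" if succeeded else "FAIL",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": name} for name in inputs],
        "outputs": [{"namespace": namespace, "name": name} for name in outputs],
        # Hypothetical producer identifier, used only for this sketch.
        "producer": "https://example.com/hypothetical-producer",
    }


event = build_run_event(
    job_name="insert_sales_summary",
    inputs=["default.orders", "default.customers"],
    outputs=["default.sales_summary"],
    succeeded=True,
)
print(json.dumps(event, indent=2))
```

Passing `succeeded=False` yields the same structure with `eventType` set to `FAIL`.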