OpenLineage Proxy
OpenLineage Proxy is a simple Java server that can be used to monitor the JSON events that OpenLineage client emits, as well as tunnel the transmission to the OpenLineage backend such as Marquez.
When you are unable to collect logs on the client side, but want to make sure the event that gets emitted are valid and correct, you can use OpenLineage Proxy to verify the messages.
Accessing the proxy
OpenLineage proxy can be obtained via github:
git clone https://github.com/OpenLineage/OpenLineage.git
cd OpenLineage/proxy/backend
Building the proxy
To build the proxy jar, run
$ ./gradlew build
The packaged jar file can be found under ./build/libs/
Running the proxy
OpenLineage Proxy requires configuration file named proxy.yml
. There is an example that you can copy and name it as proxy.yml
.
cp proxy.example.yml proxy.yml
By default, the OpenLineage proxy uses the following ports:
- TCP port 8080 is available for the HTTP API server.
- TCP port 8081 is available for the admin interface.
You can then run the proxy using gradlew:
$ ./gradlew runShadow
Monitoring OpenLineage events via Proxy
When proxy is running, you can start sending your OpenLineage events just as the same way as you would be sending to any OpenLineage backend server. For example, in your URL for the OpenLineage backend, you can specify it as http://localhost:8080/api/v1/lineage
.
Once the message is sent to the proxy, you will see the OpenLineage message content (JSON) to the console output of the proxy. You can also specify in the configuration to store the messages into the log file.
You might have noticed that OpenLineage client (python, java) simply requires
http://localhost:8080
as the URL endpoint. This is possible because the client code adds the/api/v1/lineage
internally before it makes the request. If you are not using OpenLineage client library to emit OpenLineage events, you must use the full URL in order for the proxy to receive the data correctly.