# Monitoring DataHub

Monitoring DataHub's system components is critical for operating and improving DataHub. This doc explains how to add
tracing and metrics measurements in the DataHub containers.

## Tracing

Traces let us track the life of a request across multiple components. Each trace consists of multiple spans, which are
units of work, containing various context about the work being done as well as the time taken to finish it. By looking
at the trace, we can more easily identify performance bottlenecks.

We enable tracing by using
the [OpenTelemetry java instrumentation library](https://github.com/open-telemetry/opentelemetry-java-instrumentation).
This project provides a Java agent JAR that is attached to java applications. The agent injects bytecode to capture
telemetry from popular libraries.
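
Concretely, such an agent is attached with the JVM's standard `-javaagent` flag. A minimal sketch, with illustrative
file names rather than the exact artifacts DataHub ships:

```shell
# Attach the OpenTelemetry java agent at JVM startup; it instruments supported
# libraries (Kafka, JDBC, Elasticsearch, ...) by rewriting their bytecode.
java -javaagent:/path/to/opentelemetry-javaagent.jar \
  -jar datahub-gms.jar
```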

Using the agent, we are able to:

1) Plug and play different tracing tools based on the user's setup: Jaeger, Zipkin, or other tools
2) Get traces for Kafka, JDBC, and Elasticsearch without any additional code
3) Track traces of any function with a simple `@WithSpan` annotation, as sketched after this list
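
For instance, the hypothetical class below shows how `@WithSpan` marks a method for tracing; it assumes the
`opentelemetry-extension-annotations` dependency is on the classpath (newer agent versions relocate this annotation):

```java
import io.opentelemetry.extension.annotations.WithSpan;

public class ExampleEntityService {

  // The agent records a span around every call to this method, so the call
  // shows up as its own unit of work inside the enclosing trace.
  @WithSpan
  public void ingestEntity(String urn) {
    // ... the actual ingestion work would go here ...
  }
}
```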

You can enable the agent by setting env variable `ENABLE_OTEL` to `true` for GMS and the MAE/MCE consumers. In our
example [docker-compose](../../docker/monitoring/docker-compose.monitoring.yml), we export traces to a local Jaeger
instance by setting env variable `OTEL_TRACES_EXPORTER` to `jaeger`
and `OTEL_EXPORTER_JAEGER_ENDPOINT` to `http://jaeger-all-in-one:14250`, but you can easily change this behavior by
setting the correct env variables. Refer to
this [doc](https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk-extensions/autoconfigure/README.md) for
all configs.
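
Put together, a minimal sketch of the environment for a traced container, using the variable names and values from the
example compose file above:

```shell
# Turn on the OpenTelemetry java agent inside the container.
export ENABLE_OTEL=true
# Ship traces to the Jaeger instance deployed by the monitoring compose file.
export OTEL_TRACES_EXPORTER=jaeger
export OTEL_EXPORTER_JAEGER_ENDPOINT=http://jaeger-all-in-one:14250
```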

Once the above is set up, you should be able to see a detailed trace as a request is sent to GMS. We added
the `@WithSpan` annotation in various places to make the traces more readable. You should start to see traces in the
tracing collector of your choice. Our example [docker-compose](../../docker/monitoring/docker-compose.monitoring.yml)
deploys an instance of Jaeger on port 16686. The traces should be available at http://localhost:16686.

## Metrics

With tracing, we can observe how a request flows through our system into the persistence layer. However, for a more
holistic picture, we need to be able to export metrics and measure them across time. Unfortunately, OpenTelemetry's java
metrics library is still in active development.

As such, we decided to use [Dropwizard Metrics](https://metrics.dropwizard.io/4.2.0/) to export custom metrics to JMX,
and then use the [Prometheus-JMX exporter](https://github.com/prometheus/jmx_exporter) to export all JMX metrics to
Prometheus. This allows our code base to be independent of the metrics collection tool, making it easy for people to use
their tool of choice. You can enable the agent by setting env variable `ENABLE_PROMETHEUS` to `true` for GMS and the
MAE/MCE consumers. Refer to this example [docker-compose](../../docker/monitoring/docker-compose.monitoring.yml) for
setting the variables.

In our example [docker-compose](../../docker/monitoring/docker-compose.monitoring.yml), we have configured Prometheus to
scrape port 4318 of each container, which the JMX exporter uses to expose metrics. We also configured Grafana to read
from Prometheus and provide useful dashboards. By default, we provide two
dashboards: the [JVM dashboard](https://grafana.com/grafana/dashboards/14845) and the DataHub dashboard.
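
Because the JMX exporter serves these metrics over HTTP, you can sanity-check a container before looking at any
dashboards. A quick sketch, assuming port 4318 of the GMS container is mapped to the same port on your host:

```shell
# Fetch the Prometheus-formatted metrics exposed by the JMX exporter.
curl http://localhost:4318/metrics
```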

In the JVM dashboard, you can find detailed charts based on JVM metrics like CPU/memory/disk usage. In the DataHub
dashboard, you can find charts to monitor each endpoint and the Kafka topics. Using the example implementation, go
to http://localhost:3001 to find the Grafana dashboards (username: admin, password: admin).

To make it easy to track various metrics within the code base, we created the MetricUtils class. This util class creates
a central metric registry, sets up the JMX reporter, and provides convenient functions for setting up counters and
timers. You can run the following to create a counter and increment it.

```java
// Registers the counter under this class's name on first use, then increments it.
MetricUtils.counter(this.getClass(), "metricName").inc();
```

You can run the following to time a block of code. Closing the `Timer.Context` via try-with-resources stops the timer
and records the elapsed time.

```java
try (Timer.Context ignored = MetricUtils.timer(this.getClass(), "timerName").time()) {
  // ... block of code to time ...
}
```

## Enable monitoring through docker-compose

We provide some example configuration for enabling monitoring in
this [directory](https://github.com/linkedin/datahub/tree/master/docker/monitoring). Take a look at the docker-compose
files, which add the necessary env variables to existing containers and spawn new containers (Jaeger, Prometheus,
Grafana).

You can include the above docker-compose files by passing `-f <<path-to-compose-file>>` when running docker-compose
commands. For instance,

```shell
docker-compose \
  -f quickstart/docker-compose.quickstart.yml \
  -f monitoring/docker-compose.monitoring.yml \
  pull && \
docker-compose -p datahub \
  -f quickstart/docker-compose.quickstart.yml \
  -f monitoring/docker-compose.monitoring.yml \
  up
```

We set up quickstart.sh, dev.sh, and dev-without-neo4j.sh to add the above docker-compose files when `MONITORING=true`.
For instance, `MONITORING=true ./docker/quickstart.sh` will add the correct env variables to start collecting traces and
metrics, and also deploy Jaeger, Prometheus, and Grafana. We will soon support this as a flag during quickstart.