# Monitoring DataHub

Monitoring DataHub's system components is critical for operating and improving DataHub. This doc explains how to add
tracing and metrics measurements to the DataHub containers.

## Tracing

Traces let us track the life of a request across multiple components. Each trace consists of multiple spans, which
are units of work containing various context about the work being done, as well as the time taken to finish the work. By
looking at the trace, we can more easily identify performance bottlenecks.

We enable tracing by using
the [OpenTelemetry java instrumentation library](https://github.com/open-telemetry/opentelemetry-java-instrumentation).
This project provides a Java agent JAR that is attached to Java applications. The agent injects bytecode to capture
telemetry from popular libraries.

Using the agent, we are able to:

1) Plug and play different tracing tools based on the user's setup: Jaeger, Zipkin, or other tools
2) Get traces for Kafka, JDBC, and Elasticsearch without any additional code
3) Track traces of any function with a simple `@WithSpan` annotation (see the sketch below)

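
As an illustration, here is a minimal sketch of annotating a method so it shows up as its own span. The class and method here are hypothetical, not code from the repository, and depending on your library version the annotation may live in `io.opentelemetry.instrumentation.annotations` instead.

```java
import io.opentelemetry.extension.annotations.WithSpan;

public class EntityService {

  // Hypothetical method: when the OpenTelemetry java agent is attached, each call
  // is recorded as a span (with its duration and any thrown exception) inside the
  // trace of the request that triggered it.
  @WithSpan
  public String getEntity(String urn) {
    // ...fetch and return the entity...
    return urn;
  }
}
```
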
You can enable the agent by setting env variable `ENABLE_OTEL` to `true` for GMS and the MAE/MCE consumers. In our
example [docker-compose](../../docker/monitoring/docker-compose.monitoring.yml), we export traces to a local Jaeger
instance by setting env variable `OTEL_TRACES_EXPORTER` to `jaeger`
and `OTEL_EXPORTER_JAEGER_ENDPOINT` to `http://jaeger-all-in-one:14250`, but you can easily change this behavior by
setting the correct env variables. Refer to
this [doc](https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk-extensions/autoconfigure/README.md) for
all the configs.

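
For instance, here is a sketch of passing these variables to a standalone GMS container with `docker run`; the image reference is a placeholder for whatever your deployment uses.

```shell
# Sketch only: <your-datahub-gms-image> is a placeholder, not a real image name.
docker run \
  -e ENABLE_OTEL=true \
  -e OTEL_TRACES_EXPORTER=jaeger \
  -e OTEL_EXPORTER_JAEGER_ENDPOINT=http://jaeger-all-in-one:14250 \
  <your-datahub-gms-image>
```
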
Once the above is set up, you should be able to see a detailed trace as a request is sent to GMS. We added
the `@WithSpan` annotation in various places to make the trace more readable. You should start to see traces in the
tracing collector of your choice. Our example [docker-compose](../../docker/monitoring/docker-compose.monitoring.yml) deploys
an instance of Jaeger exposing port 16686. The traces should be available at http://localhost:16686.

## Metrics

With tracing, we can observe how a request flows through our system into the persistence layer. However, for a more
holistic picture, we need to be able to export metrics and measure them across time. Unfortunately, OpenTelemetry's Java
metrics library is still in active development.

As such, we decided to use [Dropwizard Metrics](https://metrics.dropwizard.io/4.2.0/) to export custom metrics to JMX,
and then use the [Prometheus-JMX exporter](https://github.com/prometheus/jmx_exporter) to export all JMX metrics to
Prometheus. This allows our code base to be independent of the metrics collection tool, making it easy for people to use
their tool of choice. You can enable the agent by setting env variable `ENABLE_PROMETHEUS` to `true` for GMS and the MAE/MCE
consumers. Refer to this example [docker-compose](../../docker/monitoring/docker-compose.monitoring.yml) for setting the
variables.

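
Analogous to tracing, a minimal sketch for a standalone container; the published port follows the example docker-compose described below, and the image reference is again a placeholder.

```shell
# Sketch only: enables the JMX exporter and publishes its port
# (4318 in the example docker-compose); adjust to your deployment.
docker run \
  -e ENABLE_PROMETHEUS=true \
  -p 4318:4318 \
  <your-datahub-gms-image>
```
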
In our example [docker-compose](../../docker/monitoring/docker-compose.monitoring.yml), we have configured Prometheus to
scrape port 4318 of each container, which the JMX exporter uses to expose metrics. We also configured Grafana to
read from Prometheus and provide useful dashboards. By default, we provide two
dashboards: a [JVM dashboard](https://grafana.com/grafana/dashboards/14845) and a DataHub dashboard.

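
To sanity-check that a container is actually exporting metrics, you can curl the JMX-exporter port directly; this assumes port 4318 is published on the Docker host, as in the example compose file.

```shell
# Prints the first Prometheus-format metric lines served by the JMX exporter.
curl -s http://localhost:4318/metrics | head -n 20
```
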
In the JVM dashboard, you can find detailed charts based on JVM metrics like CPU/memory/disk usage. In the DataHub
dashboard, you can find charts to monitor each endpoint and the Kafka topics. Using the example implementation, go
to http://localhost:3001 to find the Grafana dashboards! (Username: admin, PW: admin)

To make it easy to track various metrics within the code base, we created the MetricUtils class. This util class creates a
central metric registry, sets up the JMX reporter, and provides convenient functions for setting up counters and timers.
You can run the following to create a counter and increment it.

```java
// Creates (or fetches) a counter named after the calling class and increments it.
MetricUtils.counter(this.getClass(), "metricName").inc();
```

You can run the following to time a block of code.

```java
// The timer records the elapsed time when the context is closed at the end of the try block.
try (Timer.Context ignored = MetricUtils.timer(this.getClass(), "timerName").time()) {
    // ...block of code...
}
```

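Putting the two together, here is a sketch of how these utilities might be combined inside a service class. The class, method, and metric names are made up for illustration, and the `MetricUtils` import path is an assumption about where the class lives in the tree.

```java
import com.codahale.metrics.Timer;
// Assumed location of the util class; adjust if it lives elsewhere in your tree.
import com.linkedin.metadata.utils.metrics.MetricUtils;

public class IngestionService {

  // Hypothetical method: counts every call and times the body of the work.
  public void ingest(String urn) {
    // Registers (or reuses) a counter keyed by this class plus the metric name.
    MetricUtils.counter(this.getClass(), "ingestCount").inc();

    // try-with-resources: the elapsed time is recorded when the context closes.
    try (Timer.Context ignored = MetricUtils.timer(this.getClass(), "ingestLatency").time()) {
      // ...do the actual ingestion work for `urn` here...
    }
  }
}
```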
					
						
## Enable monitoring through docker-compose

We provide some example configuration for enabling monitoring in
this [directory](https://github.com/datahub-project/datahub/tree/master/docker/monitoring). Take a look at the docker-compose
files, which add the necessary env variables to existing containers and spawn new containers (Jaeger, Prometheus,
Grafana).

You can include the above docker-compose file using the `-f <<path-to-compose-file>>` flag when running docker-compose commands.
For instance:

```shell
docker-compose \
  -f quickstart/docker-compose.quickstart.yml \
  -f monitoring/docker-compose.monitoring.yml \
  pull && \
docker-compose -p datahub \
  -f quickstart/docker-compose.quickstart.yml \
  -f monitoring/docker-compose.monitoring.yml \
  up
```

We set up quickstart.sh, dev.sh, and dev-without-neo4j.sh to add the above docker-compose file when `MONITORING=true`. For
instance, `MONITORING=true ./docker/quickstart.sh` will add the correct env variables to start collecting traces and
metrics, and also deploy Jaeger, Prometheus, and Grafana. We will soon support this as a flag during quickstart.