refactor(docker): make docker files easier to use during development. (#1777)
Make docker files easier to use during development. During development it is quite nice to have Docker work with locally built code. This allows you to launch all services very quickly, with your changes, and optionally with debugging support.

Changes made to docker files:
- Removed all redundant docker-compose files. We now have one giant file, plus smaller files to use as overrides.
- Removed redundant README files that provided little information.
- Renamed docker/<dir> to match the service name in the docker-compose file, for clarity.
- Moved environment variables to .env files. We only provide dev / the default environment for quickstart.
- Added debug options to docker files using multistage builds to produce minimal images, with the idea that built files will be mounted instead.
- Added a docker/dev.sh script + compose file to easily use the dev override images (separate tag; images never published; uses debug docker files; mounts binaries into the image).
- Added docs/docker documentation for this.
parent 43dfce8b2f
commit b8e18b0b5d
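In practice the flow this enables looks roughly like the sketch below (illustrative, not part of the commit; `./gradlew build` stands in for building whichever modules you actually changed):

```
./gradlew build      # produce war.war and the consumer jars locally
./docker/dev.sh      # launch the stack with those binaries mounted into debug images
```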
.github/workflows/docker-frontend.yml
@@ -21,6 +21,8 @@ jobs:
        echo "tag=$TAG"
        echo "::set-output name=tag::$TAG"
    - uses: docker/build-push-action@v1
+     env:
+       DOCKER_BUILDKIT: 1
      with:
        dockerfile: ./docker/frontend/Dockerfile
        username: ${{ secrets.DOCKER_USERNAME }}
.github/workflows/docker-gms.yml
@@ -21,6 +21,8 @@ jobs:
        echo "tag=$TAG"
        echo "::set-output name=tag::$TAG"
    - uses: docker/build-push-action@v1
+     env:
+       DOCKER_BUILDKIT: 1
      with:
        dockerfile: ./docker/gms/Dockerfile
        username: ${{ secrets.DOCKER_USERNAME }}
.github/workflows/docker-mae-consumer.yml
@@ -21,6 +21,8 @@ jobs:
        echo "tag=$TAG"
        echo "::set-output name=tag::$TAG"
    - uses: docker/build-push-action@v1
+     env:
+       DOCKER_BUILDKIT: 1
      with:
        dockerfile: ./docker/mae-consumer/Dockerfile
        username: ${{ secrets.DOCKER_USERNAME }}
.github/workflows/docker-mce-consumer.yml
@@ -21,6 +21,8 @@ jobs:
        echo "tag=$TAG"
        echo "::set-output name=tag::$TAG"
    - uses: docker/build-push-action@v1
+     env:
+       DOCKER_BUILDKIT: 1
      with:
        dockerfile: ./docker/mce-consumer/Dockerfile
        username: ${{ secrets.DOCKER_USERNAME }}
.gitignore
@@ -17,11 +17,8 @@ metadata-events/mxe-registration/src/main/resources/**/*.avsc
.java-version

# Python
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
.mypy_cache/
@@ -1,27 +1,56 @@
# Docker Images

## Prerequisites
You need to install [docker](https://docs.docker.com/install/) and
[docker-compose](https://docs.docker.com/compose/install/) (if using Linux; on Windows and Mac, compose is included
with Docker Desktop).

Make sure to allocate enough hardware resources to the Docker engine. Tested & confirmed config: 2 CPUs, 8GB RAM,
2GB swap area.

## Quickstart

The easiest way to bring up and test DataHub is using the DataHub [Docker](https://www.docker.com) images,
which are continuously deployed to [Docker Hub](https://hub.docker.com/u/linkedin) with every commit to the repository.

You can easily download and run all these images and their dependencies with our
[quick start guide](../docs/quickstart.md).

DataHub Docker Images:

* [linkedin/datahub-gms](https://cloud.docker.com/repository/docker/linkedin/datahub-gms/)
* [linkedin/datahub-frontend](https://cloud.docker.com/repository/docker/linkedin/datahub-frontend/)
* [linkedin/datahub-mae-consumer](https://cloud.docker.com/repository/docker/linkedin/datahub-mae-consumer/)
* [linkedin/datahub-mce-consumer](https://cloud.docker.com/repository/docker/linkedin/datahub-mce-consumer/)

The above Docker images are built specifically for DataHub. You can check the subdirectories to see how each image is
generated via its [Docker build](https://docs.docker.com/engine/reference/commandline/build/) file, and
how to start each container using [Docker Compose](https://docs.docker.com/compose/). Beyond these, DataHub depends
on the Docker images below to run.

Dependencies:
* [**Kafka and Schema Registry**](kafka)
* [**Elasticsearch**](elasticsearch)
* [**Elasticsearch Setup**](elasticsearch-setup)
* [**MySQL**](mysql)

The locally built ingestion image allows you to create a `metadatachangeevent` on an ad-hoc basis with a Python script.
The pipeline depends on all the above images being composed up.
* [**Ingestion**](ingestion)

### Ingesting demo data

## Prerequisites
You need to install [docker](https://docs.docker.com/install/) and [docker-compose](https://docs.docker.com/compose/install/).

If you want to test ingesting some data once DataHub is up, see [**Ingestion**](ingestion/README.md).

## Quickstart
If you want to quickly try and evaluate DataHub by running all necessary Docker containers, see the
[Quickstart Guide](quickstart).

## Using Docker Images During Development

See [Using Docker Images During Development](../docs/docker/development.md).

## Building And Deploying Docker Images

We use GitHub Actions to build and continuously deploy our images. There should be no need to do this manually; a
successful release on GitHub will automatically publish the images.

### Building images

To build the full images (the ones we publish), you need to run the following:

```
COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub build
```

This is because we rely on BuildKit for multistage builds. It does not hurt to also set `DATAHUB_VERSION` to
something unique.

This is not our recommended development flow; most developers should follow the
[Using Docker Images During Development](#using-docker-images-during-development) guide.
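For instance, a unique tag could come from the current git commit (a sketch, not from the README above; any unique string works):

```
DATAHUB_VERSION=$(git rev-parse --short HEAD) \
    COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub build
```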
docker/broker/env/docker.env (new file)
@@ -0,0 +1,6 @@
KAFKA_BROKER_ID=1
KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1
KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=0
docker/datahub-frontend/README.md (new file)
@@ -0,0 +1,16 @@
# DataHub Frontend Docker Image

[](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-frontend+docker%22)

Refer to [DataHub Frontend Service](../../datahub-frontend) for a quick overview of the architecture and
responsibility of this service within DataHub.

## Checking out DataHub UI

After starting your Docker containers, you can connect by entering the following into your favorite web browser:

```
http://localhost:9001
```

You can sign in with `datahub` as both username and password.
docker/datahub-frontend/env/docker.env (new file)
@@ -0,0 +1,5 @@
DATAHUB_GMS_HOST=datahub-gms
DATAHUB_GMS_PORT=8080
DATAHUB_SECRET=YouKnowNothing
DATAHUB_APP_VERSION=1.0
DATAHUB_PLAY_MEM_BUFFER_SIZE=10MB
docker/datahub-gms/Dockerfile (new file)
@@ -0,0 +1,28 @@
# Defining environment
ARG APP_ENV=prod

FROM openjdk:8-jre-alpine as base
ENV DOCKERIZE_VERSION v0.6.1
RUN apk --no-cache add curl tar \
    && curl https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-runner/9.4.20.v20190813/jetty-runner-9.4.20.v20190813.jar --output jetty-runner.jar \
    && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv

FROM openjdk:8 as prod-build
COPY . /datahub-src
RUN cd /datahub-src && ./gradlew :gms:war:build
RUN cp /datahub-src/gms/war/build/libs/war.war /war.war

FROM base as prod-install
COPY --from=prod-build /war.war /datahub/datahub-gms/bin/war.war
COPY --from=prod-build /datahub-src/docker/datahub-gms/start.sh /datahub/datahub-gms/scripts/start.sh
RUN chmod +x /datahub/datahub-gms/scripts/start.sh

FROM base as dev-install
# Dummy stage for development. Assumes code is built on your machine and mounted to this image.
# See this excellent thread https://github.com/docker/cli/issues/1134

FROM ${APP_ENV}-install as final

EXPOSE 8080

CMD /datahub/datahub-gms/scripts/start.sh
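As a sketch of how the `APP_ENV` build argument selects a stage when building this file directly (the `:debug` tag mirrors what docker-compose.dev.yml below uses; the command itself is illustrative, not part of the commit):

```
# Build the slim dev variant; binaries are mounted at runtime rather than baked in.
DOCKER_BUILDKIT=1 docker build --build-arg APP_ENV=dev \
    -f docker/datahub-gms/Dockerfile -t linkedin/datahub-gms:debug .
```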
docker/datahub-gms/README.md (new file)
@@ -0,0 +1,22 @@
# DataHub Generalized Metadata Store (GMS) Docker Image

[](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-gms+docker%22)

Refer to [DataHub GMS Service](../../gms) for a quick overview of the architecture and responsibility of this service
within DataHub.

## Other Database Platforms

While GMS defaults to using MySQL as its storage backend, it is possible to switch to any of the
[database platforms](https://ebean.io/docs/database/) supported by Ebean.

For example, you can run the following command to start a GMS that connects to a PostgreSQL backend:

```
(cd docker/ && docker-compose -f docker-compose.yml -f docker-compose.postgre.yml -p datahub up)
```

or a MariaDB backend:

```
(cd docker/ && docker-compose -f docker-compose.yml -f docker-compose.mariadb.yml -p datahub up)
```
docker/datahub-gms/env/docker.env (new file)
@@ -0,0 +1,13 @@
EBEAN_DATASOURCE_USERNAME=datahub
EBEAN_DATASOURCE_PASSWORD=datahub
EBEAN_DATASOURCE_HOST=mysql:3306
EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8
EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver
KAFKA_BOOTSTRAP_SERVER=broker:29092
KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
ELASTICSEARCH_HOST=elasticsearch
ELASTICSEARCH_PORT=9200
NEO4J_HOST=neo4j:7474
NEO4J_URI=bolt://neo4j
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=datahub
docker/datahub-gms/env/docker.mariadb.env (new file)
@@ -0,0 +1,13 @@
EBEAN_DATASOURCE_USERNAME=datahub
EBEAN_DATASOURCE_PASSWORD=datahub
EBEAN_DATASOURCE_HOST=mariadb:3306
EBEAN_DATASOURCE_URL=jdbc:mariadb://mariadb:3306/datahub
EBEAN_DATASOURCE_DRIVER=org.mariadb.jdbc.Driver
KAFKA_BOOTSTRAP_SERVER=broker:29092
KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
ELASTICSEARCH_HOST=elasticsearch
ELASTICSEARCH_PORT=9200
NEO4J_HOST=neo4j:7474
NEO4J_URI=bolt://neo4j
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=datahub
docker/datahub-gms/env/docker.postgres.env (new file)
@@ -0,0 +1,13 @@
EBEAN_DATASOURCE_USERNAME=datahub
EBEAN_DATASOURCE_PASSWORD=datahub
EBEAN_DATASOURCE_HOST=postgres:5432
EBEAN_DATASOURCE_URL=jdbc:postgresql://postgres:5432/datahub
EBEAN_DATASOURCE_DRIVER=org.postgresql.Driver
KAFKA_BOOTSTRAP_SERVER=broker:29092
KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
ELASTICSEARCH_HOST=elasticsearch
ELASTICSEARCH_PORT=9200
NEO4J_HOST=neo4j:7474
NEO4J_URI=bolt://neo4j
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=datahub
docker/gms/start.sh → docker/datahub-gms/start.sh (normal file → executable file)
@@ -6,4 +6,4 @@ dockerize \
  -wait http://$ELASTICSEARCH_HOST:$ELASTICSEARCH_PORT \
  -wait http://$NEO4J_HOST \
  -timeout 240s \
- java -jar jetty-runner.jar gms.war
+ java -jar /jetty-runner.jar /datahub/datahub-gms/bin/war.war
docker/datahub-mae-consumer/Dockerfile (new file)
@@ -0,0 +1,27 @@
# Defining environment
ARG APP_ENV=prod

FROM openjdk:8-jre-alpine as base
ENV DOCKERIZE_VERSION v0.6.1
RUN apk --no-cache add curl tar \
    && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv

FROM openjdk:8 as prod-build
COPY . datahub-src
RUN cd datahub-src && ./gradlew :metadata-jobs:mae-consumer-job:build
RUN cd datahub-src && cp metadata-jobs/mae-consumer-job/build/libs/mae-consumer-job.jar ../mae-consumer-job.jar

FROM base as prod-install
COPY --from=prod-build /mae-consumer-job.jar /datahub/datahub-mae-consumer/bin/
COPY --from=prod-build /datahub-src/docker/datahub-mae-consumer/start.sh /datahub/datahub-mae-consumer/scripts/
RUN chmod +x /datahub/datahub-mae-consumer/scripts/start.sh

FROM base as dev-install
# Dummy stage for development. Assumes code is built on your machine and mounted to this image.
# See this excellent thread https://github.com/docker/cli/issues/1134

FROM ${APP_ENV}-install as final

EXPOSE 9090

CMD /datahub/datahub-mae-consumer/scripts/start.sh
docker/datahub-mae-consumer/README.md (new file)
@@ -0,0 +1,5 @@
# DataHub MetadataAuditEvent (MAE) Consumer Docker Image

[](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-mae-consumer+docker%22)

Refer to [DataHub MAE Consumer Job](../../metadata-jobs/mae-consumer-job) for a quick overview of the architecture and
responsibility of this service within DataHub.
docker/datahub-mae-consumer/env/docker.env (new file)
@@ -0,0 +1,8 @@
KAFKA_BOOTSTRAP_SERVER=broker:29092
KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
ELASTICSEARCH_HOST=elasticsearch
ELASTICSEARCH_PORT=9200
NEO4J_HOST=neo4j:7474
NEO4J_URI=bolt://neo4j
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=datahub
docker/mae-consumer/start.sh → docker/datahub-mae-consumer/start.sh (normal file → executable file)
@@ -5,4 +5,4 @@ dockerize \
  -wait http://$ELASTICSEARCH_HOST:$ELASTICSEARCH_PORT \
  -wait http://$NEO4J_HOST \
  -timeout 240s \
- java -jar mae-consumer-job.jar
+ java -jar /datahub/datahub-mae-consumer/bin/mae-consumer-job.jar
docker/datahub-mce-consumer/Dockerfile (new file)
@@ -0,0 +1,27 @@
# Defining environment
ARG APP_ENV=prod

FROM openjdk:8-jre-alpine as base
ENV DOCKERIZE_VERSION v0.6.1
RUN apk --no-cache add curl tar \
    && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv

FROM openjdk:8 as prod-build
COPY . datahub-src
RUN cd datahub-src && ./gradlew :metadata-jobs:mce-consumer-job:build
RUN cd datahub-src && cp metadata-jobs/mce-consumer-job/build/libs/mce-consumer-job.jar ../mce-consumer-job.jar

FROM base as prod-install
COPY --from=prod-build /mce-consumer-job.jar /datahub/datahub-mce-consumer/bin/
COPY --from=prod-build /datahub-src/docker/datahub-mce-consumer/start.sh /datahub/datahub-mce-consumer/scripts/
RUN chmod +x /datahub/datahub-mce-consumer/scripts/start.sh

FROM base as dev-install
# Dummy stage for development. Assumes code is built on your machine and mounted to this image.
# See this excellent thread https://github.com/docker/cli/issues/1134

FROM ${APP_ENV}-install as final

EXPOSE 9090

CMD /datahub/datahub-mce-consumer/scripts/start.sh
docker/datahub-mce-consumer/README.md (new file)
@@ -0,0 +1,5 @@
# DataHub MetadataChangeEvent (MCE) Consumer Docker Image

[](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-mce-consumer+docker%22)

Refer to [DataHub MCE Consumer Job](../../metadata-jobs/mce-consumer-job) for a quick overview of the architecture and
responsibility of this service within DataHub.
docker/datahub-mce-consumer/env/docker.env (new file)
@@ -0,0 +1,4 @@
KAFKA_BOOTSTRAP_SERVER=broker:29092
KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
GMS_HOST=datahub-gms
GMS_PORT=8080
docker/mce-consumer/start.sh → docker/datahub-mce-consumer/start.sh (normal file → executable file)
@@ -4,4 +4,4 @@
dockerize \
  -wait tcp://$KAFKA_BOOTSTRAP_SERVER \
  -timeout 240s \
- java -jar mce-consumer-job.jar
+ java -jar /datahub/datahub-mce-consumer/bin/mce-consumer-job.jar
docker/dev.sh (new executable file)
@@ -0,0 +1,17 @@
#!/bin/bash

# Launches dev instances of DataHub images. See documentation for more details.
# YOU MUST BUILD VIA GRADLE BEFORE RUNNING THIS.
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd $DIR && \
    COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose \
        -f docker-compose.yml \
        -f docker-compose.override.yml \
        -f docker-compose.dev.yml \
        pull \
    && \
    COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub \
        -f docker-compose.yml \
        -f docker-compose.override.yml \
        -f docker-compose.dev.yml \
        up
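To see what these three files merge into before launching, the standard `docker-compose config` command renders the combined configuration (illustrative usage, not part of the script):

```
cd docker && docker-compose \
    -f docker-compose.yml -f docker-compose.override.yml -f docker-compose.dev.yml \
    config
```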
docker/docker-compose.dev.yml (new file)
@@ -0,0 +1,45 @@
# Default overrides for running local development.

# Images here are made into "development" images by following the general pattern of defining a multistage build with
# separate prod/dev steps, using APP_ENV to specify which to use. The dev steps should avoid building and instead
# assume that binaries and scripts will be mounted into the image, as also set up by this file. Also see this
# excellent thread https://github.com/docker/cli/issues/1134.

# To make a JVM app debuggable via IntelliJ, go to its env file and add JVM debug flags, and then add the JVM debug
# port to this file.
---
# TODO mount + debug docker file for frontend
version: '3.8'
services:
  datahub-gms:
    image: linkedin/datahub-gms:debug
    build:
      context: datahub-gms
      dockerfile: Dockerfile
      args:
        APP_ENV: dev
    volumes:
      - ./datahub-gms/start.sh:/datahub/datahub-gms/scripts/start.sh
      - ../gms/war/build/libs/:/datahub/datahub-gms/bin

  datahub-mae-consumer:
    image: linkedin/datahub-mae-consumer:debug
    build:
      context: datahub-mae-consumer
      dockerfile: Dockerfile
      args:
        APP_ENV: dev
    volumes:
      - ./datahub-mae-consumer/start.sh:/datahub/datahub-mae-consumer/scripts/start.sh
      - ../metadata-jobs/mae-consumer-job/build/libs/:/datahub/datahub-mae-consumer/bin/

  datahub-mce-consumer:
    image: linkedin/datahub-mce-consumer:debug
    build:
      context: datahub-mce-consumer
      dockerfile: Dockerfile
      args:
        APP_ENV: dev
    volumes:
      - ./datahub-mce-consumer/start.sh:/datahub/datahub-mce-consumer/scripts/start.sh
      - ../metadata-jobs/mce-consumer-job/build/libs/:/datahub/datahub-mce-consumer/bin
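A sketch of the debug wiring the header comment describes, using the JVM-standard `JAVA_TOOL_OPTIONS` variable and an arbitrary port (both are illustrative assumptions, not settings defined by this commit):

```
# Hypothetical addition to datahub-gms/env/docker.env (standard JDWP flag syntax):
JAVA_TOOL_OPTIONS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005

# ...with the matching port added to the datahub-gms service in docker-compose.dev.yml:
#   ports:
#     - "5005:5005"
```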
docker/docker-compose.override.yml (new file)
@@ -0,0 +1,24 @@
# Default override to use MySQL as a backing store for datahub-gms (same as docker-compose.mysql.yml).
---
version: '3.8'
services:
  mysql:
    container_name: mysql
    hostname: mysql
    image: mysql:5.7
    env_file: mysql/env/docker.env
    restart: always
    command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
    ports:
      - "3306:3306"
    volumes:
      - ./mysql/init.sql:/docker-entrypoint-initdb.d/init.sql
      - mysqldata:/var/lib/mysql

  datahub-gms:
    env_file: datahub-gms/env/docker.env
    depends_on:
      - mysql

volumes:
  mysqldata:
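Note that docker-compose merges docker-compose.override.yml on top of docker-compose.yml automatically, so the two invocations below are equivalent (standard Compose behavior):

```
cd docker && docker-compose -p datahub up
cd docker && docker-compose -f docker-compose.yml -f docker-compose.override.yml -p datahub up
```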
docker/docker-compose.yml (new file)
@@ -0,0 +1,192 @@
# Docker compose file covering DataHub's default configuration, which is to run all containers on a single host.

# Please see the README.md for instructions on how to use and customize.

# NOTE: This file cannot build! No dockerfiles are set. See the README.md in this directory.
---
version: '3.8'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:5.4.0
    env_file: zookeeper/env/docker.env
    hostname: zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
    volumes:
      - zkdata:/var/opt/zookeeper

  broker:
    image: confluentinc/cp-kafka:5.4.0
    env_file: broker/env/docker.env
    hostname: broker
    container_name: broker
    depends_on:
      - zookeeper
    ports:
      - "29092:29092"
      - "9092:9092"

  kafka-rest-proxy:
    image: confluentinc/cp-kafka-rest:5.4.0
    env_file: kafka-rest-proxy/env/docker.env
    hostname: kafka-rest-proxy
    container_name: kafka-rest-proxy
    ports:
      - "8082:8082"
    depends_on:
      - zookeeper
      - broker
      - schema-registry

  kafka-topics-ui:
    image: landoop/kafka-topics-ui:0.9.4
    env_file: kafka-topics-ui/env/docker.env
    hostname: kafka-topics-ui
    container_name: kafka-topics-ui
    ports:
      - "18000:8000"
    depends_on:
      - zookeeper
      - broker
      - schema-registry
      - kafka-rest-proxy

  # This "container" is a workaround to pre-create topics
  kafka-setup:
    build:
      context: kafka-setup
    env_file: kafka-setup/env/docker.env
    hostname: kafka-setup
    container_name: kafka-setup
    depends_on:
      - broker
      - schema-registry

  schema-registry:
    image: confluentinc/cp-schema-registry:5.4.0
    env_file: schema-registry/env/docker.env
    hostname: schema-registry
    container_name: schema-registry
    depends_on:
      - zookeeper
      - broker
    ports:
      - "8081:8081"

  schema-registry-ui:
    image: landoop/schema-registry-ui:latest
    env_file: schema-registry-ui/env/docker.env
    container_name: schema-registry-ui
    hostname: schema-registry-ui
    ports:
      - "8000:8000"
    depends_on:
      - schema-registry

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.6.8
    env_file: elasticsearch/env/docker.env
    container_name: elasticsearch
    hostname: elasticsearch
    ports:
      - "9200:9200"
    volumes:
      - esdata:/usr/share/elasticsearch/data

  kibana:
    image: docker.elastic.co/kibana/kibana:5.6.8
    env_file: kibana/env/docker.env
    container_name: kibana
    hostname: kibana
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch

  neo4j:
    image: neo4j:3.5.7
    env_file: neo4j/env/docker.env
    hostname: neo4j
    container_name: neo4j
    ports:
      - "7474:7474"
      - "7687:7687"
    volumes:
      - neo4jdata:/data

  # This "container" is a workaround to pre-create search indices
  elasticsearch-setup:
    build:
      context: elasticsearch-setup
    env_file: elasticsearch-setup/env/docker.env
    hostname: elasticsearch-setup
    container_name: elasticsearch-setup
    depends_on:
      - elasticsearch

  datahub-gms:
    build:
      context: ../
      dockerfile: docker/datahub-gms/Dockerfile
    image: linkedin/datahub-gms:${DATAHUB_VERSION:-latest}
    hostname: datahub-gms
    container_name: datahub-gms
    ports:
      - "8080:8080"
    depends_on:
      - elasticsearch-setup
      - kafka-setup
      - mysql
      - neo4j

  datahub-frontend:
    build:
      context: ../
      dockerfile: docker/datahub-frontend/Dockerfile
    image: linkedin/datahub-frontend:${DATAHUB_VERSION:-latest}
    env_file: datahub-frontend/env/docker.env
    hostname: datahub-frontend
    container_name: datahub-frontend
    ports:
      - "9001:9001"
    depends_on:
      - datahub-gms

  datahub-mae-consumer:
    build:
      context: ../
      dockerfile: docker/datahub-mae-consumer/Dockerfile
    image: linkedin/datahub-mae-consumer:${DATAHUB_VERSION:-latest}
    env_file: datahub-mae-consumer/env/docker.env
    hostname: datahub-mae-consumer
    container_name: datahub-mae-consumer
    ports:
      - "9091:9091"
    depends_on:
      - kafka-setup
      - elasticsearch-setup
      - neo4j

  datahub-mce-consumer:
    build:
      context: ../
      dockerfile: docker/datahub-mce-consumer/Dockerfile
    image: linkedin/datahub-mce-consumer:${DATAHUB_VERSION:-latest}
    env_file: datahub-mce-consumer/env/docker.env
    hostname: datahub-mce-consumer
    container_name: datahub-mce-consumer
    ports:
      - "9090:9090"
    depends_on:
      - kafka-setup
      - datahub-gms

networks:
  default:
    name: datahub_network

volumes:
  esdata:
  neo4jdata:
  zkdata:
docker/elasticsearch-setup/README.md (new file)
@@ -0,0 +1,5 @@
# Elasticsearch & Kibana

DataHub uses Elasticsearch as a search engine. Elasticsearch powers the search, typeahead and browse functions of
DataHub. The [official Elasticsearch Docker image](https://hub.docker.com/_/elasticsearch) found on Docker Hub is used
without any modification.
docker/elasticsearch-setup/env/docker.env (new file)
@@ -0,0 +1,2 @@
ELASTICSEARCH_HOST=elasticsearch
ELASTICSEARCH_PORT=9200
@@ -1,35 +0,0 @@ (file removed)
# Elasticsearch & Kibana

DataHub uses Elasticsearch as a search engine. Elasticsearch powers the search, typeahead and browse functions of
DataHub. The [official Elasticsearch Docker image](https://hub.docker.com/_/elasticsearch) found on Docker Hub is used
without any modification.

## Run Docker container
The command below will start the Elasticsearch and Kibana containers. DataHub uses Elasticsearch release `5.6.8`. Newer
versions of Elasticsearch are not tested and you might experience compatibility issues.
```
cd docker/elasticsearch && docker-compose pull && docker-compose up --build
```
You can connect to Kibana in your web browser to monitor Elasticsearch via the link below:
```
http://localhost:5601
```

## Container configuration
### External Port
If you need to change default configurations for your container, such as the exposed port, you do that in the
`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand
how to change your exposed port settings.
```
ports:
  - "9200:9200"
```

### Docker Network
All Docker containers for DataHub are supposed to be on the same Docker network, which is `datahub_network`.
If you change this, you will need to change it for all other Docker containers as well.
```
networks:
  default:
    name: datahub_network
```
@@ -1,38 +0,0 @@ (file removed)
---
version: '3.5'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.6.8
    container_name: elasticsearch
    hostname: elasticsearch
    ports:
      - "9200:9200"
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"

  kibana:
    image: docker.elastic.co/kibana/kibana:5.6.8
    container_name: kibana
    hostname: kibana
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch

  # This "container" is a workaround to pre-create search indices
  elasticsearch-setup:
    build:
      context: .
    hostname: elasticsearch-setup
    container_name: elasticsearch-setup
    depends_on:
      - elasticsearch
    environment:
      - ELASTICSEARCH_HOST=elasticsearch
      - ELASTICSEARCH_PORT=9200

networks:
  default:
    name: datahub_network
docker/elasticsearch/env/docker.env (new file)
@@ -0,0 +1,3 @@
discovery.type=single-node
xpack.security.enabled=false
ES_JAVA_OPTS=-Xms1g -Xmx1g
@@ -1,50 +0,0 @@ (file removed)
# DataHub Frontend Docker Image
[](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-frontend+docker%22)

Refer to [DataHub Frontend Service](../../datahub-frontend) for a quick overview of the architecture and
responsibility of this service within DataHub.

## Build & Run
```
cd docker/frontend && docker-compose up --build
```
This command will rebuild the Docker image and start a container based on the image.

To start a container using an existing image, run the same command without the `--build` flag.

### Container configuration
#### External Port
If you need to change default configurations for your container, such as the exposed port, you do that in the
`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand
how to change your exposed port settings.
```
ports:
  - "9001:9001"
```

#### Docker Network
All Docker containers for DataHub are supposed to be on the same Docker network, which is `datahub_network`.
If you change this, you will need to change it for all other Docker containers as well.
```
networks:
  default:
    name: datahub_network
```

#### datahub-gms Container
Before starting the `datahub-frontend` container, the `datahub-gms` container should already be up and running.
The `datahub-frontend` service creates a connection to the `datahub-gms` service, and this is configured with environment
variables in `docker-compose.yml`:
```
environment:
  - DATAHUB_GMS_HOST=datahub-gms
  - DATAHUB_GMS_PORT=8080
```
The value of the `DATAHUB_GMS_HOST` variable should be set to the host name of the `datahub-gms` container within the Docker network.

## Checking out DataHub UI
After starting your Docker container, you can connect by entering the following into your favorite web browser:
```
http://localhost:9001
```
You can sign in with `datahub` as both username and password.
@@ -1,22 +0,0 @@ (file removed)
---
version: '3.5'
services:
  datahub-frontend:
    image: linkedin/datahub-frontend:${DATAHUB_VERSION:-latest}
    build:
      context: ../../
      dockerfile: docker/frontend/Dockerfile
    hostname: datahub-frontend
    container_name: datahub-frontend
    ports:
      - "9001:9001"
    environment:
      - DATAHUB_GMS_HOST=datahub-gms
      - DATAHUB_GMS_PORT=8080
      - DATAHUB_SECRET=YouKnowNothing
      - DATAHUB_APP_VERSION=1.0
      - DATAHUB_PLAY_MEM_BUFFER_SIZE=10MB

networks:
  default:
    name: datahub_network
@@ -1,19 +0,0 @@ (file removed)
FROM openjdk:8 as builder
COPY . /datahub-src
RUN cd /datahub-src && ./gradlew :gms:war:build \
    && cp gms/war/build/libs/war.war /gms.war


FROM openjdk:8-jre-alpine
ENV DOCKERIZE_VERSION v0.6.1
RUN apk --no-cache add curl tar \
    && curl https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-runner/9.4.20.v20190813/jetty-runner-9.4.20.v20190813.jar --output jetty-runner.jar \
    && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv

COPY --from=builder /gms.war .
COPY docker/gms/start.sh /start.sh
RUN chmod +x /start.sh

EXPOSE 8080

CMD /start.sh
@@ -1,82 +0,0 @@ (file removed)
# DataHub Generalized Metadata Store (GMS) Docker Image
[](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-gms+docker%22)

Refer to [DataHub GMS Service](../../gms) for a quick overview of the architecture and responsibility of this service
within DataHub.

## Build & Run
```
cd docker/gms && docker-compose up --build
```
This command will rebuild the local Docker image and start a container based on the image.

To start a container using an existing image, run the same command without the `--build` flag.

### Container configuration
#### External Port
If you need to change default configurations for your container, such as the exposed port, you do that in the
`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand
how to change your exposed port settings.
```
ports:
  - "8080:8080"
```

#### Docker Network
All Docker containers for DataHub are supposed to be on the same Docker network, which is `datahub_network`.
If you change this, you will need to change it for all other Docker containers as well.
```
networks:
  default:
    name: datahub_network
```

#### MySQL, Elasticsearch and Kafka Containers
Before starting the `datahub-gms` container, the `mysql`, `elasticsearch`, `neo4j` and `kafka` containers should already be up and running.
These connections are configured via environment variables in `docker-compose.yml`:
```
environment:
  - EBEAN_DATASOURCE_USERNAME=datahub
  - EBEAN_DATASOURCE_PASSWORD=datahub
  - EBEAN_DATASOURCE_HOST=mysql:3306
  - EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub
  - EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver
```
The value of the `EBEAN_DATASOURCE_HOST` variable should be set to the host name of the `mysql` container within the Docker network.

```
environment:
  - KAFKA_BOOTSTRAP_SERVER=broker:29092
  - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
```
The value of the `KAFKA_BOOTSTRAP_SERVER` variable should be set to the host name of the `kafka broker` container within the Docker network.
The value of the `KAFKA_SCHEMAREGISTRY_URL` variable should be set to the host name of the `kafka schema registry` container within the Docker network.

```
environment:
  - ELASTICSEARCH_HOST=elasticsearch
  - ELASTICSEARCH_PORT=9200
```
The value of the `ELASTICSEARCH_HOST` variable should be set to the host name of the `elasticsearch` container within the Docker network.

```
environment:
  - NEO4J_HOST=neo4j:7474
  - NEO4J_URI=bolt://neo4j
  - NEO4J_USERNAME=neo4j
  - NEO4J_PASSWORD=datahub
```
The value of the `NEO4J_URI` variable should be set to the host name of the `neo4j` container within the Docker network.

## Other Database Platforms
While GMS defaults to using MySQL as its storage backend, it is possible to switch to any of the
[database platforms](https://ebean.io/docs/database/) supported by Ebean.
For example, you can run the following command to start a GMS that connects to a PostgreSQL backend:
```
cd docker/gms && docker-compose -f docker-compose-postgres.yml up --build
```
or a MariaDB backend:
```
cd docker/gms && docker-compose -f docker-compose-mariadb.yml up --build
```
@@ -1,30 +0,0 @@ (file removed)
---
version: '3.5'
services:
  datahub-gms:
    image: linkedin/datahub-gms:${DATAHUB_VERSION:-latest}
    build:
      context: ../../
      dockerfile: docker/gms/Dockerfile
    hostname: datahub-gms
    container_name: datahub-gms
    ports:
      - "8080:8080"
    environment:
      - EBEAN_DATASOURCE_USERNAME=datahub
      - EBEAN_DATASOURCE_PASSWORD=datahub
      - EBEAN_DATASOURCE_HOST=mariadb:3306
      - EBEAN_DATASOURCE_URL=jdbc:mariadb://mariadb:3306/datahub
      - EBEAN_DATASOURCE_DRIVER=org.mariadb.jdbc.Driver
      - KAFKA_BOOTSTRAP_SERVER=broker:29092
      - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
      - ELASTICSEARCH_HOST=elasticsearch
      - ELASTICSEARCH_PORT=9200
      - NEO4J_HOST=neo4j:7474
      - NEO4J_URI=bolt://neo4j
      - NEO4J_USERNAME=neo4j
      - NEO4J_PASSWORD=datahub

networks:
  default:
    name: datahub_network
@@ -1,30 +0,0 @@ (file removed)
---
version: '3.5'
services:
  datahub-gms:
    image: linkedin/datahub-gms:${DATAHUB_VERSION:-latest}
    build:
      context: ../../
      dockerfile: docker/gms/Dockerfile
    hostname: datahub-gms
    container_name: datahub-gms
    ports:
      - "8080:8080"
    environment:
      - EBEAN_DATASOURCE_USERNAME=datahub
      - EBEAN_DATASOURCE_PASSWORD=datahub
      - EBEAN_DATASOURCE_HOST=postgres:5432
      - EBEAN_DATASOURCE_URL=jdbc:postgresql://postgres:5432/datahub
      - EBEAN_DATASOURCE_DRIVER=org.postgresql.Driver
      - KAFKA_BOOTSTRAP_SERVER=broker:29092
      - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
      - ELASTICSEARCH_HOST=elasticsearch
      - ELASTICSEARCH_PORT=9200
      - NEO4J_HOST=neo4j:7474
      - NEO4J_URI=bolt://neo4j
      - NEO4J_USERNAME=neo4j
      - NEO4J_PASSWORD=datahub

networks:
  default:
    name: datahub_network
@@ -1,30 +0,0 @@ (file removed)
---
version: '3.5'
services:
  datahub-gms:
    image: linkedin/datahub-gms:${DATAHUB_VERSION:-latest}
    build:
      context: ../../
      dockerfile: docker/gms/Dockerfile
    hostname: datahub-gms
    container_name: datahub-gms
    ports:
      - "8080:8080"
    environment:
      - EBEAN_DATASOURCE_USERNAME=datahub
      - EBEAN_DATASOURCE_PASSWORD=datahub
      - EBEAN_DATASOURCE_HOST=mysql:3306
      - EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true
      - EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver
      - KAFKA_BOOTSTRAP_SERVER=broker:29092
      - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
      - ELASTICSEARCH_HOST=elasticsearch
      - ELASTICSEARCH_PORT=9200
      - NEO4J_HOST=neo4j:7474
      - NEO4J_URI=bolt://neo4j
      - NEO4J_USERNAME=neo4j
      - NEO4J_PASSWORD=datahub

networks:
  default:
    name: datahub_network
@@ -2,16 +2,3 @@

Refer to [DataHub Metadata Ingestion](../../metadata-ingestion/mce-cli) for a quick overview of the architecture and
responsibility of this service within DataHub.

## Build & Run
```
cd docker/ingestion && docker-compose up --build
```
This command will rebuild the Docker image and start a container based on the image.

To start a container using an existing image, run the same command without the `--build` flag.

### Container configuration

#### Prerequisite Containers
Before starting the `ingestion` container, the `kafka`, `datahub-gms`, `mysql` and `datahub-mce-consumer` containers should already be up and running.
docker/kafka-rest-proxy/env/docker.env (new file)
@@ -0,0 +1,4 @@
KAFKA_REST_LISTENERS=http://0.0.0.0:8082/
KAFKA_REST_SCHEMA_REGISTRY_URL=http://schema-registry:8081/
KAFKA_REST_HOST_NAME=kafka-rest-proxy
KAFKA_REST_BOOTSTRAP_SERVERS=PLAINTEXT://broker:29092
docker/kafka-setup/README.md (new file)
@@ -0,0 +1,14 @@
# Kafka, Zookeeper and Schema Registry

DataHub uses Kafka as the pub-sub message queue in the backend.
[Official Confluent Kafka Docker images](https://hub.docker.com/u/confluentinc) found on Docker Hub are used without
any modification.

## Debugging Kafka
You can install [kafkacat](https://github.com/edenhill/kafkacat) to consume and produce messages to Kafka topics.
For example, to consume messages on the MetadataAuditEvent topic, you can run the command below.
```
kafkacat -b localhost:9092 -t MetadataAuditEvent
```
However, `kafkacat` doesn't currently support Avro deserialization; there is ongoing
[work](https://github.com/edenhill/kafkacat/pull/151) to add it.
docker/kafka-setup/env/docker.env (new file)
@@ -0,0 +1,2 @@
KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
KAFKA_BOOTSTRAP_SERVER=broker:29092
docker/kafka-topics-ui/env/docker.env (new file)
@@ -0,0 +1,2 @@
KAFKA_REST_PROXY_URL="http://kafkarestproxy:8082/"
PROXY="true"
@@ -1,47 +0,0 @@ (file removed)
# Kafka, Zookeeper and Schema Registry

DataHub uses Kafka as the pub-sub message queue in the backend.
[Official Confluent Kafka Docker images](https://hub.docker.com/u/confluentinc) found on Docker Hub are used without
any modification.

## Run Docker container
The command below will start all Kafka-related containers.
```
cd docker/kafka && docker-compose pull && docker-compose up
```
As part of `docker-compose`, we also initialize a container called `kafka-setup` to create the `MetadataAuditEvent`,
`MetadataChangeEvent` and `FailedMetadataChangeEvent` topics. The only thing this container does is create the Kafka topics after the Kafka broker is ready.

There is also a container which provides a visual schema-registry interface with which you can register/unregister schemas.
You can connect to `schema-registry-ui` in your web browser to monitor the Kafka Schema Registry via the link below:
```
http://localhost:8000
```

## Container configuration
### External Port
If you need to change default configurations for your container, such as the exposed port, you do that in the
`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand
how to change your exposed port settings.
```
ports:
  - "9092:9092"
```

### Docker Network
All Docker containers for DataHub are supposed to be on the same Docker network, which is `datahub_network`.
If you change this, you will need to change it for all other Docker containers as well.
```
networks:
  default:
    name: datahub_network
```

## Debugging Kafka
You can install [kafkacat](https://github.com/edenhill/kafkacat) to consume and produce messages to Kafka topics.
For example, to consume messages on the MetadataAuditEvent topic, you can run the command below.
```
kafkacat -b localhost:9092 -t MetadataAuditEvent
```
However, `kafkacat` doesn't currently support Avro deserialization; there is ongoing
[work](https://github.com/edenhill/kafkacat/pull/151) to add it.
@@ -1,104 +0,0 @@ (file removed)
---
version: '3.5'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:5.4.0
    hostname: zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000

  broker:
    image: confluentinc/cp-kafka:5.4.0
    hostname: broker
    container_name: broker
    depends_on:
      - zookeeper
    ports:
      - "29092:29092"
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0

  # This "container" is a workaround to pre-create topics
  kafka-setup:
    build:
      context: .
    hostname: kafka-setup
    container_name: kafka-setup
    depends_on:
      - broker
      - schema-registry
    environment:
      - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
      - KAFKA_BOOTSTRAP_SERVER=broker:29092

  kafka-rest-proxy:
    image: confluentinc/cp-kafka-rest:5.4.0
    hostname: kafka-rest-proxy
    ports:
      - "8082:8082"
    environment:
      KAFKA_REST_LISTENERS: http://0.0.0.0:8082/
      KAFKA_REST_SCHEMA_REGISTRY_URL: http://schema-registry:8081/
      KAFKA_REST_HOST_NAME: kafka-rest-proxy
      KAFKA_REST_BOOTSTRAP_SERVERS: PLAINTEXT://broker:29092
    depends_on:
      - zookeeper
      - broker
      - schema-registry

  kafka-topics-ui:
    image: landoop/kafka-topics-ui:0.9.4
    hostname: kafka-topics-ui
    ports:
      - "18000:8000"
    environment:
      KAFKA_REST_PROXY_URL: "http://kafka-rest-proxy:8082/"
      PROXY: "true"
    depends_on:
      - zookeeper
      - broker
      - schema-registry
      - kafka-rest-proxy

  schema-registry:
    image: confluentinc/cp-schema-registry:5.4.0
    hostname: schema-registry
    container_name: schema-registry
    depends_on:
      - zookeeper
      - broker
    ports:
      - "8081:8081"
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: 'zookeeper:2181'

  schema-registry-ui:
    image: landoop/schema-registry-ui:latest
    container_name: schema-registry-ui
    hostname: schema-registry-ui
    ports:
      - "8000:8000"
    environment:
      SCHEMAREGISTRY_URL: 'http://schema-registry:8081'
      ALLOW_GLOBAL: 'true'
      ALLOW_TRANSITIVE: 'true'
      ALLOW_DELETION: 'true'
      READONLY_MODE: 'true'
      PROXY: 'true'
    depends_on:
      - schema-registry

networks:
  default:
    name: datahub_network
docker/kibana/env/docker.env (new file)
@@ -0,0 +1,2 @@
SERVER_HOST=0.0.0.0
ELASTICSEARCH_URL=http://elasticsearch:9200
@@ -1,19 +0,0 @@ (file removed)
FROM openjdk:8 as builder

COPY . datahub-src
RUN cd datahub-src && ./gradlew :metadata-jobs:mae-consumer-job:build \
    && cp metadata-jobs/mae-consumer-job/build/libs/mae-consumer-job.jar ../mae-consumer-job.jar \
    && cd .. && rm -rf datahub-src

FROM openjdk:8-jre-alpine
ENV DOCKERIZE_VERSION v0.6.1
RUN apk --no-cache add curl tar \
    && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv

COPY --from=builder /mae-consumer-job.jar /mae-consumer-job.jar
COPY docker/mae-consumer/start.sh /start.sh
RUN chmod +x /start.sh

EXPOSE 9091

CMD /start.sh
@@ -1,42 +0,0 @@ (file removed)
# DataHub MetadataAuditEvent (MAE) Consumer Docker Image
[](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-mae-consumer+docker%22)

Refer to [DataHub MAE Consumer Job](../../metadata-jobs/mae-consumer-job) for a quick overview of the architecture and
responsibility of this service within DataHub.

## Build & Run
```
cd docker/mae-consumer && docker-compose up --build
```
This command will rebuild the Docker image and start a container based on the image.

To start a container using a previously built image, run the same command without the `--build` flag.

### Container configuration

#### Docker Network
All Docker containers for DataHub are supposed to be on the same Docker network, which is `datahub_network`.
If you change this, you will need to change it for all other Docker containers as well.
```
networks:
  default:
    name: datahub_network
```

#### Elasticsearch and Kafka Containers
Before starting the `datahub-mae-consumer` container, the `elasticsearch` and `kafka` containers should already be up and running.
These connections are configured via environment variables in `docker-compose.yml`:
```
environment:
  - KAFKA_BOOTSTRAP_SERVER=broker:29092
  - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
```
The value of the `KAFKA_BOOTSTRAP_SERVER` variable should be set to the host name of the `kafka broker` container within the Docker network.
The value of the `KAFKA_SCHEMAREGISTRY_URL` variable should be set to the host name of the `kafka schema registry` container within the Docker network.

```
environment:
  - ELASTICSEARCH_HOST=elasticsearch
  - ELASTICSEARCH_PORT=9200
```
The value of the `ELASTICSEARCH_HOST` variable should be set to the host name of the `elasticsearch` container within the Docker network.
@@ -1,25 +0,0 @@ (file removed)
---
version: '3.5'
services:
  datahub-mae-consumer:
    image: linkedin/datahub-mae-consumer:${DATAHUB_VERSION:-latest}
    build:
      context: ../../
      dockerfile: docker/mae-consumer/Dockerfile
    hostname: datahub-mae-consumer
    container_name: datahub-mae-consumer
    ports:
      - "9091:9091"
    environment:
      - KAFKA_BOOTSTRAP_SERVER=broker:29092
      - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
      - ELASTICSEARCH_HOST=elasticsearch
      - ELASTICSEARCH_PORT=9200
      - NEO4J_HOST=neo4j:7474
      - NEO4J_URI=bolt://neo4j
      - NEO4J_USERNAME=neo4j
      - NEO4J_PASSWORD=datahub

networks:
  default:
    name: datahub_network
@@ -4,36 +4,3 @@ DataHub GMS can use MariaDB as an alternate storage backend.

[Official MariaDB Docker image](https://hub.docker.com/_/mariadb) found on Docker Hub is used without
any modification.

## Run Docker container
The command below will start the MariaDB container.
```
cd docker/mariadb && docker-compose pull && docker-compose up
```

An initialization script [init.sql](init.sql) is provided to the container. This script initializes the `metadata-aspect`
table, which is essentially the key-value store of DataHub GMS.

To connect to the MariaDB container, you can type the command below:
```
docker exec -it mariadb mysql -u datahub -pdatahub datahub
```

## Container configuration
### External Port
If you need to change default configurations for your container, such as the exposed port, you do that in the
`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand
how to change your exposed port settings.
```
ports:
  - '3306:3306'
```

### Docker Network
All Docker containers for DataHub are supposed to be on the same Docker network, which is `datahub_network`.
If you change this, you will need to change it for all other Docker containers as well.
```
networks:
  default:
    name: datahub_network
```
@@ -1,21 +1,23 @@
# Override to use MariaDB as a backing store for datahub-gms.
---
version: '3.5'
version: '3.8'
services:
  mysql:
  mariadb:
    container_name: mariadb
    hostname: mariadb
    image: mariadb:10.5
    env_file: env/docker.env
    restart: always
    environment:
      MYSQL_DATABASE: 'datahub'
      MYSQL_USER: 'datahub'
      MYSQL_PASSWORD: 'datahub'
      MYSQL_ROOT_PASSWORD: 'datahub'
    ports:
      - '3306:3306'
    volumes:
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql

  datahub-gms:
    env_file: ../datahub-gms/env/dev.mariadb.env
    depends_on:
      - mariadb

networks:
  default:
    name: datahub_network
@@ -1,19 +0,0 @@ (file removed)
FROM openjdk:8 as builder

COPY . datahub-src
RUN cd datahub-src && ./gradlew :metadata-jobs:mce-consumer-job:build \
    && cp metadata-jobs/mce-consumer-job/build/libs/mce-consumer-job.jar ../mce-consumer-job.jar \
    && cd .. && rm -rf datahub-src

FROM openjdk:8-jre-alpine
ENV DOCKERIZE_VERSION v0.6.1
RUN apk --no-cache add curl tar \
    && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv

COPY --from=builder /mce-consumer-job.jar /mce-consumer-job.jar
COPY docker/mce-consumer/start.sh /start.sh
RUN chmod +x /start.sh

EXPOSE 9090

CMD /start.sh
@@ -1,42 +0,0 @@ (file removed)
# DataHub MetadataChangeEvent (MCE) Consumer Docker Image
[](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-mce-consumer+docker%22)

Refer to [DataHub MCE Consumer Job](../../metadata-jobs/mce-consumer-job) for a quick overview of the architecture and
responsibility of this service within DataHub.

## Build & Run
```
cd docker/mce-consumer && docker-compose up --build
```
This command will rebuild the Docker image and start a container based on the image.

To start a container using a previously built image, run the same command without the `--build` flag.

### Container configuration

#### Docker Network
All Docker containers for DataHub are supposed to be on the same Docker network, which is `datahub_network`.
If you change this, you will need to change it for all other Docker containers as well.
```
networks:
  default:
    name: datahub_network
```

#### Kafka and DataHub GMS Containers
Before starting the `datahub-mce-consumer` container, the `datahub-gms` and `kafka` containers should already be up and running.
These connections are configured via environment variables in `docker-compose.yml`:
```
environment:
  - KAFKA_BOOTSTRAP_SERVER=broker:29092
  - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
```
The value of the `KAFKA_BOOTSTRAP_SERVER` variable should be set to the host name of the `kafka broker` container within the Docker network.
The value of the `KAFKA_SCHEMAREGISTRY_URL` variable should be set to the host name of the `kafka schema registry` container within the Docker network.

```
environment:
  - GMS_HOST=datahub-gms
  - GMS_PORT=8080
```
The value of the `GMS_HOST` variable should be set to the host name of the `datahub-gms` container within the Docker network.
@ -1,23 +0,0 @@
---
version: '3.5'
services:
  datahub-mce-consumer:
    image: linkedin/datahub-mce-consumer:${DATAHUB_VERSION:-latest}
    build:
      context: ../../
      dockerfile: docker/mce-consumer/Dockerfile
    hostname: datahub-mce-consumer
    container_name: datahub-mce-consumer
    ports:
      - "9090:9090"
    environment:
      - KAFKA_BOOTSTRAP_SERVER=broker:29092
      - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
      - GMS_HOST=datahub-gms
      - GMS_PORT=8080
      - KAFKA_MCE_TOPIC_NAME=MetadataChangeEvent
      - KAFKA_FMCE_TOPIC_NAME=FailedMetadataChangeEvent

networks:
  default:
    name: datahub_network
@ -4,36 +4,3 @@ DataHub GMS uses MySQL as the storage backend.

The [official MySQL Docker image](https://hub.docker.com/_/mysql) from Docker Hub is used without any modification.

## Run Docker container
The following command starts the MySQL container:
```
cd docker/mysql && docker-compose pull && docker-compose up
```

An initialization script, [init.sql](init.sql), is provided to the container. It creates the `metadata-aspect` table,
which is essentially the key-value store of DataHub GMS.

To connect to the MySQL container, run:
```
docker exec -it mysql mysql -u datahub -pdatahub datahub
```

## Container configuration
### External Port
If you need to change the default configuration of your container, such as the exposed port, do so in the
`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand
how to change your exposed port settings.
```
ports:
  - '3306:3306'
```

### Docker Network
All Docker containers for DataHub are expected to be on the same Docker network, `datahub_network`.
If you change this, you will need to change it for all other Docker containers as well.
```
networks:
  default:
    name: datahub_network
```
24
docker/mysql/docker-compose.mysql.yml
Normal file
@ -0,0 +1,24 @@
# Override to use MySQL as a backing store for datahub-gms.
---
version: '3.8'
services:
  mysql:
    container_name: mysql
    hostname: mysql
    image: mysql:5.7
    env_file: env/docker.env
    restart: always
    command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
    ports:
      - "3306:3306"
    volumes:
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
      - mysqldata:/var/lib/mysql

  datahub-gms:
    env_file: ../datahub-gms/env/docker.env
    depends_on:
      - mysql

volumes:
  mysqldata:
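Since `mysqldata` is a named volume, the database now survives `docker-compose down`. To start over with a clean database (a sketch; Compose prefixes volume names with the project name, assumed here to be `datahub`):
```
docker-compose -p datahub down && docker volume rm datahub_mysqldata
```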
@ -1,21 +0,0 @@
---
version: '3.5'
services:
  mysql:
    container_name: mysql
    hostname: mysql
    image: mysql:5.7
    restart: always
    environment:
      MYSQL_DATABASE: 'datahub'
      MYSQL_USER: 'datahub'
      MYSQL_PASSWORD: 'datahub'
      MYSQL_ROOT_PASSWORD: 'datahub'
    ports:
      - '3306:3306'
    volumes:
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql

networks:
  default:
    name: datahub_network
4
docker/mysql/env/docker.env
vendored
Normal file
@ -0,0 +1,4 @@
MYSQL_DATABASE=datahub
MYSQL_USER=datahub
MYSQL_PASSWORD=datahub
MYSQL_ROOT_PASSWORD=datahub
@ -4,32 +4,6 @@ DataHub uses Neo4j as graph db in the backend to serve graph queries.

The [official Neo4j image](https://hub.docker.com/_/neo4j) from Docker Hub is used without any modification.

## Run Docker container
The following command starts the Neo4j container:
```
cd docker/neo4j && docker-compose pull && docker-compose up
```

## Container configuration
### External Port
If you need to change the default configuration of your container, such as the exposed ports, do so in the
`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand
how to change your exposed port settings.
```
ports:
  - "7474:7474"
  - "7687:7687"
```

### Docker Network
All Docker containers for DataHub are expected to be on the same Docker network, `datahub_network`.
If you change this, you will need to change it for all other Docker containers as well.
```
networks:
  default:
    name: datahub_network
```

## Neo4j Browser
To debug and run Cypher queries against your Neo4j instance, open the `Neo4j Browser`, which runs at
[http://localhost:7474/browser/](http://localhost:7474/browser/). The default username is `neo4j` and the password is `datahub`.
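You can also run Cypher from the command line with `cypher-shell`, which ships in the official image (a sketch using the default credentials above):
```
echo 'MATCH (n) RETURN count(n);' | docker exec -i neo4j cypher-shell -u neo4j -p datahub
```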
@ -1,16 +0,0 @@
---
version: '3.5'
services:
  neo4j:
    image: neo4j:3.5.7
    hostname: neo4j
    container_name: neo4j
    environment:
      NEO4J_AUTH: 'neo4j/datahub'
    ports:
      - "7474:7474"
      - "7687:7687"

networks:
  default:
    name: datahub_network
1
docker/neo4j/env/docker.env
vendored
Normal file
@ -0,0 +1 @@
NEO4J_AUTH=neo4j/datahub
6
docker/postgres/README.md
Normal file
@ -0,0 +1,6 @@
# PostgreSQL

DataHub GMS can use PostgreSQL as an alternate storage backend.

The [official PostgreSQL Docker image](https://hub.docker.com/_/postgres) from Docker Hub is used without any modification.
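To run GMS against PostgreSQL, combine the base compose file with the override below (a sketch; the override's file name is assumed to be `docker-compose.postgres.yml`):
```
cd docker && docker-compose -f docker-compose.yml -f postgres/docker-compose.postgres.yml -p datahub up
```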
@ -1,19 +1,23 @@
# Override to use PostgreSQL as a backing store for datahub-gms.
---
version: '3.5'
version: '3.8'
services:
  postgres:
    container_name: postgres
    hostname: postgres
    image: postgres:12.3
    env_file: env/docker.env
    restart: always
    environment:
      POSTGRES_USER: datahub
      POSTGRES_PASSWORD: datahub
    ports:
      - '5432:5432'
    volumes:
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql

  datahub-gms:
    env_file: ../datahub-gms/env/dev.postgres.env
    depends_on:
      - postgres

networks:
  default:
    name: datahub_network
2
docker/postgres/env/docker.env
vendored
Normal file
@ -0,0 +1,2 @@
POSTGRES_USER=datahub
POSTGRES_PASSWORD=datahub
@ -1,39 +0,0 @@
# PostgreSQL

DataHub GMS can use PostgreSQL as an alternate storage backend.

The [official PostgreSQL Docker image](https://hub.docker.com/_/postgres) from Docker Hub is used without any modification.

## Run Docker container
The following command starts the PostgreSQL container:
```
cd docker/postgres && docker-compose pull && docker-compose up
```

An initialization script, [init.sql](init.sql), is provided to the container. It creates the `metadata-aspect` table,
which is essentially the key-value store of DataHub GMS.

To connect to the PostgreSQL container, run:
```
docker exec -it postgres psql -U datahub
```

## Container configuration
### External Port
If you need to change the default configuration of your container, such as the exposed port, do so in the
`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand
how to change your exposed port settings.
```
ports:
  - '5432:5432'
```

### Docker Network
All Docker containers for DataHub are expected to be on the same Docker network, `datahub_network`.
If you change this, you will need to change it for all other Docker containers as well.
```
networks:
  default:
    name: datahub_network
```
7
docker/quickstart.sh
Executable file
@ -0,0 +1,7 @@
#!/bin/bash

# Quickstarts DataHub by pulling all images from dockerhub and then running the containers locally. No images are
# built locally. Note: by default this pulls the latest version; you can change this to a specific version by setting
# the DATAHUB_VERSION environment variable.
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd "$DIR" && docker-compose pull && docker-compose -p datahub up
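For example, to pin the images to a specific release (the tag shown is hypothetical; substitute a real release tag):
```
DATAHUB_VERSION=v0.4.0 ./docker/quickstart.sh
```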
@ -1,30 +0,0 @@
# DataHub Quickstart
To start all Docker containers at once, run the command below from the project root directory:
```bash
./docker/quickstart/quickstart.sh
```

At this point, all containers are ready and DataHub can be considered up and running. Check the guide for each specific container
for details:
* [Elasticsearch & Kibana](../elasticsearch)
* [DataHub Frontend](../frontend)
* [DataHub GMS](../gms)
* [Kafka, Schema Registry & Zookeeper](../kafka)
* [DataHub MAE Consumer](../mae-consumer)
* [DataHub MCE Consumer](../mce-consumer)
* [MySQL](../mysql)

From this point on, if you want to be able to sign in to DataHub and see some sample data, please see the
[Metadata Ingestion Guide](../../metadata-ingestion) for `bootstrapping DataHub`.

You can also choose to use a specific version of the DataHub Docker images instead of `latest` by setting the `DATAHUB_VERSION` environment variable.

## Debugging Containers
If you want to debug containers, you can check the container logs:
```
docker logs <container_name>
```
Also, you can connect to a container shell for further debugging:
```
docker exec -it <container_name> bash
```
@ -1,260 +0,0 @@
---
version: '3.5'
services:
  mysql:
    container_name: mysql
    hostname: mysql
    image: mysql:5.7
    restart: always
    command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
    environment:
      MYSQL_DATABASE: 'datahub'
      MYSQL_USER: 'datahub'
      MYSQL_PASSWORD: 'datahub'
      MYSQL_ROOT_PASSWORD: 'datahub'
    ports:
      - "3306:3306"
    volumes:
      - ../mysql/init.sql:/docker-entrypoint-initdb.d/init.sql
      - mysqldata:/var/lib/mysql

  zookeeper:
    image: confluentinc/cp-zookeeper:5.4.0
    hostname: zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    volumes:
      - zkdata:/var/opt/zookeeper

  broker:
    image: confluentinc/cp-kafka:5.4.0
    hostname: broker
    container_name: broker
    depends_on:
      - zookeeper
    ports:
      - "29092:29092"
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0

  kafka-rest-proxy:
    image: confluentinc/cp-kafka-rest:5.4.0
    hostname: kafka-rest-proxy
    container_name: kafka-rest-proxy
    ports:
      - "8082:8082"
    environment:
      KAFKA_REST_LISTENERS: http://0.0.0.0:8082/
      KAFKA_REST_SCHEMA_REGISTRY_URL: http://schema-registry:8081/
      KAFKA_REST_HOST_NAME: kafka-rest-proxy
      KAFKA_REST_BOOTSTRAP_SERVERS: PLAINTEXT://broker:29092
    depends_on:
      - zookeeper
      - broker
      - schema-registry

  kafka-topics-ui:
    image: landoop/kafka-topics-ui:0.9.4
    hostname: kafka-topics-ui
    container_name: kafka-topics-ui
    ports:
      - "18000:8000"
    environment:
      KAFKA_REST_PROXY_URL: "http://kafka-rest-proxy:8082/"
      PROXY: "true"
    depends_on:
      - zookeeper
      - broker
      - schema-registry
      - kafka-rest-proxy

  # This "container" is a workaround to pre-create topics
  kafka-setup:
    build:
      context: ../kafka
    hostname: kafka-setup
    container_name: kafka-setup
    depends_on:
      - broker
      - schema-registry
    environment:
      - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
      - KAFKA_BOOTSTRAP_SERVER=broker:29092

  schema-registry:
    image: confluentinc/cp-schema-registry:5.4.0
    hostname: schema-registry
    container_name: schema-registry
    depends_on:
      - zookeeper
      - broker
    ports:
      - "8081:8081"
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: 'zookeeper:2181'

  schema-registry-ui:
    image: landoop/schema-registry-ui:latest
    container_name: schema-registry-ui
    hostname: schema-registry-ui
    ports:
      - "8000:8000"
    environment:
      SCHEMAREGISTRY_URL: 'http://schema-registry:8081'
      ALLOW_GLOBAL: 'true'
      ALLOW_TRANSITIVE: 'true'
      ALLOW_DELETION: 'true'
      READONLY_MODE: 'true'
      PROXY: 'true'
    depends_on:
      - schema-registry

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.6.8
    container_name: elasticsearch
    hostname: elasticsearch
    ports:
      - "9200:9200"
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    volumes:
      - esdata:/usr/share/elasticsearch/data

  kibana:
    image: docker.elastic.co/kibana/kibana:5.6.8
    container_name: kibana
    hostname: kibana
    ports:
      - "5601:5601"
    environment:
      - SERVER_HOST=0.0.0.0
      - ELASTICSEARCH_URL=http://elasticsearch:9200
    depends_on:
      - elasticsearch

  neo4j:
    image: neo4j:3.5.7
    hostname: neo4j
    container_name: neo4j
    environment:
      NEO4J_AUTH: 'neo4j/datahub'
    ports:
      - "7474:7474"
      - "7687:7687"
    volumes:
      - neo4jdata:/data

  # This "container" is a workaround to pre-create search indices
  elasticsearch-setup:
    build:
      context: ../elasticsearch
    hostname: elasticsearch-setup
    container_name: elasticsearch-setup
    depends_on:
      - elasticsearch
    environment:
      - ELASTICSEARCH_HOST=elasticsearch
      - ELASTICSEARCH_PORT=9200

  datahub-gms:
    image: linkedin/datahub-gms:${DATAHUB_VERSION:-latest}
    hostname: datahub-gms
    container_name: datahub-gms
    ports:
      - "8080:8080"
    environment:
      - EBEAN_DATASOURCE_USERNAME=datahub
      - EBEAN_DATASOURCE_PASSWORD=datahub
      - EBEAN_DATASOURCE_HOST=mysql:3306
      - EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8
      - EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver
      - KAFKA_BOOTSTRAP_SERVER=broker:29092
      - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
      - ELASTICSEARCH_HOST=elasticsearch
      - ELASTICSEARCH_PORT=9200
      - NEO4J_HOST=neo4j:7474
      - NEO4J_URI=bolt://neo4j
      - NEO4J_USERNAME=neo4j
      - NEO4J_PASSWORD=datahub
    depends_on:
      - elasticsearch-setup
      - kafka-setup
      - mysql
      - neo4j

  datahub-frontend:
    image: linkedin/datahub-frontend:${DATAHUB_VERSION:-latest}
    hostname: datahub-frontend
    container_name: datahub-frontend
    ports:
      - "9001:9001"
    environment:
      - DATAHUB_GMS_HOST=datahub-gms
      - DATAHUB_GMS_PORT=8080
      - DATAHUB_SECRET=YouKnowNothing
      - DATAHUB_APP_VERSION=1.0
      - DATAHUB_PLAY_MEM_BUFFER_SIZE=10MB
    depends_on:
      - datahub-gms

  datahub-mae-consumer:
    image: linkedin/datahub-mae-consumer:${DATAHUB_VERSION:-latest}
    hostname: datahub-mae-consumer
    container_name: datahub-mae-consumer
    ports:
      - "9091:9091"
    environment:
      - KAFKA_BOOTSTRAP_SERVER=broker:29092
      - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
      - ELASTICSEARCH_HOST=elasticsearch
      - ELASTICSEARCH_PORT=9200
      - NEO4J_HOST=neo4j:7474
      - NEO4J_URI=bolt://neo4j
      - NEO4J_USERNAME=neo4j
      - NEO4J_PASSWORD=datahub
    depends_on:
      - kafka-setup
      - elasticsearch-setup
      - neo4j
    command: "sh -c 'while ping -c1 kafka-setup &>/dev/null; do echo waiting for kafka-setup... && sleep 1; done; \
      echo kafka-setup done! && /start.sh'"

  datahub-mce-consumer:
    image: linkedin/datahub-mce-consumer:${DATAHUB_VERSION:-latest}
    hostname: datahub-mce-consumer
    container_name: datahub-mce-consumer
    ports:
      - "9090:9090"
    environment:
      - KAFKA_BOOTSTRAP_SERVER=broker:29092
      - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
      - GMS_HOST=datahub-gms
      - GMS_PORT=8080
    depends_on:
      - kafka-setup
      - datahub-gms
    command: "sh -c 'while ping -c1 kafka-setup &>/dev/null; do echo waiting for kafka-setup... && sleep 1; done; \
      echo kafka-setup done! && /start.sh'"

networks:
  default:
    name: datahub_network

volumes:
  mysqldata:
  esdata:
  neo4jdata:
  zkdata:
@ -1,4 +0,0 @@
#!/bin/bash

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd $DIR && docker-compose pull && docker-compose -p datahub up --build
@ -1,268 +0,0 @@
---
version: '3.5'
services:
  mysql:
    container_name: mysql
    hostname: mysql
    image: mysql:5.7
    restart: always
    command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
    environment:
      MYSQL_DATABASE: 'datahub'
      MYSQL_USER: 'datahub'
      MYSQL_PASSWORD: 'datahub'
      MYSQL_ROOT_PASSWORD: 'datahub'
    ports:
      - "3306:3306"
    volumes:
      - ../mysql/init.sql:/docker-entrypoint-initdb.d/init.sql
      - mysqldata:/var/lib/mysql

  zookeeper:
    image: confluentinc/cp-zookeeper:5.4.0
    hostname: zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    volumes:
      - zkdata:/var/opt/zookeeper

  broker:
    image: confluentinc/cp-kafka:5.4.0
    hostname: broker
    container_name: broker
    depends_on:
      - zookeeper
    ports:
      - "29092:29092"
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0

  kafka-rest-proxy:
    image: confluentinc/cp-kafka-rest:5.4.0
    hostname: kafka-rest-proxy
    container_name: kafka-rest-proxy
    ports:
      - "8082:8082"
    environment:
      KAFKA_REST_LISTENERS: http://0.0.0.0:8082/
      KAFKA_REST_SCHEMA_REGISTRY_URL: http://schema-registry:8081/
      KAFKA_REST_HOST_NAME: kafka-rest-proxy
      KAFKA_REST_BOOTSTRAP_SERVERS: PLAINTEXT://broker:29092
    depends_on:
      - zookeeper
      - broker
      - schema-registry

  kafka-topics-ui:
    image: landoop/kafka-topics-ui:0.9.4
    hostname: kafka-topics-ui
    container_name: kafka-topics-ui
    ports:
      - "18000:8000"
    environment:
      KAFKA_REST_PROXY_URL: "http://kafka-rest-proxy:8082/"
      PROXY: "true"
    depends_on:
      - zookeeper
      - broker
      - schema-registry
      - kafka-rest-proxy

  # This "container" is a workaround to pre-create topics
  kafka-setup:
    build:
      context: ../kafka
    hostname: kafka-setup
    container_name: kafka-setup
    depends_on:
      - broker
      - schema-registry
    environment:
      - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
      - KAFKA_BOOTSTRAP_SERVER=broker:29092

  schema-registry:
    image: confluentinc/cp-schema-registry:5.4.0
    hostname: schema-registry
    container_name: schema-registry
    depends_on:
      - zookeeper
      - broker
    ports:
      - "8081:8081"
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: 'zookeeper:2181'

  schema-registry-ui:
    image: landoop/schema-registry-ui:latest
    container_name: schema-registry-ui
    hostname: schema-registry-ui
    ports:
      - "8000:8000"
    environment:
      SCHEMAREGISTRY_URL: 'http://schema-registry:8081'
      ALLOW_GLOBAL: 'true'
      ALLOW_TRANSITIVE: 'true'
      ALLOW_DELETION: 'true'
      READONLY_MODE: 'true'
      PROXY: 'true'
    depends_on:
      - schema-registry

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.6.8
    container_name: elasticsearch
    hostname: elasticsearch
    ports:
      - "9200:9200"
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    volumes:
      - esdata:/usr/share/elasticsearch/data

  kibana:
    image: docker.elastic.co/kibana/kibana:5.6.8
    container_name: kibana
    hostname: kibana
    ports:
      - "5601:5601"
    environment:
      - SERVER_HOST=0.0.0.0
      - ELASTICSEARCH_URL=http://elasticsearch:9200
    depends_on:
      - elasticsearch

  neo4j:
    image: neo4j:3.5.7
    hostname: neo4j
    container_name: neo4j
    environment:
      NEO4J_AUTH: 'neo4j/datahub'
    ports:
      - "7474:7474"
      - "7687:7687"
    volumes:
      - neo4jdata:/data

  # This "container" is a workaround to pre-create search indices
  elasticsearch-setup:
    build:
      context: ../elasticsearch
    hostname: elasticsearch-setup
    container_name: elasticsearch-setup
    depends_on:
      - elasticsearch
    environment:
      - ELASTICSEARCH_HOST=elasticsearch
      - ELASTICSEARCH_PORT=9200

  datahub-gms:
    build:
      context: ../../
      dockerfile: docker/gms/Dockerfile
    hostname: datahub-gms
    container_name: datahub-gms
    ports:
      - "8080:8080"
    environment:
      - EBEAN_DATASOURCE_USERNAME=datahub
      - EBEAN_DATASOURCE_PASSWORD=datahub
      - EBEAN_DATASOURCE_HOST=mysql:3306
      - EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8
      - EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver
      - KAFKA_BOOTSTRAP_SERVER=broker:29092
      - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
      - ELASTICSEARCH_HOST=elasticsearch
      - ELASTICSEARCH_PORT=9200
      - NEO4J_HOST=neo4j:7474
      - NEO4J_URI=bolt://neo4j
      - NEO4J_USERNAME=neo4j
      - NEO4J_PASSWORD=datahub
    depends_on:
      - elasticsearch-setup
      - kafka-setup
      - mysql
      - neo4j

  datahub-frontend:
    build:
      context: ../../
      dockerfile: docker/frontend/Dockerfile
    hostname: datahub-frontend
    container_name: datahub-frontend
    ports:
      - "9001:9001"
    environment:
      - DATAHUB_GMS_HOST=datahub-gms
      - DATAHUB_GMS_PORT=8080
      - DATAHUB_SECRET=YouKnowNothing
      - DATAHUB_APP_VERSION=1.0
      - DATAHUB_PLAY_MEM_BUFFER_SIZE=10MB
    depends_on:
      - datahub-gms

  datahub-mae-consumer:
    build:
      context: ../../
      dockerfile: docker/mae-consumer/Dockerfile
    hostname: datahub-mae-consumer
    container_name: datahub-mae-consumer
    ports:
      - "9091:9091"
    environment:
      - KAFKA_BOOTSTRAP_SERVER=broker:29092
      - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
      - ELASTICSEARCH_HOST=elasticsearch
      - ELASTICSEARCH_PORT=9200
      - NEO4J_HOST=neo4j:7474
      - NEO4J_URI=bolt://neo4j
      - NEO4J_USERNAME=neo4j
      - NEO4J_PASSWORD=datahub
    depends_on:
      - kafka-setup
      - elasticsearch-setup
      - neo4j
    command: "sh -c 'while ping -c1 kafka-setup &>/dev/null; do echo waiting for kafka-setup... && sleep 1; done; \
      echo kafka-setup done! && /start.sh'"

  datahub-mce-consumer:
    build:
      context: ../../
      dockerfile: docker/mce-consumer/Dockerfile
    hostname: datahub-mce-consumer
    container_name: datahub-mce-consumer
    ports:
      - "9090:9090"
    environment:
      - KAFKA_BOOTSTRAP_SERVER=broker:29092
      - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
      - GMS_HOST=datahub-gms
      - GMS_PORT=8080
    depends_on:
      - kafka-setup
      - datahub-gms
    command: "sh -c 'while ping -c1 kafka-setup &>/dev/null; do echo waiting for kafka-setup... && sleep 1; done; \
      echo kafka-setup done! && /start.sh'"

networks:
  default:
    name: datahub_network

volumes:
  mysqldata:
  esdata:
  neo4jdata:
  zkdata:
@ -1,4 +0,0 @@
#!/bin/bash

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd $DIR && docker-compose pull && docker-compose -p datahub up
6
docker/schema-registry-ui/env/docker.env
vendored
Normal file
@ -0,0 +1,6 @@
SCHEMAREGISTRY_URL=http://schema-registry:8081
ALLOW_GLOBAL=true
ALLOW_TRANSITIVE=true
ALLOW_DELETION=true
READONLY_MODE=true
PROXY=true
2
docker/schema-registry/env/docker.env
vendored
Normal file
@ -0,0 +1,2 @@
SCHEMA_REGISTRY_HOST_NAME=schemaregistry
SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=zookeeper:2181
2
docker/zookeeper/env/docker.env
vendored
Normal file
@ -0,0 +1,2 @@
ZOOKEEPER_CLIENT_PORT=2181
ZOOKEEPER_TICK_TIME=2000
1
docs/docker/README.md
Normal file
@ -0,0 +1 @@
See [docker/README.md](../../docker/README.md).
70
docs/docker/development.md
Normal file
@ -0,0 +1,70 @@
# Using Docker Images During Development

We've created a special `docker-compose.dev.yml` override file that configures the Docker images to be easier to use
during development.

Normally, you'd rebuild your images from scratch with `docker-compose build` (or `docker-compose up --build`). However,
this takes far too long for development: it has to copy the entire repo to each image and rebuild it there.

The `docker-compose.dev.yml` file bypasses this problem by mounting binaries, startup scripts, and other data into
special, slimmed-down images (whose Dockerfile is usually defined in `<service>/debug/Dockerfile`). Mounts work
both ways, so these images also mount log directories from the container, making logs easy to read on your
local machine without needing to inspect the running container (especially if the app crashes and the container stops!).

We highly recommend you just invoke the `docker/dev.sh` script we've included. It is pretty small if you want to read it
to see what it does, and it ends up using our `docker-compose.dev.yml` file.
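In other words, a typical dev loop might look like this (a sketch; it assumes a full local Gradle build produces the binaries that `docker-compose.dev.yml` mounts):
```
./gradlew build && ./docker/dev.sh
```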
## Debugging

The default dev images, while set up to use your local code, do not enable debugging by default. To enable debugging,
you need to make two small edits (don't check these changes in!), as shown in the example below:

- Add the JVM debug flags to the environment file for the service.
- Assign the port in the docker-compose file.
For example, to debug `datahub-gms`:

```
# Add this line to docker/datahub-gms/env/dev.env. You can change the port and/or change suspend=n to y.
JAVA_TOOL_OPTIONS=-agentlib:jdwp=transport=dt_socket,address=5005,server=y,suspend=n
```

```
# Change the definition in docker/docker-compose.dev.yml to this
datahub-gms:
  image: linkedin/datahub-gms:debug
  build:
    context: datahub-gms/debug
    dockerfile: Dockerfile
  ports:            # <--- Add this line
    - "5005:5005"   # <--- And this line. Must match the port from the environment file.
  volumes:
    - ./datahub-gms/start.sh:/datahub/datahub-gms/scripts/start.sh
    - ../gms/war/build/libs/:/datahub/datahub-gms/bin
```
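Once the container is up with these changes, any JDWP-capable debugger on your host can attach, e.g. the JDK's `jdb` (the port must match the one exposed above):
```
jdb -attach localhost:5005
```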
## Tips for People New To Docker

### Conflicting containers

If you ran `docker/quickstart.sh` before, your machine may already have containers for DataHub. If you want to run
`docker/dev.sh` instead, ensure the old containers are removed by running `docker container prune`. The opposite also
applies.

> Note this only removes containers, not images. It should still be fast to switch between the two once you've launched both
> at least once.
### Running a specific service

`docker-compose up` will launch all services in the configuration, including dependencies, unless they're already
running. If you wish to change this behavior, check out these example commands.

```
docker-compose -p datahub -f docker-compose.yml -f docker-compose.overrides.yml -f docker-compose.dev.yml up datahub-gms
```
This will start only `datahub-gms` and its dependencies.

```
docker-compose -p datahub -f docker-compose.yml -f docker-compose.overrides.yml -f docker-compose.dev.yml up --no-deps datahub-gms
```
This will start only `datahub-gms`, without dependencies.