From b8e18b0b5d56b4fa69b4bc35e8055176f9577dee Mon Sep 17 00:00:00 2001
From: John Plaisted
Date: Thu, 6 Aug 2020 16:38:53 -0700
Subject: [PATCH] refactor(docker): make docker files easier to use during
 development. (#1777)

* Make docker files easier to use during development.

During development it is quite nice to have docker work with locally built code. This allows you to
launch all services very quickly, with your changes, and optionally with debugging support.

Changes made to docker files:
- Removed all redundant docker-compose files. We now have one giant file, plus smaller files to use
  as overrides.
- Removed redundant README files that provided little information.
- Renamed directories under docker/ to match the service names in the docker-compose file, for
  clarity.
- Moved environment variables to .env files. We only provide dev / the default environment for
  quickstart.
- Added debug options to docker files using multistage builds to build minimal images, with the
  idea that built files will be mounted instead.
- Added a docker/dev.sh script + compose file to easily use the dev override images (separate tag;
  images never published; uses debug docker files; mounts binaries to the image).
- Added docs/docker documentation for this.
---
 .github/workflows/docker-frontend.yml | 2 +
 .github/workflows/docker-gms.yml | 2 +
 .github/workflows/docker-mae-consumer.yml | 2 +
 .github/workflows/docker-mce-consumer.yml | 2 +
 .gitignore | 3 -
 docker/README.md | 55 +++-
 docker/broker/env/docker.env | 6 +
 .../{frontend => datahub-frontend}/Dockerfile | 0
 docker/datahub-frontend/README.md | 16 ++
 docker/datahub-frontend/env/docker.env | 5 +
 docker/datahub-gms/Dockerfile | 28 ++
 docker/datahub-gms/README.md | 22 ++
 docker/datahub-gms/env/docker.env | 13 +
 docker/datahub-gms/env/docker.mariadb.env | 13 +
 docker/datahub-gms/env/docker.postgres.env | 13 +
 docker/{gms => datahub-gms}/start.sh | 2 +-
 docker/datahub-mae-consumer/Dockerfile | 27 ++
 docker/datahub-mae-consumer/README.md | 5 +
 docker/datahub-mae-consumer/env/docker.env | 8 +
 .../start.sh | 2 +-
 docker/datahub-mce-consumer/Dockerfile | 27 ++
 docker/datahub-mce-consumer/README.md | 5 +
 docker/datahub-mce-consumer/env/docker.env | 4 +
 .../start.sh | 2 +-
 docker/dev.sh | 17 ++
 docker/docker-compose.dev.yml | 45 +++
 docker/docker-compose.override.yml | 24 ++
 docker/docker-compose.yml | 192 +++++++++++++
 .../Dockerfile | 0
 docker/elasticsearch-setup/README.md | 5 +
 .../corpuser-index-config.json | 0
 .../dataprocess-index-config.json | 0
 .../dataset-index-config.json | 0
 docker/elasticsearch-setup/env/docker.env | 2 +
 docker/elasticsearch/README.md | 35 ---
 docker/elasticsearch/docker-compose.yml | 38 ---
 docker/elasticsearch/env/docker.env | 3 +
 docker/frontend/README.md | 50 ----
 docker/frontend/docker-compose.yml | 22 --
 docker/gms/Dockerfile | 19 --
 docker/gms/README.md | 82 ------
 docker/gms/docker-compose-mariadb.yml | 30 --
 docker/gms/docker-compose-postgres.yml | 30 --
 docker/gms/docker-compose.yml | 30 --
 docker/ingestion/README.md | 13 -
 docker/kafka-rest-proxy/env/docker.env | 4 +
 docker/{kafka => kafka-setup}/Dockerfile | 0
 docker/kafka-setup/README.md | 14 +
 docker/kafka-setup/env/docker.env | 2 +
 docker/kafka-topics-ui/env/docker.env | 2 +
 docker/kafka/README.md | 47 ---
 docker/kafka/docker-compose.yml | 104 -------
 docker/kibana/env/docker.env | 2 +
 docker/mae-consumer/Dockerfile | 19 --
 docker/mae-consumer/README.md | 42 ---
 docker/mae-consumer/docker-compose.yml | 25 --
 docker/mariadb/README.md | 33 ---
 ...compose.yml => docker-compose.mariadb.yml} | 16 +-
docker/mce-consumer/Dockerfile | 19 -- docker/mce-consumer/README.md | 42 --- docker/mce-consumer/docker-compose.yml | 23 -- docker/mysql/README.md | 33 --- docker/mysql/docker-compose.mysql.yml | 24 ++ docker/mysql/docker-compose.yml | 21 -- docker/mysql/env/docker.env | 4 + docker/neo4j/README.md | 26 -- docker/neo4j/docker-compose.yml | 16 -- docker/neo4j/env/docker.env | 1 + docker/postgres/README.md | 6 + .../docker-compose.postgre.yml} | 12 +- docker/postgres/env/docker.env | 2 + docker/{postgresql => postgres}/init.sql | 0 docker/postgresql/README.md | 39 --- docker/quickstart.sh | 7 + docker/quickstart/README.md | 30 -- docker/quickstart/docker-compose.yml | 260 ----------------- docker/quickstart/quickstart.sh | 4 - docker/rebuild-all/docker-compose.yml | 268 ------------------ docker/rebuild-all/rebuild-all.sh | 4 - docker/schema-registry-ui/env/docker.env | 6 + docker/schema-registry/env/docker.env | 2 + docker/zookeeper/env/docker.env | 2 + docs/docker/README.md | 1 + docs/docker/development.md | 70 +++++ 84 files changed, 699 insertions(+), 1434 deletions(-) create mode 100644 docker/broker/env/docker.env rename docker/{frontend => datahub-frontend}/Dockerfile (100%) create mode 100644 docker/datahub-frontend/README.md create mode 100644 docker/datahub-frontend/env/docker.env create mode 100644 docker/datahub-gms/Dockerfile create mode 100644 docker/datahub-gms/README.md create mode 100644 docker/datahub-gms/env/docker.env create mode 100644 docker/datahub-gms/env/docker.mariadb.env create mode 100644 docker/datahub-gms/env/docker.postgres.env rename docker/{gms => datahub-gms}/start.sh (76%) mode change 100644 => 100755 create mode 100644 docker/datahub-mae-consumer/Dockerfile create mode 100644 docker/datahub-mae-consumer/README.md create mode 100644 docker/datahub-mae-consumer/env/docker.env rename docker/{mae-consumer => datahub-mae-consumer}/start.sh (71%) mode change 100644 => 100755 create mode 100644 docker/datahub-mce-consumer/Dockerfile create mode 100644 docker/datahub-mce-consumer/README.md create mode 100644 docker/datahub-mce-consumer/env/docker.env rename docker/{mce-consumer => datahub-mce-consumer}/start.sh (64%) mode change 100644 => 100755 create mode 100755 docker/dev.sh create mode 100644 docker/docker-compose.dev.yml create mode 100644 docker/docker-compose.override.yml create mode 100644 docker/docker-compose.yml rename docker/{elasticsearch => elasticsearch-setup}/Dockerfile (100%) create mode 100644 docker/elasticsearch-setup/README.md rename docker/{elasticsearch => elasticsearch-setup}/corpuser-index-config.json (100%) rename docker/{elasticsearch => elasticsearch-setup}/dataprocess-index-config.json (100%) rename docker/{elasticsearch => elasticsearch-setup}/dataset-index-config.json (100%) create mode 100644 docker/elasticsearch-setup/env/docker.env delete mode 100644 docker/elasticsearch/README.md delete mode 100644 docker/elasticsearch/docker-compose.yml create mode 100644 docker/elasticsearch/env/docker.env delete mode 100644 docker/frontend/README.md delete mode 100644 docker/frontend/docker-compose.yml delete mode 100644 docker/gms/Dockerfile delete mode 100644 docker/gms/README.md delete mode 100644 docker/gms/docker-compose-mariadb.yml delete mode 100644 docker/gms/docker-compose-postgres.yml delete mode 100644 docker/gms/docker-compose.yml create mode 100644 docker/kafka-rest-proxy/env/docker.env rename docker/{kafka => kafka-setup}/Dockerfile (100%) create mode 100644 docker/kafka-setup/README.md create mode 100644 
docker/kafka-setup/env/docker.env create mode 100644 docker/kafka-topics-ui/env/docker.env delete mode 100644 docker/kafka/README.md delete mode 100644 docker/kafka/docker-compose.yml create mode 100644 docker/kibana/env/docker.env delete mode 100644 docker/mae-consumer/Dockerfile delete mode 100644 docker/mae-consumer/README.md delete mode 100644 docker/mae-consumer/docker-compose.yml rename docker/mariadb/{docker-compose.yml => docker-compose.mariadb.yml} (54%) delete mode 100644 docker/mce-consumer/Dockerfile delete mode 100644 docker/mce-consumer/README.md delete mode 100644 docker/mce-consumer/docker-compose.yml create mode 100644 docker/mysql/docker-compose.mysql.yml delete mode 100644 docker/mysql/docker-compose.yml create mode 100644 docker/mysql/env/docker.env delete mode 100644 docker/neo4j/docker-compose.yml create mode 100644 docker/neo4j/env/docker.env create mode 100644 docker/postgres/README.md rename docker/{postgresql/docker-compose.yml => postgres/docker-compose.postgre.yml} (56%) create mode 100644 docker/postgres/env/docker.env rename docker/{postgresql => postgres}/init.sql (100%) delete mode 100644 docker/postgresql/README.md create mode 100755 docker/quickstart.sh delete mode 100644 docker/quickstart/README.md delete mode 100644 docker/quickstart/docker-compose.yml delete mode 100755 docker/quickstart/quickstart.sh delete mode 100644 docker/rebuild-all/docker-compose.yml delete mode 100755 docker/rebuild-all/rebuild-all.sh create mode 100644 docker/schema-registry-ui/env/docker.env create mode 100644 docker/schema-registry/env/docker.env create mode 100644 docker/zookeeper/env/docker.env create mode 100644 docs/docker/README.md create mode 100644 docs/docker/development.md diff --git a/.github/workflows/docker-frontend.yml b/.github/workflows/docker-frontend.yml index fdc7114d75..ab9c4fdd73 100644 --- a/.github/workflows/docker-frontend.yml +++ b/.github/workflows/docker-frontend.yml @@ -21,6 +21,8 @@ jobs: echo "tag=$TAG" echo "::set-output name=tag::$TAG" - uses: docker/build-push-action@v1 + env: + DOCKER_BUILDKIT: 1 with: dockerfile: ./docker/frontend/Dockerfile username: ${{ secrets.DOCKER_USERNAME }} diff --git a/.github/workflows/docker-gms.yml b/.github/workflows/docker-gms.yml index 6f4e8de8ad..dd3e1bdcaf 100644 --- a/.github/workflows/docker-gms.yml +++ b/.github/workflows/docker-gms.yml @@ -21,6 +21,8 @@ jobs: echo "tag=$TAG" echo "::set-output name=tag::$TAG" - uses: docker/build-push-action@v1 + env: + DOCKER_BUILDKIT: 1 with: dockerfile: ./docker/gms/Dockerfile username: ${{ secrets.DOCKER_USERNAME }} diff --git a/.github/workflows/docker-mae-consumer.yml b/.github/workflows/docker-mae-consumer.yml index e5321a448c..c45b7085e7 100644 --- a/.github/workflows/docker-mae-consumer.yml +++ b/.github/workflows/docker-mae-consumer.yml @@ -21,6 +21,8 @@ jobs: echo "tag=$TAG" echo "::set-output name=tag::$TAG" - uses: docker/build-push-action@v1 + env: + DOCKER_BUILDKIT: 1 with: dockerfile: ./docker/mae-consumer/Dockerfile username: ${{ secrets.DOCKER_USERNAME }} diff --git a/.github/workflows/docker-mce-consumer.yml b/.github/workflows/docker-mce-consumer.yml index b9fc5d7832..24f342bca8 100644 --- a/.github/workflows/docker-mce-consumer.yml +++ b/.github/workflows/docker-mce-consumer.yml @@ -21,6 +21,8 @@ jobs: echo "tag=$TAG" echo "::set-output name=tag::$TAG" - uses: docker/build-push-action@v1 + env: + DOCKER_BUILDKIT: 1 with: dockerfile: ./docker/mce-consumer/Dockerfile username: ${{ secrets.DOCKER_USERNAME }} diff --git a/.gitignore b/.gitignore index 
index 40d10a8e1c..723f38b219 100644
--- a/.gitignore
+++ b/.gitignore
@@ -17,11 +17,8 @@ metadata-events/mxe-registration/src/main/resources/**/*.avsc
 .java-version
 
 # Python
-.env
 .venv
-env/
 venv/
-ENV/
 env.bak/
 venv.bak/
 .mypy_cache/
diff --git a/docker/README.md b/docker/README.md
index 4e192a14c5..544f80a26e 100644
--- a/docker/README.md
+++ b/docker/README.md
@@ -1,27 +1,56 @@
 # Docker Images
+
+## Prerequisites
+You need to install [docker](https://docs.docker.com/install/) and
+[docker-compose](https://docs.docker.com/compose/install/) (if using Linux; on Windows and Mac, compose is included
+with Docker Desktop).
+
+Make sure to allocate enough hardware resources to the Docker engine. Tested & confirmed config: 2 CPUs, 8GB RAM,
+2GB swap area.
+
+## Quickstart
+
 The easiest way to bring up and test DataHub is using DataHub [Docker](https://www.docker.com) images
 which are continuously deployed to [Docker Hub](https://hub.docker.com/u/linkedin) with every commit to the repository.
+You can easily download and run all these images and their dependencies with our
+[quick start guide](../docs/quickstart.md).
+
+DataHub Docker Images:
+
 * [linkedin/datahub-gms](https://cloud.docker.com/repository/docker/linkedin/datahub-gms/)
 * [linkedin/datahub-frontend](https://cloud.docker.com/repository/docker/linkedin/datahub-frontend/)
 * [linkedin/datahub-mae-consumer](https://cloud.docker.com/repository/docker/linkedin/datahub-mae-consumer/)
 * [linkedin/datahub-mce-consumer](https://cloud.docker.com/repository/docker/linkedin/datahub-mce-consumer/)
 
-Above Docker images are created for DataHub specific use. You can check subdirectories to check how those images are
-generated via [Dockerbuild](https://docs.docker.com/engine/reference/commandline/build/) files or
-how to start each container using [Docker Compose](https://docs.docker.com/compose/). Other than these, DataHub depends
-on below Docker images to be able to run:
+Dependencies:
 
-* [**Kafka and Schema Registry**](kafka)
+* [**Kafka and Schema Registry**](kafka-setup)
-* [**Elasticsearch**](elasticsearch)
+* [**Elasticsearch**](elasticsearch-setup)
 * [**MySQL**](mysql)
 
-Local-built ingestion image allows you to create on an ad-hoc basis `metadatachangeevent` with Python script.
-The pipeline depends on all the above images composing up.
-* [**Ingestion**](ingestion)
+### Ingesting demo data
 
-## Prerequisites
-You need to install [docker](https://docs.docker.com/install/) and [docker-compose](https://docs.docker.com/compose/install/).
+If you want to test ingesting some data once DataHub is up, see [**Ingestion**](ingestion/README.md).
 
-## Quickstart
-If you want to quickly try and evaluate DataHub by running all necessary Docker containers, you can check
-[Quickstart Guide](quickstart).
\ No newline at end of file
+## Using Docker Images During Development
+
+See [Using Docker Images During Development](../docs/docker/development.md).
+
+## Building And Deploying Docker Images
+
+We use GitHub Actions to build and continuously deploy our images. There should be no need to do this manually; a
+successful release on GitHub will automatically publish the images.
+
+### Building images
+
+To build the full images (the ones we publish), you need to run the following:
+
+```
+COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub build
+```
+
+This is needed because we rely on BuildKit for multistage builds. It does not hurt to also set `DATAHUB_VERSION` to
+something unique.
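+
+For example (the tag shown here is only an illustration; any unique value works):
+
+```
+DATAHUB_VERSION=dev-$(git rev-parse --short HEAD) COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub build
+```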
+
+This is not our recommended development flow; most developers should follow the
+[Using Docker Images During Development](#using-docker-images-during-development) guide.
\ No newline at end of file
diff --git a/docker/broker/env/docker.env b/docker/broker/env/docker.env
new file mode 100644
index 0000000000..4d8b2b1088
--- /dev/null
+++ b/docker/broker/env/docker.env
@@ -0,0 +1,6 @@
+KAFKA_BROKER_ID=1
+KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
+KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
+KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
+KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1
+KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=0
diff --git a/docker/frontend/Dockerfile b/docker/datahub-frontend/Dockerfile
similarity index 100%
rename from docker/frontend/Dockerfile
rename to docker/datahub-frontend/Dockerfile
diff --git a/docker/datahub-frontend/README.md b/docker/datahub-frontend/README.md
new file mode 100644
index 0000000000..d04ab880ef
--- /dev/null
+++ b/docker/datahub-frontend/README.md
@@ -0,0 +1,16 @@
+# DataHub Frontend Docker Image
+
+[![datahub-frontend docker](https://github.com/linkedin/datahub/workflows/datahub-frontend%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-frontend+docker%22)
+
+Refer to [DataHub Frontend Service](../../datahub-frontend) for a quick overview of the architecture and
+responsibilities of this service within DataHub.
+
+## Checking out DataHub UI
+
+After starting your Docker containers, you can open the DataHub UI in your favorite web browser at:
+
+```
+http://localhost:9001
+```
+
+You can sign in with `datahub` as both the username and the password.
diff --git a/docker/datahub-frontend/env/docker.env b/docker/datahub-frontend/env/docker.env
new file mode 100644
index 0000000000..c132d9a0ac
--- /dev/null
+++ b/docker/datahub-frontend/env/docker.env
@@ -0,0 +1,5 @@
+DATAHUB_GMS_HOST=datahub-gms
+DATAHUB_GMS_PORT=8080
+DATAHUB_SECRET=YouKnowNothing
+DATAHUB_APP_VERSION=1.0
+DATAHUB_PLAY_MEM_BUFFER_SIZE=10MB
diff --git a/docker/datahub-gms/Dockerfile b/docker/datahub-gms/Dockerfile
new file mode 100644
index 0000000000..4cfa5c08e2
--- /dev/null
+++ b/docker/datahub-gms/Dockerfile
@@ -0,0 +1,28 @@
+# Define the environment (prod or dev) used to select the final multistage build target.
+ARG APP_ENV=prod
+
+FROM openjdk:8-jre-alpine as base
+ENV DOCKERIZE_VERSION v0.6.1
+RUN apk --no-cache add curl tar \
+    && curl https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-runner/9.4.20.v20190813/jetty-runner-9.4.20.v20190813.jar --output jetty-runner.jar \
+    && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv
+
+FROM openjdk:8 as prod-build
+COPY . /datahub-src
+RUN cd /datahub-src && ./gradlew :gms:war:build
+RUN cp /datahub-src/gms/war/build/libs/war.war /war.war
+
+FROM base as prod-install
+COPY --from=prod-build /war.war /datahub/datahub-gms/bin/war.war
+COPY --from=prod-build /datahub-src/docker/datahub-gms/start.sh /datahub/datahub-gms/scripts/start.sh
+RUN chmod +x /datahub/datahub-gms/scripts/start.sh
+
+FROM base as dev-install
+# Dummy stage for development. Assumes code is built on your machine and mounted to this image.
+# See this excellent thread https://github.com/docker/cli/issues/1134.
+
+FROM ${APP_ENV}-install as final
+
+EXPOSE 8080
+
+CMD /datahub/datahub-gms/scripts/start.sh
\ No newline at end of file
diff --git a/docker/datahub-gms/README.md b/docker/datahub-gms/README.md
new file mode 100644
index 0000000000..78a3159c36
--- /dev/null
+++ b/docker/datahub-gms/README.md
@@ -0,0 +1,22 @@
+# DataHub Generalized Metadata Store (GMS) Docker Image
+[![datahub-gms docker](https://github.com/linkedin/datahub/workflows/datahub-gms%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-gms+docker%22)
+
+Refer to [DataHub GMS Service](../../gms) for a quick overview of the architecture and
+responsibilities of this service within DataHub.
+
+## Other Database Platforms
+
+While GMS defaults to using MySQL as its storage backend, it is possible to switch to any of the
+[database platforms](https://ebean.io/docs/database/) supported by Ebean.
+
+For example, you can run the following command to start a GMS that connects to a PostgreSQL backend:
+
+```
+(cd docker/ && docker-compose -f docker-compose.yml -f docker-compose.postgre.yml -p datahub up)
+```
+
+or a MariaDB backend:
+
+```
+(cd docker/ && docker-compose -f docker-compose.yml -f docker-compose.mariadb.yml -p datahub up)
+```
diff --git a/docker/datahub-gms/env/docker.env b/docker/datahub-gms/env/docker.env
new file mode 100644
index 0000000000..34bb9df0e8
--- /dev/null
+++ b/docker/datahub-gms/env/docker.env
@@ -0,0 +1,13 @@
+EBEAN_DATASOURCE_USERNAME=datahub
+EBEAN_DATASOURCE_PASSWORD=datahub
+EBEAN_DATASOURCE_HOST=mysql:3306
+EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8
+EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver
+KAFKA_BOOTSTRAP_SERVER=broker:29092
+KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
+ELASTICSEARCH_HOST=elasticsearch
+ELASTICSEARCH_PORT=9200
+NEO4J_HOST=neo4j:7474
+NEO4J_URI=bolt://neo4j
+NEO4J_USERNAME=neo4j
+NEO4J_PASSWORD=datahub
diff --git a/docker/datahub-gms/env/docker.mariadb.env b/docker/datahub-gms/env/docker.mariadb.env
new file mode 100644
index 0000000000..5abd67a74f
--- /dev/null
+++ b/docker/datahub-gms/env/docker.mariadb.env
@@ -0,0 +1,13 @@
+EBEAN_DATASOURCE_USERNAME=datahub
+EBEAN_DATASOURCE_PASSWORD=datahub
+EBEAN_DATASOURCE_HOST=mariadb:3306
+EBEAN_DATASOURCE_URL=jdbc:mariadb://mariadb:3306/datahub
+EBEAN_DATASOURCE_DRIVER=org.mariadb.jdbc.Driver
+KAFKA_BOOTSTRAP_SERVER=broker:29092
+KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
+ELASTICSEARCH_HOST=elasticsearch
+ELASTICSEARCH_PORT=9200
+NEO4J_HOST=neo4j:7474
+NEO4J_URI=bolt://neo4j
+NEO4J_USERNAME=neo4j
+NEO4J_PASSWORD=datahub
diff --git a/docker/datahub-gms/env/docker.postgres.env b/docker/datahub-gms/env/docker.postgres.env
new file mode 100644
index 0000000000..bff43aaa01
--- /dev/null
+++ b/docker/datahub-gms/env/docker.postgres.env
@@ -0,0 +1,13 @@
+EBEAN_DATASOURCE_USERNAME=datahub
+EBEAN_DATASOURCE_PASSWORD=datahub
+EBEAN_DATASOURCE_HOST=postgres:5432
+EBEAN_DATASOURCE_URL=jdbc:postgresql://postgres:5432/datahub
+EBEAN_DATASOURCE_DRIVER=org.postgresql.Driver
+KAFKA_BOOTSTRAP_SERVER=broker:29092
+KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
+ELASTICSEARCH_HOST=elasticsearch
+ELASTICSEARCH_PORT=9200
+NEO4J_HOST=neo4j:7474
+NEO4J_URI=bolt://neo4j
+NEO4J_USERNAME=neo4j
+NEO4J_PASSWORD=datahub
diff --git a/docker/gms/start.sh b/docker/datahub-gms/start.sh
old mode 100644
new mode 100755
similarity index 76%
rename from docker/gms/start.sh
rename to docker/datahub-gms/start.sh
index c6edab4551..37e81941f8
--- a/docker/gms/start.sh
+++ b/docker/datahub-gms/start.sh
@@ -6,4 +6,4 @@ dockerize \
   -wait http://$ELASTICSEARCH_HOST:$ELASTICSEARCH_PORT \
   -wait http://$NEO4J_HOST \
   -timeout 240s \
-  java -jar jetty-runner.jar gms.war
\ No newline at end of file
+  java -jar /jetty-runner.jar /datahub/datahub-gms/bin/war.war
\ No newline at end of file
diff --git a/docker/datahub-mae-consumer/Dockerfile b/docker/datahub-mae-consumer/Dockerfile
new file mode 100644
index 0000000000..33558e7d57
--- /dev/null
+++ b/docker/datahub-mae-consumer/Dockerfile
@@ -0,0 +1,27 @@
+# Define the environment (prod or dev) used to select the final multistage build target.
+ARG APP_ENV=prod
+
+FROM openjdk:8-jre-alpine as base
+ENV DOCKERIZE_VERSION v0.6.1
+RUN apk --no-cache add curl tar \
+    && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv
+
+FROM openjdk:8 as prod-build
+COPY . datahub-src
+RUN cd datahub-src && ./gradlew :metadata-jobs:mae-consumer-job:build
+RUN cd datahub-src && cp metadata-jobs/mae-consumer-job/build/libs/mae-consumer-job.jar ../mae-consumer-job.jar
+
+FROM base as prod-install
+COPY --from=prod-build /mae-consumer-job.jar /datahub/datahub-mae-consumer/bin/
+COPY --from=prod-build /datahub-src/docker/datahub-mae-consumer/start.sh /datahub/datahub-mae-consumer/scripts/
+RUN chmod +x /datahub/datahub-mae-consumer/scripts/start.sh
+
+FROM base as dev-install
+# Dummy stage for development. Assumes code is built on your machine and mounted to this image.
+# See this excellent thread https://github.com/docker/cli/issues/1134.
+
+FROM ${APP_ENV}-install as final
+
+EXPOSE 9090
+
+CMD /datahub/datahub-mae-consumer/scripts/start.sh
\ No newline at end of file
diff --git a/docker/datahub-mae-consumer/README.md b/docker/datahub-mae-consumer/README.md
new file mode 100644
index 0000000000..e95e875440
--- /dev/null
+++ b/docker/datahub-mae-consumer/README.md
@@ -0,0 +1,5 @@
+# DataHub MetadataAuditEvent (MAE) Consumer Docker Image
+[![datahub-mae-consumer docker](https://github.com/linkedin/datahub/workflows/datahub-mae-consumer%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-mae-consumer+docker%22)
+
+Refer to [DataHub MAE Consumer Job](../../metadata-jobs/mae-consumer-job) for a quick overview of the architecture and
+responsibilities of this service within DataHub.
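+
+## Building a dev image
+
+As a sketch (mirroring what `docker/dev.sh` and `docker-compose.dev.yml` do; the `debug` tag matches the one they
+use), you can build the development variant of this image directly from the repository root:
+
+```
+DOCKER_BUILDKIT=1 docker build --build-arg APP_ENV=dev -t linkedin/datahub-mae-consumer:debug docker/datahub-mae-consumer
+```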
diff --git a/docker/datahub-mae-consumer/env/docker.env b/docker/datahub-mae-consumer/env/docker.env
new file mode 100644
index 0000000000..5788020a95
--- /dev/null
+++ b/docker/datahub-mae-consumer/env/docker.env
@@ -0,0 +1,8 @@
+KAFKA_BOOTSTRAP_SERVER=broker:29092
+KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
+ELASTICSEARCH_HOST=elasticsearch
+ELASTICSEARCH_PORT=9200
+NEO4J_HOST=neo4j:7474
+NEO4J_URI=bolt://neo4j
+NEO4J_USERNAME=neo4j
+NEO4J_PASSWORD=datahub
diff --git a/docker/mae-consumer/start.sh b/docker/datahub-mae-consumer/start.sh
old mode 100644
new mode 100755
similarity index 71%
rename from docker/mae-consumer/start.sh
rename to docker/datahub-mae-consumer/start.sh
index e06a8b759d..4f7f2837a3
--- a/docker/mae-consumer/start.sh
+++ b/docker/datahub-mae-consumer/start.sh
@@ -5,4 +5,4 @@ dockerize \
   -wait http://$ELASTICSEARCH_HOST:$ELASTICSEARCH_PORT \
   -wait http://$NEO4J_HOST \
   -timeout 240s \
-  java -jar mae-consumer-job.jar
\ No newline at end of file
+  java -jar /datahub/datahub-mae-consumer/bin/mae-consumer-job.jar
\ No newline at end of file
diff --git a/docker/datahub-mce-consumer/Dockerfile b/docker/datahub-mce-consumer/Dockerfile
new file mode 100644
index 0000000000..266d9a684c
--- /dev/null
+++ b/docker/datahub-mce-consumer/Dockerfile
@@ -0,0 +1,27 @@
+# Define the environment (prod or dev) used to select the final multistage build target.
+ARG APP_ENV=prod
+
+FROM openjdk:8-jre-alpine as base
+ENV DOCKERIZE_VERSION v0.6.1
+RUN apk --no-cache add curl tar \
+    && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv
+
+FROM openjdk:8 as prod-build
+COPY . datahub-src
+RUN cd datahub-src && ./gradlew :metadata-jobs:mce-consumer-job:build
+RUN cd datahub-src && cp metadata-jobs/mce-consumer-job/build/libs/mce-consumer-job.jar ../mce-consumer-job.jar
+
+FROM base as prod-install
+COPY --from=prod-build /mce-consumer-job.jar /datahub/datahub-mce-consumer/bin/
+COPY --from=prod-build /datahub-src/docker/datahub-mce-consumer/start.sh /datahub/datahub-mce-consumer/scripts/
+RUN chmod +x /datahub/datahub-mce-consumer/scripts/start.sh
+
+FROM base as dev-install
+# Dummy stage for development. Assumes code is built on your machine and mounted to this image.
+# See this excellent thread https://github.com/docker/cli/issues/1134.
+
+FROM ${APP_ENV}-install as final
+
+EXPOSE 9090
+
+CMD /datahub/datahub-mce-consumer/scripts/start.sh
\ No newline at end of file
diff --git a/docker/datahub-mce-consumer/README.md b/docker/datahub-mce-consumer/README.md
new file mode 100644
index 0000000000..e4a05fe851
--- /dev/null
+++ b/docker/datahub-mce-consumer/README.md
@@ -0,0 +1,5 @@
+# DataHub MetadataChangeEvent (MCE) Consumer Docker Image
+[![datahub-mce-consumer docker](https://github.com/linkedin/datahub/workflows/datahub-mce-consumer%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-mce-consumer+docker%22)
+
+Refer to [DataHub MCE Consumer Job](../../metadata-jobs/mce-consumer-job) for a quick overview of the architecture and
+responsibilities of this service within DataHub.
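+
+## Running standalone
+
+A minimal sketch (assuming the compose stack's `datahub_network` network already exists and the image has been built
+or pulled) of running this consumer outside of docker-compose, from the repository root:
+
+```
+docker run --env-file docker/datahub-mce-consumer/env/docker.env --network datahub_network linkedin/datahub-mce-consumer:latest
+```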
diff --git a/docker/datahub-mce-consumer/env/docker.env b/docker/datahub-mce-consumer/env/docker.env
new file mode 100644
index 0000000000..59907e8278
--- /dev/null
+++ b/docker/datahub-mce-consumer/env/docker.env
@@ -0,0 +1,4 @@
+KAFKA_BOOTSTRAP_SERVER=broker:29092
+KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
+GMS_HOST=datahub-gms
+GMS_PORT=8080
diff --git a/docker/mce-consumer/start.sh b/docker/datahub-mce-consumer/start.sh
old mode 100644
new mode 100755
similarity index 64%
rename from docker/mce-consumer/start.sh
rename to docker/datahub-mce-consumer/start.sh
index 7835873711..302df31d10
--- a/docker/mce-consumer/start.sh
+++ b/docker/datahub-mce-consumer/start.sh
@@ -4,4 +4,4 @@
 dockerize \
   -wait tcp://$KAFKA_BOOTSTRAP_SERVER \
   -timeout 240s \
-  java -jar mce-consumer-job.jar
\ No newline at end of file
+  java -jar /datahub/datahub-mce-consumer/bin/mce-consumer-job.jar
\ No newline at end of file
diff --git a/docker/dev.sh b/docker/dev.sh
new file mode 100755
index 0000000000..1199621f3e
--- /dev/null
+++ b/docker/dev.sh
@@ -0,0 +1,17 @@
+#!/bin/bash
+
+# Launches dev instances of DataHub images. See the documentation for more details.
+# YOU MUST BUILD VIA GRADLE BEFORE RUNNING THIS.
+DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
+cd "$DIR" && \
+  COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose \
+    -f docker-compose.yml \
+    -f docker-compose.override.yml \
+    -f docker-compose.dev.yml \
+    pull \
+&& \
+  COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub \
+    -f docker-compose.yml \
+    -f docker-compose.override.yml \
+    -f docker-compose.dev.yml \
+    up
\ No newline at end of file
diff --git a/docker/docker-compose.dev.yml b/docker/docker-compose.dev.yml
new file mode 100644
index 0000000000..bf6e9c13f2
--- /dev/null
+++ b/docker/docker-compose.dev.yml
@@ -0,0 +1,45 @@
+# Default overrides for running local development.
+
+# Images here are made into "development" images by following the general pattern of defining a multistage build with
+# separate prod/dev stages, using APP_ENV to specify which to use. The dev stages avoid building and instead assume
+# that binaries and scripts will be mounted to the image, as also set up by this file. Also see this excellent
+# thread https://github.com/docker/cli/issues/1134.
+
+# To make a JVM app debuggable via IntelliJ, go to its env file and add JVM debug flags, and then add the JVM debug
+# port to this file.
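+
+# A sketch of what that looks like for datahub-gms (JAVA_TOOL_OPTIONS is a standard JVM environment variable that the
+# JVM picks up automatically; port 5005 is an arbitrary choice, not something this repo mandates):
+#
+#   In datahub-gms/env/docker.env:
+#     JAVA_TOOL_OPTIONS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
+#
+#   In this file, under the datahub-gms service:
+#     ports:
+#       - "5005:5005"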
+---
+# TODO: mount + debug docker file for frontend
+version: '3.8'
+services:
+  datahub-gms:
+    image: linkedin/datahub-gms:debug
+    build:
+      context: datahub-gms
+      dockerfile: Dockerfile
+      args:
+        APP_ENV: dev
+    volumes:
+      - ./datahub-gms/start.sh:/datahub/datahub-gms/scripts/start.sh
+      - ../gms/war/build/libs/:/datahub/datahub-gms/bin
+
+  datahub-mae-consumer:
+    image: linkedin/datahub-mae-consumer:debug
+    build:
+      context: datahub-mae-consumer
+      dockerfile: Dockerfile
+      args:
+        APP_ENV: dev
+    volumes:
+      - ./datahub-mae-consumer/start.sh:/datahub/datahub-mae-consumer/scripts/start.sh
+      - ../metadata-jobs/mae-consumer-job/build/libs/:/datahub/datahub-mae-consumer/bin/
+
+  datahub-mce-consumer:
+    image: linkedin/datahub-mce-consumer:debug
+    build:
+      context: datahub-mce-consumer
+      dockerfile: Dockerfile
+      args:
+        APP_ENV: dev
+    volumes:
+      - ./datahub-mce-consumer/start.sh:/datahub/datahub-mce-consumer/scripts/start.sh
+      - ../metadata-jobs/mce-consumer-job/build/libs/:/datahub/datahub-mce-consumer/bin
diff --git a/docker/docker-compose.override.yml b/docker/docker-compose.override.yml
new file mode 100644
index 0000000000..28cab46816
--- /dev/null
+++ b/docker/docker-compose.override.yml
@@ -0,0 +1,24 @@
+# Default override to use MySQL as the backing store for datahub-gms (same as docker-compose.mysql.yml).
+---
+version: '3.8'
+services:
+  mysql:
+    container_name: mysql
+    hostname: mysql
+    image: mysql:5.7
+    env_file: mysql/env/docker.env
+    restart: always
+    command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
+    ports:
+      - "3306:3306"
+    volumes:
+      - ./mysql/init.sql:/docker-entrypoint-initdb.d/init.sql
+      - mysqldata:/var/lib/mysql
+
+  datahub-gms:
+    env_file: datahub-gms/env/docker.env
+    depends_on:
+      - mysql
+
+volumes:
+  mysqldata:
diff --git a/docker/docker-compose.yml b/docker/docker-compose.yml
new file mode 100644
index 0000000000..1a152a7efb
--- /dev/null
+++ b/docker/docker-compose.yml
@@ -0,0 +1,192 @@
+# Docker compose file covering DataHub's default configuration, which is to run all containers on a single host.
+
+# Please see the README.md for instructions on how to use and customize it.
+
+# NOTE: This file cannot be built without BuildKit enabled! See the README.md in this directory.
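+
+# As a usage sketch (assuming the published images; this mirrors the quickstart flow), the whole stack can be brought
+# up from this directory with:
+#
+#   docker-compose pull && docker-compose -p datahub up
+#
+# Note that docker-compose automatically layers docker-compose.override.yml on top of this file, which supplies the
+# default MySQL backend for datahub-gms.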
+--- +version: '3.8' +services: + zookeeper: + image: confluentinc/cp-zookeeper:5.4.0 + env_file: zookeeper/env/docker.env + hostname: zookeeper + container_name: zookeeper + ports: + - "2181:2181" + volumes: + - zkdata:/var/opt/zookeeper + + broker: + image: confluentinc/cp-kafka:5.4.0 + env_file: broker/env/docker.env + hostname: broker + container_name: broker + depends_on: + - zookeeper + ports: + - "29092:29092" + - "9092:9092" + + kafka-rest-proxy: + image: confluentinc/cp-kafka-rest:5.4.0 + env_file: kafka-rest-proxy/env/docker.env + hostname: kafka-rest-proxy + container_name: kafka-rest-proxy + ports: + - "8082:8082" + depends_on: + - zookeeper + - broker + - schema-registry + + kafka-topics-ui: + image: landoop/kafka-topics-ui:0.9.4 + env_file: kafka-topics-ui/env/docker.env + hostname: kafka-topics-ui + container_name: kafka-topics-ui + ports: + - "18000:8000" + depends_on: + - zookeeper + - broker + - schema-registry + - kafka-rest-proxy + + # This "container" is a workaround to pre-create topics + kafka-setup: + build: + context: kafka-setup + env_file: kafka-setup/env/docker.env + hostname: kafka-setup + container_name: kafka-setup + depends_on: + - broker + - schema-registry + + schema-registry: + image: confluentinc/cp-schema-registry:5.4.0 + env_file: schema-registry/env/docker.env + hostname: schema-registry + container_name: schema-registry + depends_on: + - zookeeper + - broker + ports: + - "8081:8081" + + schema-registry-ui: + image: landoop/schema-registry-ui:latest + env_file: schema-registry-ui/env/docker.env + container_name: schema-registry-ui + hostname: schema-registry-ui + ports: + - "8000:8000" + depends_on: + - schema-registry + + elasticsearch: + image: docker.elastic.co/elasticsearch/elasticsearch:5.6.8 + env_file: elasticsearch/env/docker.env + container_name: elasticsearch + hostname: elasticsearch + ports: + - "9200:9200" + volumes: + - esdata:/usr/share/elasticsearch/data + + kibana: + image: docker.elastic.co/kibana/kibana:5.6.8 + env_file: kibana/env/docker.env + container_name: kibana + hostname: kibana + ports: + - "5601:5601" + depends_on: + - elasticsearch + + neo4j: + image: neo4j:3.5.7 + env_file: neo4j/env/docker.env + hostname: neo4j + container_name: neo4j + ports: + - "7474:7474" + - "7687:7687" + volumes: + - neo4jdata:/data + + # This "container" is a workaround to pre-create search indices + elasticsearch-setup: + build: + context: elasticsearch-setup + env_file: elasticsearch-setup/env/docker.env + hostname: elasticsearch-setup + container_name: elasticsearch-setup + depends_on: + - elasticsearch + + datahub-gms: + build: + context: ../ + dockerfile: docker/datahub-gms/Dockerfile + image: linkedin/datahub-gms:${DATAHUB_VERSION:-latest} + hostname: datahub-gms + container_name: datahub-gms + ports: + - "8080:8080" + depends_on: + - elasticsearch-setup + - kafka-setup + - mysql + - neo4j + + datahub-frontend: + build: + context: ../ + dockerfile: docker/datahub-frontend/Dockerfile + image: linkedin/datahub-frontend:${DATAHUB_VERSION:-latest} + env_file: datahub-frontend/env/docker.env + hostname: datahub-frontend + container_name: datahub-frontend + ports: + - "9001:9001" + depends_on: + - datahub-gms + + datahub-mae-consumer: + build: + context: ../ + dockerfile: docker/datahub-mae-consumer/Dockerfile + image: linkedin/datahub-mae-consumer:${DATAHUB_VERSION:-latest} + env_file: datahub-mae-consumer/env/docker.env + hostname: datahub-mae-consumer + container_name: datahub-mae-consumer + ports: + - "9091:9091" + depends_on: + - kafka-setup 
+ - elasticsearch-setup + - neo4j + + datahub-mce-consumer: + build: + context: ../ + dockerfile: docker/datahub-mce-consumer/Dockerfile + image: linkedin/datahub-mce-consumer:${DATAHUB_VERSION:-latest} + env_file: datahub-mce-consumer/env/docker.env + hostname: datahub-mce-consumer + container_name: datahub-mce-consumer + ports: + - "9090:9090" + depends_on: + - kafka-setup + - datahub-gms + +networks: + default: + name: datahub_network + +volumes: + esdata: + neo4jdata: + zkdata: diff --git a/docker/elasticsearch/Dockerfile b/docker/elasticsearch-setup/Dockerfile similarity index 100% rename from docker/elasticsearch/Dockerfile rename to docker/elasticsearch-setup/Dockerfile diff --git a/docker/elasticsearch-setup/README.md b/docker/elasticsearch-setup/README.md new file mode 100644 index 0000000000..47c4fc5c45 --- /dev/null +++ b/docker/elasticsearch-setup/README.md @@ -0,0 +1,5 @@ +# Elasticsearch & Kibana + +DataHub uses Elasticsearch as a search engine. Elasticsearch powers search, typeahead and browse functions for DataHub. +[Official Elasticsearch Docker image](https://hub.docker.com/_/elasticsearch) found in Docker Hub is used without +any modification. \ No newline at end of file diff --git a/docker/elasticsearch/corpuser-index-config.json b/docker/elasticsearch-setup/corpuser-index-config.json similarity index 100% rename from docker/elasticsearch/corpuser-index-config.json rename to docker/elasticsearch-setup/corpuser-index-config.json diff --git a/docker/elasticsearch/dataprocess-index-config.json b/docker/elasticsearch-setup/dataprocess-index-config.json similarity index 100% rename from docker/elasticsearch/dataprocess-index-config.json rename to docker/elasticsearch-setup/dataprocess-index-config.json diff --git a/docker/elasticsearch/dataset-index-config.json b/docker/elasticsearch-setup/dataset-index-config.json similarity index 100% rename from docker/elasticsearch/dataset-index-config.json rename to docker/elasticsearch-setup/dataset-index-config.json diff --git a/docker/elasticsearch-setup/env/docker.env b/docker/elasticsearch-setup/env/docker.env new file mode 100644 index 0000000000..04fe050539 --- /dev/null +++ b/docker/elasticsearch-setup/env/docker.env @@ -0,0 +1,2 @@ +ELASTICSEARCH_HOST=elasticsearch +ELASTICSEARCH_PORT=9200 diff --git a/docker/elasticsearch/README.md b/docker/elasticsearch/README.md deleted file mode 100644 index 963e88bbd3..0000000000 --- a/docker/elasticsearch/README.md +++ /dev/null @@ -1,35 +0,0 @@ -# Elasticsearch & Kibana - -DataHub uses Elasticsearch as a search engine. Elasticsearch powers search, typeahead and browse functions for DataHub. -[Official Elasticsearch Docker image](https://hub.docker.com/_/elasticsearch) found in Docker Hub is used without -any modification. - -## Run Docker container -Below command will start the Elasticsearch and Kibana containers. `DataHub` uses Elasticsearch release `5.6.8`. Newer -versions of Elasticsearch are not tested and you might experience compatibility issues. -``` -cd docker/elasticsearch && docker-compose pull && docker-compose up --build -``` -You can connect to Kibana on your web browser to monitor Elasticsearch via below link: -``` -http://localhost:5601 -``` - -## Container configuration -### External Port -If you need to configure default configurations for your container such as the exposed port, you will do that in -`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand -how to change your exposed port settings. 
-``` -ports: - - "9200:9200" -``` - -### Docker Network -All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`. -If you change this, you will need to change this for all other Docker containers as well. -``` -networks: - default: - name: datahub_network -``` \ No newline at end of file diff --git a/docker/elasticsearch/docker-compose.yml b/docker/elasticsearch/docker-compose.yml deleted file mode 100644 index 317bd44766..0000000000 --- a/docker/elasticsearch/docker-compose.yml +++ /dev/null @@ -1,38 +0,0 @@ ---- -version: '3.5' -services: - elasticsearch: - image: docker.elastic.co/elasticsearch/elasticsearch:5.6.8 - container_name: elasticsearch - hostname: elasticsearch - ports: - - "9200:9200" - environment: - - discovery.type=single-node - - xpack.security.enabled=false - - "ES_JAVA_OPTS=-Xms1g -Xmx1g" - - kibana: - image: docker.elastic.co/kibana/kibana:5.6.8 - container_name: kibana - hostname: kibana - ports: - - "5601:5601" - depends_on: - - elasticsearch - - # This "container" is a workaround to pre-create search indices - elasticsearch-setup: - build: - context: . - hostname: elasticsearch-setup - container_name: elasticsearch-setup - depends_on: - - elasticsearch - environment: - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - -networks: - default: - name: datahub_network \ No newline at end of file diff --git a/docker/elasticsearch/env/docker.env b/docker/elasticsearch/env/docker.env new file mode 100644 index 0000000000..75b52bd983 --- /dev/null +++ b/docker/elasticsearch/env/docker.env @@ -0,0 +1,3 @@ +discovery.type=single-node +xpack.security.enabled=false +ES_JAVA_OPTS=-Xms1g -Xmx1g diff --git a/docker/frontend/README.md b/docker/frontend/README.md deleted file mode 100644 index 0cfcc99275..0000000000 --- a/docker/frontend/README.md +++ /dev/null @@ -1,50 +0,0 @@ -# DataHub Frontend Docker Image -[![datahub-frontend docker](https://github.com/linkedin/datahub/workflows/datahub-frontend%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-frontend+docker%22) - -Refer to [DataHub Frontend Service](../../datahub-frontend) to have a quick understanding of the architecture and -responsibility of this service for the DataHub. - -## Build & Run -``` -cd docker/frontend && docker-compose up --build -``` -This command will rebuild the docker image and start a container based on the image. - -To start a container using an existing image, run the same command without the `--build` flag. - -### Container configuration -#### External Port -If you need to configure default configurations for your container such as the exposed port, you will do that in -`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand -how to change your exposed port settings. -``` -ports: - - "9001:9001" -``` - -#### Docker Network -All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`. -If you change this, you will need to change this for all other Docker containers as well. -``` -networks: - default: - name: datahub_network -``` - -#### datahub-gms Container -Before starting `datahub-frontend` container, `datahub-gms` container should already be up and running. 
-`datahub-frontend` service creates a connection to `datahub-gms` service and this is configured with environment -variables in `docker-compose.yml`: -``` -environment: - - DATAHUB_GMS_HOST=datahub-gms - - DATAHUB_GMS_PORT=8080 -``` -The value of `DATAHUB_GMS_HOST` variable should be set to the host name of the `datahub-gms` container within the Docker network. - -## Checking out DataHub UI -After starting your Docker container, you can connect to it by typing below into your favorite web browser: -``` -http://localhost:9001 -``` -You can sign in with `datahub` as username and password. diff --git a/docker/frontend/docker-compose.yml b/docker/frontend/docker-compose.yml deleted file mode 100644 index 6ccae92b8a..0000000000 --- a/docker/frontend/docker-compose.yml +++ /dev/null @@ -1,22 +0,0 @@ ---- -version: '3.5' -services: - datahub-frontend: - image: linkedin/datahub-frontend:${DATAHUB_VERSION:-latest} - build: - context: ../../ - dockerfile: docker/frontend/Dockerfile - hostname: datahub-frontend - container_name: datahub-frontend - ports: - - "9001:9001" - environment: - - DATAHUB_GMS_HOST=datahub-gms - - DATAHUB_GMS_PORT=8080 - - DATAHUB_SECRET=YouKnowNothing - - DATAHUB_APP_VERSION=1.0 - - DATAHUB_PLAY_MEM_BUFFER_SIZE=10MB - -networks: - default: - name: datahub_network \ No newline at end of file diff --git a/docker/gms/Dockerfile b/docker/gms/Dockerfile deleted file mode 100644 index e795445d77..0000000000 --- a/docker/gms/Dockerfile +++ /dev/null @@ -1,19 +0,0 @@ -FROM openjdk:8 as builder -COPY . /datahub-src -RUN cd /datahub-src && ./gradlew :gms:war:build \ - && cp gms/war/build/libs/war.war /gms.war - - -FROM openjdk:8-jre-alpine -ENV DOCKERIZE_VERSION v0.6.1 -RUN apk --no-cache add curl tar \ - && curl https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-runner/9.4.20.v20190813/jetty-runner-9.4.20.v20190813.jar --output jetty-runner.jar \ - && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv - -COPY --from=builder /gms.war . -COPY docker/gms/start.sh /start.sh -RUN chmod +x /start.sh - -EXPOSE 8080 - -CMD /start.sh \ No newline at end of file diff --git a/docker/gms/README.md b/docker/gms/README.md deleted file mode 100644 index 2620ab6a2e..0000000000 --- a/docker/gms/README.md +++ /dev/null @@ -1,82 +0,0 @@ -# DataHub Generalized Metadata Store (GMS) Docker Image -[![datahub-gms docker](https://github.com/linkedin/datahub/workflows/datahub-gms%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-gms+docker%22) - -Refer to [DataHub GMS Service](../../gms) to have a quick understanding of the architecture and -responsibility of this service for the DataHub. - - -## Build & Run -``` -cd docker/gms && docker-compose up --build -``` -This command will rebuild the local docker image and start a container based on the image. - -To start a container using an existing image, run the same command without the `--build` flag. - -### Container configuration -#### External Port -If you need to configure default configurations for your container such as the exposed port, you will do that in -`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand -how to change your exposed port settings. -``` -ports: - - "8080:8080" -``` - -#### Docker Network -All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`. 
-If you change this, you will need to change this for all other Docker containers as well. -``` -networks: - default: - name: datahub_network -``` - -#### MySQL, Elasticsearch and Kafka Containers -Before starting `datahub-gms` container, `mysql`, `elasticsearch`, `neo4j` and `kafka` containers should already be up and running. -These connections are configured via environment variables in `docker-compose.yml`: -``` -environment: - - EBEAN_DATASOURCE_USERNAME=datahub - - EBEAN_DATASOURCE_PASSWORD=datahub - - EBEAN_DATASOURCE_HOST=mysql:3306 - - EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub - - EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver -``` -The value of `EBEAN_DATASOURCE_HOST` variable should be set to the host name of the `mysql` container within the Docker network. - -``` -environment: - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 -``` -The value of `KAFKA_BOOTSTRAP_SERVER` variable should be set to the host name of the `kafka broker` container within the Docker network. -The value of `KAFKA_SCHEMAREGISTRY_URL` variable should be set to the host name of the `kafka schema registry` container within the Docker network. - -``` -environment: - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 -``` -The value of `ELASTICSEARCH_HOST` variable should be set to the host name of the `elasticsearch` container within the Docker network. - -``` -environment: - - NEO4J_HOST=neo4j:7474 - - NEO4J_URI=bolt://neo4j - - NEO4J_USERNAME=neo4j - - NEO4J_PASSWORD=datahub -``` -The value of `NEO4J_URI` variable should be set to the host name of the `neo4j` container within the Docker network. - -## Other Database Platforms -While GMS defaults to using MySQL as its storage backend, it is possible to switch to any of the -[database platforms](https://ebean.io/docs/database/) supported by Ebean. 
-For example, you can run the following command to start a GMS that connects to a PostgreSQL backend -``` -cd docker/gms && docker-compose -f docker-compose-postgres.yml up --build -``` -or a MariaDB backend -``` -cd docker/gms && docker-compose -f docker-compose-mariadb.yml up --build -``` diff --git a/docker/gms/docker-compose-mariadb.yml b/docker/gms/docker-compose-mariadb.yml deleted file mode 100644 index 435d17ea95..0000000000 --- a/docker/gms/docker-compose-mariadb.yml +++ /dev/null @@ -1,30 +0,0 @@ ---- -version: '3.5' -services: - datahub-gms: - image: linkedin/datahub-gms:${DATAHUB_VERSION:-latest} - build: - context: ../../ - dockerfile: docker/gms/Dockerfile - hostname: datahub-gms - container_name: datahub-gms - ports: - - "8080:8080" - environment: - - EBEAN_DATASOURCE_USERNAME=datahub - - EBEAN_DATASOURCE_PASSWORD=datahub - - EBEAN_DATASOURCE_HOST=mariadb:3306 - - EBEAN_DATASOURCE_URL=jdbc:mariadb://mariadb:3306/datahub - - EBEAN_DATASOURCE_DRIVER=org.mariadb.jdbc.Driver - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - - NEO4J_HOST=neo4j:7474 - - NEO4J_URI=bolt://neo4j - - NEO4J_USERNAME=neo4j - - NEO4J_PASSWORD=datahub - -networks: - default: - name: datahub_network \ No newline at end of file diff --git a/docker/gms/docker-compose-postgres.yml b/docker/gms/docker-compose-postgres.yml deleted file mode 100644 index dcd1608acc..0000000000 --- a/docker/gms/docker-compose-postgres.yml +++ /dev/null @@ -1,30 +0,0 @@ ---- -version: '3.5' -services: - datahub-gms: - image: linkedin/datahub-gms:${DATAHUB_VERSION:-latest} - build: - context: ../../ - dockerfile: docker/gms/Dockerfile - hostname: datahub-gms - container_name: datahub-gms - ports: - - "8080:8080" - environment: - - EBEAN_DATASOURCE_USERNAME=datahub - - EBEAN_DATASOURCE_PASSWORD=datahub - - EBEAN_DATASOURCE_HOST=postgres:5432 - - EBEAN_DATASOURCE_URL=jdbc:postgresql://postgres:5432/datahub - - EBEAN_DATASOURCE_DRIVER=org.postgresql.Driver - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - - NEO4J_HOST=neo4j:7474 - - NEO4J_URI=bolt://neo4j - - NEO4J_USERNAME=neo4j - - NEO4J_PASSWORD=datahub - -networks: - default: - name: datahub_network \ No newline at end of file diff --git a/docker/gms/docker-compose.yml b/docker/gms/docker-compose.yml deleted file mode 100644 index 2600dba1e3..0000000000 --- a/docker/gms/docker-compose.yml +++ /dev/null @@ -1,30 +0,0 @@ ---- -version: '3.5' -services: - datahub-gms: - image: linkedin/datahub-gms:${DATAHUB_VERSION:-latest} - build: - context: ../../ - dockerfile: docker/gms/Dockerfile - hostname: datahub-gms - container_name: datahub-gms - ports: - - "8080:8080" - environment: - - EBEAN_DATASOURCE_USERNAME=datahub - - EBEAN_DATASOURCE_PASSWORD=datahub - - EBEAN_DATASOURCE_HOST=mysql:3306 - - EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true - - EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - - NEO4J_HOST=neo4j:7474 - - NEO4J_URI=bolt://neo4j - - NEO4J_USERNAME=neo4j - - NEO4J_PASSWORD=datahub - -networks: - default: - name: datahub_network \ No newline at end of file diff --git a/docker/ingestion/README.md b/docker/ingestion/README.md index 
index 5eaf1e9e85..7db93d41bb 100644
--- a/docker/ingestion/README.md
+++ b/docker/ingestion/README.md
@@ -2,16 +2,3 @@
 Refer to [DataHub Metadata Ingestion](../../metadata-ingestion/mce-cli) for a quick overview of the architecture and
 responsibilities of this service within DataHub.
-
-## Build & Run
-```
-cd docker/ingestion && docker-compose up --build
-```
-This command will rebuild the docker image and start a container based on the image.
-
-To start a container using an existing image, run the same command without the `--build` flag.
-
-### Container configuration
-
-#### Prerequisite Containers
-Before starting `ingestion` container, `kafka`, `datahub-gms`, `mysql` and `datahub-mce-consumer` containers should already be up and running.
\ No newline at end of file
diff --git a/docker/kafka-rest-proxy/env/docker.env b/docker/kafka-rest-proxy/env/docker.env
new file mode 100644
index 0000000000..1e96b3dde1
--- /dev/null
+++ b/docker/kafka-rest-proxy/env/docker.env
@@ -0,0 +1,4 @@
+KAFKA_REST_LISTENERS=http://0.0.0.0:8082/
+KAFKA_REST_SCHEMA_REGISTRY_URL=http://schema-registry:8081/
+KAFKA_REST_HOST_NAME=kafka-rest-proxy
+KAFKA_REST_BOOTSTRAP_SERVERS=PLAINTEXT://broker:29092
diff --git a/docker/kafka/Dockerfile b/docker/kafka-setup/Dockerfile
similarity index 100%
rename from docker/kafka/Dockerfile
rename to docker/kafka-setup/Dockerfile
diff --git a/docker/kafka-setup/README.md b/docker/kafka-setup/README.md
new file mode 100644
index 0000000000..485218abb5
--- /dev/null
+++ b/docker/kafka-setup/README.md
@@ -0,0 +1,14 @@
+# Kafka, Zookeeper and Schema Registry
+
+DataHub uses Kafka as the pub-sub message queue in the backend.
+The [official Confluent Kafka Docker images](https://hub.docker.com/u/confluentinc) found on Docker Hub are used without
+any modification.
+
+## Debugging Kafka
+You can install [kafkacat](https://github.com/edenhill/kafkacat) to consume and produce messages on Kafka topics.
+For example, to consume messages on the MetadataAuditEvent topic, you can run the command below.
+```
+kafkacat -b localhost:9092 -t MetadataAuditEvent
+```
+Note that `kafkacat` doesn't support Avro deserialization yet,
+but there is ongoing [work](https://github.com/edenhill/kafkacat/pull/151) to add it.
\ No newline at end of file
diff --git a/docker/kafka-setup/env/docker.env b/docker/kafka-setup/env/docker.env
new file mode 100644
index 0000000000..91f64e1cac
--- /dev/null
+++ b/docker/kafka-setup/env/docker.env
@@ -0,0 +1,2 @@
+KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
+KAFKA_BOOTSTRAP_SERVER=broker:29092
diff --git a/docker/kafka-topics-ui/env/docker.env b/docker/kafka-topics-ui/env/docker.env
new file mode 100644
index 0000000000..bc4b8ea797
--- /dev/null
+++ b/docker/kafka-topics-ui/env/docker.env
@@ -0,0 +1,2 @@
+KAFKA_REST_PROXY_URL="http://kafka-rest-proxy:8082/"
+PROXY="true"
diff --git a/docker/kafka/README.md b/docker/kafka/README.md
deleted file mode 100644
index dc1556c867..0000000000
--- a/docker/kafka/README.md
+++ /dev/null
@@ -1,47 +0,0 @@
-# Kafka, Zookeeper and Schema Registry
-
-DataHub uses Kafka as the pub-sub message queue in the backend.
-[Official Confluent Kafka Docker images](https://hub.docker.com/u/confluentinc) found in Docker Hub is used without
-any modification.
-
-## Run Docker container
-Below command will start all Kafka related containers.
-``` -cd docker/kafka && docker-compose pull && docker-compose up -``` -As part of `docker-compose`, we also initialize a container called `kafka-setup` to create `MetadataAuditEvent` and -`MetadataChangeEvent` & `FailedMetadataChangeEvent` topics. The only thing this container does is creating Kafka topics after Kafka broker is ready. - -There is also a container which provides visual schema registry interface which you can register/unregister schemas. -You can connect to `schema-registry-ui` on your web browser to monitor Kafka Schema Registry via below link: -``` -http://localhost:8000 -``` - -## Container configuration -### External Port -If you need to configure default configurations for your container such as the exposed port, you will do that in -`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand -how to change your exposed port settings. -``` -ports: - - "9092:9092" -``` - -### Docker Network -All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`. -If you change this, you will need to change this for all other Docker containers as well. -``` -networks: - default: - name: datahub_network -``` - -## Debugging Kafka -You can install [kafkacat](https://github.com/edenhill/kafkacat) to consume and produce messaged to Kafka topics. -For example, to consume messages on MetadataAuditEvent topic, you can run below command. -``` -kafkacat -b localhost:9092 -t MetadataAuditEvent -``` -However, `kafkacat` currently doesn't support Avro deserialization at this point, -but they have an ongoing [work](https://github.com/edenhill/kafkacat/pull/151) for that. \ No newline at end of file diff --git a/docker/kafka/docker-compose.yml b/docker/kafka/docker-compose.yml deleted file mode 100644 index b8db356ef8..0000000000 --- a/docker/kafka/docker-compose.yml +++ /dev/null @@ -1,104 +0,0 @@ ---- -version: '3.5' -services: - zookeeper: - image: confluentinc/cp-zookeeper:5.4.0 - hostname: zookeeper - container_name: zookeeper - ports: - - "2181:2181" - environment: - ZOOKEEPER_CLIENT_PORT: 2181 - ZOOKEEPER_TICK_TIME: 2000 - - broker: - image: confluentinc/cp-kafka:5.4.0 - hostname: broker - container_name: broker - depends_on: - - zookeeper - ports: - - "29092:29092" - - "9092:9092" - environment: - KAFKA_BROKER_ID: 1 - KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181' - KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT - KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092 - KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1 - KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0 - - # This "container" is a workaround to pre-create topics - kafka-setup: - build: - context: . 
- hostname: kafka-setup - container_name: kafka-setup - depends_on: - - broker - - schema-registry - environment: - - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - kafka-rest-proxy: - image: confluentinc/cp-kafka-rest:5.4.0 - hostname: kafka-rest-proxy - ports: - - "8082:8082" - environment: - KAFKA_REST_LISTENERS: http://0.0.0.0:8082/ - KAFKA_REST_SCHEMA_REGISTRY_URL: http://schema-registry:8081/ - KAFKA_REST_HOST_NAME: kafka-rest-proxy - KAFKA_REST_BOOTSTRAP_SERVERS: PLAINTEXT://broker:29092 - depends_on: - - zookeeper - - broker - - schema-registry - - kafka-topics-ui: - image: landoop/kafka-topics-ui:0.9.4 - hostname: kafka-topics-ui - ports: - - "18000:8000" - environment: - KAFKA_REST_PROXY_URL: "http://kafka-rest-proxy:8082/" - PROXY: "true" - depends_on: - - zookeeper - - broker - - schema-registry - - kafka-rest-proxy - - schema-registry: - image: confluentinc/cp-schema-registry:5.4.0 - hostname: schema-registry - container_name: schema-registry - depends_on: - - zookeeper - - broker - ports: - - "8081:8081" - environment: - SCHEMA_REGISTRY_HOST_NAME: schema-registry - SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: 'zookeeper:2181' - - schema-registry-ui: - image: landoop/schema-registry-ui:latest - container_name: schema-registry-ui - hostname: schema-registry-ui - ports: - - "8000:8000" - environment: - SCHEMAREGISTRY_URL: 'http://schema-registry:8081' - ALLOW_GLOBAL: 'true' - ALLOW_TRANSITIVE: 'true' - ALLOW_DELETION: 'true' - READONLY_MODE: 'true' - PROXY: 'true' - depends_on: - - schema-registry - -networks: - default: - name: datahub_network diff --git a/docker/kibana/env/docker.env b/docker/kibana/env/docker.env new file mode 100644 index 0000000000..bbafd8a3e1 --- /dev/null +++ b/docker/kibana/env/docker.env @@ -0,0 +1,2 @@ +SERVER_HOST=0.0.0.0 +ELASTICSEARCH_URL=http://elasticsearch:9200 diff --git a/docker/mae-consumer/Dockerfile b/docker/mae-consumer/Dockerfile deleted file mode 100644 index 63541e2a35..0000000000 --- a/docker/mae-consumer/Dockerfile +++ /dev/null @@ -1,19 +0,0 @@ -FROM openjdk:8 as builder - -COPY . datahub-src -RUN cd datahub-src && ./gradlew :metadata-jobs:mae-consumer-job:build \ - && cp metadata-jobs/mae-consumer-job/build/libs/mae-consumer-job.jar ../mae-consumer-job.jar \ - && cd .. && rm -rf datahub-src - -FROM openjdk:8-jre-alpine -ENV DOCKERIZE_VERSION v0.6.1 -RUN apk --no-cache add curl tar \ - && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv - -COPY --from=builder /mae-consumer-job.jar /mae-consumer-job.jar -COPY docker/mae-consumer/start.sh /start.sh -RUN chmod +x /start.sh - -EXPOSE 9091 - -CMD /start.sh \ No newline at end of file diff --git a/docker/mae-consumer/README.md b/docker/mae-consumer/README.md deleted file mode 100644 index c37978290d..0000000000 --- a/docker/mae-consumer/README.md +++ /dev/null @@ -1,42 +0,0 @@ -# DataHub MetadataAuditEvent (MAE) Consumer Docker Image -[![datahub-mae-consumer docker](https://github.com/linkedin/datahub/workflows/datahub-mae-consumer%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-mae-consumer+docker%22) - -Refer to [DataHub MAE Consumer Job](../../metadata-jobs/mae-consumer-job) to have a quick understanding of the architecture and -responsibility of this service for the DataHub. 
- -## Build & Run -``` -cd docker/mae-consumer && docker-compose up --build -``` -This command will rebuild the docker image and start a container based on the image. - -To start a container using a previously built image, run the same command without the `--build` flag. - -### Container configuration - -#### Docker Network -All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`. -If you change this, you will need to change this for all other Docker containers as well. -``` -networks: - default: - name: datahub_network -``` - -#### Elasticsearch and Kafka Containers -Before starting `datahub-mae-consumer` container, `elasticsearch` and `kafka` containers should already be up and running. -These connections are configured via environment variables in `docker-compose.yml`: -``` -environment: - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 -``` -The value of `KAFKA_BOOTSTRAP_SERVER` variable should be set to the host name of the `kafka broker` container within the Docker network. -The value of `KAFKA_SCHEMAREGISTRY_URL` variable should be set to the host name of the `kafka schema registry` container within the Docker network. - -``` -environment: - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 -``` -The value of `ELASTICSEARCH_HOST` variable should be set to the host name of the `elasticsearch` container within the Docker network. \ No newline at end of file diff --git a/docker/mae-consumer/docker-compose.yml b/docker/mae-consumer/docker-compose.yml deleted file mode 100644 index aa5f7342a1..0000000000 --- a/docker/mae-consumer/docker-compose.yml +++ /dev/null @@ -1,25 +0,0 @@ ---- -version: '3.5' -services: - datahub-mae-consumer: - image: linkedin/datahub-mae-consumer:${DATAHUB_VERSION:-latest} - build: - context: ../../ - dockerfile: docker/mae-consumer/Dockerfile - hostname: datahub-mae-consumer - container_name: datahub-mae-consumer - ports: - - "9091:9091" - environment: - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - - NEO4J_HOST=neo4j:7474 - - NEO4J_URI=bolt://neo4j - - NEO4J_USERNAME=neo4j - - NEO4J_PASSWORD=datahub - -networks: - default: - name: datahub_network diff --git a/docker/mariadb/README.md b/docker/mariadb/README.md index 10ae6bcab7..efdc10a18d 100644 --- a/docker/mariadb/README.md +++ b/docker/mariadb/README.md @@ -4,36 +4,3 @@ DataHub GMS can use MariaDB as an alternate storage backend. [Official MariaDB Docker image](https://hub.docker.com/_/mariadb) found in Docker Hub is used without any modification. - -## Run Docker container -Below command will start the MariaDB container. -``` -cd docker/mariadb && docker-compose pull && docker-compose up -``` - -An initialization script [init.sql](init.sql) is provided to container. This script initializes `metadata-aspect` table -which is basically the Key-Value store of the DataHub GMS. - -To connect to MariaDB container, you can type below command: -``` -docker exec -it mariadb mysql -u datahub -pdatahub datahub -``` - -## Container configuration -### External Port -If you need to configure default configurations for your container such as the exposed port, you will do that in -`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand -how to change your exposed port settings. 
-```
-ports:
-  - '3306:3306'
-```
-
-### Docker Network
-All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
-If you change this, you will need to change this for all other Docker containers as well.
-```
-networks:
-  default:
-    name: datahub_network
-```
\ No newline at end of file
diff --git a/docker/mariadb/docker-compose.yml b/docker/mariadb/docker-compose.mariadb.yml
similarity index 54%
rename from docker/mariadb/docker-compose.yml
rename to docker/mariadb/docker-compose.mariadb.yml
index 179b565cc4..6c14759107 100644
--- a/docker/mariadb/docker-compose.yml
+++ b/docker/mariadb/docker-compose.mariadb.yml
@@ -1,21 +1,23 @@
+# Override to use MariaDB as a backing store for datahub-gms.
 ---
-version: '3.5'
+version: '3.8'
 services:
-  mysql:
+  mariadb:
     container_name: mariadb
     hostname: mariadb
     image: mariadb:10.5
+    env_file: env/docker.env
     restart: always
-    environment:
-      MYSQL_DATABASE: 'datahub'
-      MYSQL_USER: 'datahub'
-      MYSQL_PASSWORD: 'datahub'
-      MYSQL_ROOT_PASSWORD: 'datahub'
     ports:
       - '3306:3306'
     volumes:
       - ./init.sql:/docker-entrypoint-initdb.d/init.sql
 
+  datahub-gms:
+    env_file: ../datahub-gms/env/docker.mariadb.env
+    depends_on:
+      - mariadb
+
 networks:
   default:
     name: datahub_network
\ No newline at end of file
diff --git a/docker/mce-consumer/Dockerfile b/docker/mce-consumer/Dockerfile
deleted file mode 100644
index 6a9de0b663..0000000000
--- a/docker/mce-consumer/Dockerfile
+++ /dev/null
@@ -1,19 +0,0 @@
-FROM openjdk:8 as builder
-
-COPY . datahub-src
-RUN cd datahub-src && ./gradlew :metadata-jobs:mce-consumer-job:build \
-    && cp metadata-jobs/mce-consumer-job/build/libs/mce-consumer-job.jar ../mce-consumer-job.jar \
-    && cd .. && rm -rf datahub-src
-
-FROM openjdk:8-jre-alpine
-ENV DOCKERIZE_VERSION v0.6.1
-RUN apk --no-cache add curl tar \
-    && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv
-
-COPY --from=builder /mce-consumer-job.jar /mce-consumer-job.jar
-COPY docker/mce-consumer/start.sh /start.sh
-RUN chmod +x /start.sh
-
-EXPOSE 9090
-
-CMD /start.sh
\ No newline at end of file
diff --git a/docker/mce-consumer/README.md b/docker/mce-consumer/README.md
deleted file mode 100644
index 7eebc7280a..0000000000
--- a/docker/mce-consumer/README.md
+++ /dev/null
@@ -1,42 +0,0 @@
-# DataHub MetadataChangeEvent (MCE) Consumer Docker Image
-[![datahub-mce-consumer docker](https://github.com/linkedin/datahub/workflows/datahub-mce-consumer%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-mce-consumer+docker%22)
-
-Refer to [DataHub MCE Consumer Job](../../metadata-jobs/mce-consumer-job) to have a quick understanding of the architecture and
-responsibility of this service for the DataHub.
-
-## Build & Run
-```
-cd docker/mce-consumer && docker-compose up --build
-```
-This command will rebuild the docker image and start a container based on the image.
-
-To start a container using a previously built image, run the same command without the `--build` flag.
-
-### Container configuration
-
-#### Docker Network
-All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
-If you change this, you will need to change this for all other Docker containers as well.
-``` -networks: - default: - name: datahub_network -``` - -#### Kafka and DataHub GMS Containers -Before starting `datahub-mce-consumer` container, `datahub-gms` and `kafka` containers should already be up and running. -These connections are configured via environment variables in `docker-compose.yml`: -``` -environment: - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 -``` -The value of `KAFKA_BOOTSTRAP_SERVER` variable should be set to the host name of the `kafka broker` container within the Docker network. -The value of `KAFKA_SCHEMAREGISTRY_URL` variable should be set to the host name of the `kafka schema registry` container within the Docker network. - -``` -environment: - - GMS_HOST=datahub-gms - - GMS_PORT=8080 -``` -The value of `GMS_HOST` variable should be set to the host name of the `datahub-gms` container within the Docker network. \ No newline at end of file diff --git a/docker/mce-consumer/docker-compose.yml b/docker/mce-consumer/docker-compose.yml deleted file mode 100644 index 07c83d6ccb..0000000000 --- a/docker/mce-consumer/docker-compose.yml +++ /dev/null @@ -1,23 +0,0 @@ ---- -version: '3.5' -services: - datahub-mce-consumer: - image: linkedin/datahub-mce-consumer:${DATAHUB_VERSION:-latest} - build: - context: ../../ - dockerfile: docker/mce-consumer/Dockerfile - hostname: datahub-mce-consumer - container_name: datahub-mce-consumer - ports: - - "9090:9090" - environment: - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 - - GMS_HOST=datahub-gms - - GMS_PORT=8080 - - KAFKA_MCE_TOPIC_NAME=MetadataChangeEvent - - KAFKA_FMCE_TOPIC_NAME=FailedMetadataChangeEvent - -networks: - default: - name: datahub_network diff --git a/docker/mysql/README.md b/docker/mysql/README.md index 69eef38c0a..9b6d7088ff 100644 --- a/docker/mysql/README.md +++ b/docker/mysql/README.md @@ -4,36 +4,3 @@ DataHub GMS uses MySQL as the storage backend. [Official MySQL Docker image](https://hub.docker.com/_/mysql) found in Docker Hub is used without any modification. - -## Run Docker container -Below command will start the MySQL container. -``` -cd docker/mysql && docker-compose pull && docker-compose up -``` - -An initialization script [init.sql](init.sql) is provided to container. This script initializes `metadata-aspect` table -which is basically the Key-Value store of the DataHub GMS. - -To connect to MySQL container, you can type below command: -``` -docker exec -it mysql mysql -u datahub -pdatahub datahub -``` - -## Container configuration -### External Port -If you need to configure default configurations for your container such as the exposed port, you will do that in -`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand -how to change your exposed port settings. -``` -ports: - - '3306:3306' -``` - -### Docker Network -All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`. -If you change this, you will need to change this for all other Docker containers as well. -``` -networks: - default: - name: datahub_network -``` \ No newline at end of file diff --git a/docker/mysql/docker-compose.mysql.yml b/docker/mysql/docker-compose.mysql.yml new file mode 100644 index 0000000000..68c72d50e3 --- /dev/null +++ b/docker/mysql/docker-compose.mysql.yml @@ -0,0 +1,24 @@ +# Override to use MySQL as a backing store for datahub-gms. 
+--- +version: '3.8' +services: + mysql: + container_name: mysql + hostname: mysql + image: mysql:5.7 + env_file: env/docker.env + restart: always + command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci + ports: + - "3306:3306" + volumes: + - ./init.sql:/docker-entrypoint-initdb.d/init.sql + - mysqldata:/var/lib/mysql + + datahub-gms: + env_file: ../datahub-gms/env/docker.env + depends_on: + - mysql + +volumes: + mysqldata: diff --git a/docker/mysql/docker-compose.yml b/docker/mysql/docker-compose.yml deleted file mode 100644 index 9b64f84041..0000000000 --- a/docker/mysql/docker-compose.yml +++ /dev/null @@ -1,21 +0,0 @@ ---- -version: '3.5' -services: - mysql: - container_name: mysql - hostname: mysql - image: mysql:5.7 - restart: always - environment: - MYSQL_DATABASE: 'datahub' - MYSQL_USER: 'datahub' - MYSQL_PASSWORD: 'datahub' - MYSQL_ROOT_PASSWORD: 'datahub' - ports: - - '3306:3306' - volumes: - - ./init.sql:/docker-entrypoint-initdb.d/init.sql - -networks: - default: - name: datahub_network \ No newline at end of file diff --git a/docker/mysql/env/docker.env b/docker/mysql/env/docker.env new file mode 100644 index 0000000000..72e3e2155d --- /dev/null +++ b/docker/mysql/env/docker.env @@ -0,0 +1,4 @@ +MYSQL_DATABASE=datahub +MYSQL_USER=datahub +MYSQL_PASSWORD=datahub +MYSQL_ROOT_PASSWORD=datahub diff --git a/docker/neo4j/README.md b/docker/neo4j/README.md index 560f3e522b..b0b9f486d9 100644 --- a/docker/neo4j/README.md +++ b/docker/neo4j/README.md @@ -4,32 +4,6 @@ DataHub uses Neo4j as graph db in the backend to serve graph queries. [Official Neo4j image](https://hub.docker.com/_/neo4j) found in Docker Hub is used without any modification. -## Run Docker container -Below command will start all Neo4j container. -``` -cd docker/neo4j && docker-compose pull && docker-compose up -``` - -## Container configuration -### External Port -If you need to configure default configurations for your container such as the exposed port, you will do that in -`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand -how to change your exposed port settings. -``` -ports: - - "7474:7474" - - "7687:7687" -``` - -### Docker Network -All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`. -If you change this, you will need to change it for all other Docker containers as well. -``` -networks: - default: - name: datahub_network -``` - ## Neo4j Browser To be able to debug and run Cypher queries against your Neo4j image, you can open up `Neo4j Browser` which is running at [http://localhost:7474/browser/](http://localhost:7474/browser/). Default username is `neo4j` and password is `datahub`. 
\ No newline at end of file
diff --git a/docker/neo4j/docker-compose.yml b/docker/neo4j/docker-compose.yml
deleted file mode 100644
index 23c4810749..0000000000
--- a/docker/neo4j/docker-compose.yml
+++ /dev/null
@@ -1,16 +0,0 @@
----
-version: '3.5'
-services:
-  neo4j:
-    image: neo4j:3.5.7
-    hostname: neo4j
-    container_name: neo4j
-    environment:
-      NEO4J_AUTH: 'neo4j/datahub'
-    ports:
-      - "7474:7474"
-      - "7687:7687"
-
-networks:
-  default:
-    name: datahub_network
\ No newline at end of file
diff --git a/docker/neo4j/env/docker.env b/docker/neo4j/env/docker.env
new file mode 100644
index 0000000000..375035f620
--- /dev/null
+++ b/docker/neo4j/env/docker.env
@@ -0,0 +1 @@
+NEO4J_AUTH=neo4j/datahub
diff --git a/docker/postgres/README.md b/docker/postgres/README.md
new file mode 100644
index 0000000000..3cb187af1e
--- /dev/null
+++ b/docker/postgres/README.md
@@ -0,0 +1,6 @@
+# PostgreSQL
+
+DataHub GMS can use PostgreSQL as an alternate storage backend.
+
+[Official PostgreSQL Docker image](https://hub.docker.com/_/postgres) found in Docker Hub is used without
+any modification.
diff --git a/docker/postgresql/docker-compose.yml b/docker/postgres/docker-compose.postgre.yml
similarity index 56%
rename from docker/postgresql/docker-compose.yml
rename to docker/postgres/docker-compose.postgre.yml
index db4a814942..b980139e20 100644
--- a/docker/postgresql/docker-compose.yml
+++ b/docker/postgres/docker-compose.postgre.yml
@@ -1,19 +1,23 @@
+# Override to use PostgreSQL as a backing store for datahub-gms.
 ---
-version: '3.5'
+version: '3.8'
 services:
   postgres:
     container_name: postgres
     hostname: postgres
     image: postgres:12.3
+    env_file: env/docker.env
     restart: always
-    environment:
-      POSTGRES_USER: datahub
-      POSTGRES_PASSWORD: datahub
     ports:
       - '5432:5432'
     volumes:
       - ./init.sql:/docker-entrypoint-initdb.d/init.sql
 
+  datahub-gms:
+    env_file: ../datahub-gms/env/docker.postgres.env
+    depends_on:
+      - postgres
+
 networks:
   default:
     name: datahub_network
\ No newline at end of file
diff --git a/docker/postgres/env/docker.env b/docker/postgres/env/docker.env
new file mode 100644
index 0000000000..f84a2b5635
--- /dev/null
+++ b/docker/postgres/env/docker.env
@@ -0,0 +1,2 @@
+POSTGRES_USER=datahub
+POSTGRES_PASSWORD=datahub
diff --git a/docker/postgresql/init.sql b/docker/postgres/init.sql
similarity index 100%
rename from docker/postgresql/init.sql
rename to docker/postgres/init.sql
diff --git a/docker/postgresql/README.md b/docker/postgresql/README.md
deleted file mode 100644
index c0a5056914..0000000000
--- a/docker/postgresql/README.md
+++ /dev/null
@@ -1,39 +0,0 @@
-# MySQL
-
-DataHub GMS can use PostgreSQL as an alternate storage backend.
-
-[Official PostgreSQL Docker image](https://hub.docker.com/_/postgres) found in Docker Hub is used without
-any modification.
-
-## Run Docker container
-Below command will start the MySQL container.
-```
-cd docker/postgres && docker-compose pull && docker-compose up
-```
-
-An initialization script [init.sql](init.sql) is provided to container. This script initializes `metadata-aspect` table
-which is basically the Key-Value store of the DataHub GMS.
-
-To connect to PostgreSQL container, you can type below command:
-```
-docker exec -it postgres psql -U datahub
-```
-
-## Container configuration
-### External Port
-If you need to configure default configurations for your container such as the exposed port, you will do that in
-`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand
-how to change your exposed port settings.
-```
-ports:
-  - '5432:5432'
-```
-
-### Docker Network
-All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
-If you change this, you will need to change this for all other Docker containers as well.
-```
-networks:
-  default:
-    name: datahub_network
-```
\ No newline at end of file
diff --git a/docker/quickstart.sh b/docker/quickstart.sh
new file mode 100755
index 0000000000..7eb3cac649
--- /dev/null
+++ b/docker/quickstart.sh
@@ -0,0 +1,7 @@
+#!/bin/bash
+
+# Quickstarts DataHub by pulling all images from Docker Hub and then running the containers locally. No images are
+# built locally. Note: by default this pulls the latest version; you can change this to a specific version by setting
+# the DATAHUB_VERSION environment variable.
+DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
+cd "$DIR" && docker-compose pull && docker-compose -p datahub up
\ No newline at end of file
diff --git a/docker/quickstart/README.md b/docker/quickstart/README.md
deleted file mode 100644
index 8dabaf0035..0000000000
--- a/docker/quickstart/README.md
+++ /dev/null
@@ -1,30 +0,0 @@
-# DataHub Quickstart
-To start all Docker containers at once, please run below command from project root directory:
-```bash
-./docker/quickstart/quickstart.sh
-```
-
-At this point, all containers are ready and DataHub can be considered up and running. Check specific containers guide
-for details:
-* [Elasticsearch & Kibana](../elasticsearch)
-* [DataHub Frontend](../frontend)
-* [DataHub GMS](../gms)
-* [Kafka, Schema Registry & Zookeeper](../kafka)
-* [DataHub MAE Consumer](../mae-consumer)
-* [DataHub MCE Consumer](../mce-consumer)
-* [MySQL](../mysql)
-
-From this point on, if you want to be able to sign in to DataHub and see some sample data, please see
-[Metadata Ingestion Guide](../../metadata-ingestion) for `bootstrapping DataHub`.
-
-You can also choose to use a specific versin of DataHub docker images instead of the `latest` by specifying `DATAHUB_VERSION` environment variable.
- -## Debugging Containers -If you want to debug containers, you can check container logs: -``` -docker logs <> -``` -Also, you can connect to container shell for further debugging: -``` -docker exec -it <> bash -``` diff --git a/docker/quickstart/docker-compose.yml b/docker/quickstart/docker-compose.yml deleted file mode 100644 index 13f75555a2..0000000000 --- a/docker/quickstart/docker-compose.yml +++ /dev/null @@ -1,260 +0,0 @@ ---- -version: '3.5' -services: - mysql: - container_name: mysql - hostname: mysql - image: mysql:5.7 - restart: always - command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci - environment: - MYSQL_DATABASE: 'datahub' - MYSQL_USER: 'datahub' - MYSQL_PASSWORD: 'datahub' - MYSQL_ROOT_PASSWORD: 'datahub' - ports: - - "3306:3306" - volumes: - - ../mysql/init.sql:/docker-entrypoint-initdb.d/init.sql - - mysqldata:/var/lib/mysql - - zookeeper: - image: confluentinc/cp-zookeeper:5.4.0 - hostname: zookeeper - container_name: zookeeper - ports: - - "2181:2181" - environment: - ZOOKEEPER_CLIENT_PORT: 2181 - ZOOKEEPER_TICK_TIME: 2000 - volumes: - - zkdata:/var/opt/zookeeper - - broker: - image: confluentinc/cp-kafka:5.4.0 - hostname: broker - container_name: broker - depends_on: - - zookeeper - ports: - - "29092:29092" - - "9092:9092" - environment: - KAFKA_BROKER_ID: 1 - KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181' - KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT - KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092 - KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1 - KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0 - - kafka-rest-proxy: - image: confluentinc/cp-kafka-rest:5.4.0 - hostname: kafka-rest-proxy - container_name: kafka-rest-proxy - ports: - - "8082:8082" - environment: - KAFKA_REST_LISTENERS: http://0.0.0.0:8082/ - KAFKA_REST_SCHEMA_REGISTRY_URL: http://schema-registry:8081/ - KAFKA_REST_HOST_NAME: kafka-rest-proxy - KAFKA_REST_BOOTSTRAP_SERVERS: PLAINTEXT://broker:29092 - depends_on: - - zookeeper - - broker - - schema-registry - - kafka-topics-ui: - image: landoop/kafka-topics-ui:0.9.4 - hostname: kafka-topics-ui - container_name: kafka-topics-ui - ports: - - "18000:8000" - environment: - KAFKA_REST_PROXY_URL: "http://kafka-rest-proxy:8082/" - PROXY: "true" - depends_on: - - zookeeper - - broker - - schema-registry - - kafka-rest-proxy - - # This "container" is a workaround to pre-create topics - kafka-setup: - build: - context: ../kafka - hostname: kafka-setup - container_name: kafka-setup - depends_on: - - broker - - schema-registry - environment: - - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - schema-registry: - image: confluentinc/cp-schema-registry:5.4.0 - hostname: schema-registry - container_name: schema-registry - depends_on: - - zookeeper - - broker - ports: - - "8081:8081" - environment: - SCHEMA_REGISTRY_HOST_NAME: schema-registry - SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: 'zookeeper:2181' - - schema-registry-ui: - image: landoop/schema-registry-ui:latest - container_name: schema-registry-ui - hostname: schema-registry-ui - ports: - - "8000:8000" - environment: - SCHEMAREGISTRY_URL: 'http://schema-registry:8081' - ALLOW_GLOBAL: 'true' - ALLOW_TRANSITIVE: 'true' - ALLOW_DELETION: 'true' - READONLY_MODE: 'true' - PROXY: 'true' - depends_on: - - schema-registry - - elasticsearch: - image: docker.elastic.co/elasticsearch/elasticsearch:5.6.8 - container_name: elasticsearch - hostname: elasticsearch - ports: - - "9200:9200" - environment: 
- - discovery.type=single-node - - xpack.security.enabled=false - - "ES_JAVA_OPTS=-Xms1g -Xmx1g" - volumes: - - esdata:/usr/share/elasticsearch/data - - kibana: - image: docker.elastic.co/kibana/kibana:5.6.8 - container_name: kibana - hostname: kibana - ports: - - "5601:5601" - environment: - - SERVER_HOST=0.0.0.0 - - ELASTICSEARCH_URL=http://elasticsearch:9200 - depends_on: - - elasticsearch - - neo4j: - image: neo4j:3.5.7 - hostname: neo4j - container_name: neo4j - environment: - NEO4J_AUTH: 'neo4j/datahub' - ports: - - "7474:7474" - - "7687:7687" - volumes: - - neo4jdata:/data - - # This "container" is a workaround to pre-create search indices - elasticsearch-setup: - build: - context: ../elasticsearch - hostname: elasticsearch-setup - container_name: elasticsearch-setup - depends_on: - - elasticsearch - environment: - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - - datahub-gms: - image: linkedin/datahub-gms:${DATAHUB_VERSION:-latest} - hostname: datahub-gms - container_name: datahub-gms - ports: - - "8080:8080" - environment: - - EBEAN_DATASOURCE_USERNAME=datahub - - EBEAN_DATASOURCE_PASSWORD=datahub - - EBEAN_DATASOURCE_HOST=mysql:3306 - - EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8 - - EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - - NEO4J_HOST=neo4j:7474 - - NEO4J_URI=bolt://neo4j - - NEO4J_USERNAME=neo4j - - NEO4J_PASSWORD=datahub - depends_on: - - elasticsearch-setup - - kafka-setup - - mysql - - neo4j - - datahub-frontend: - image: linkedin/datahub-frontend:${DATAHUB_VERSION:-latest} - hostname: datahub-frontend - container_name: datahub-frontend - ports: - - "9001:9001" - environment: - - DATAHUB_GMS_HOST=datahub-gms - - DATAHUB_GMS_PORT=8080 - - DATAHUB_SECRET=YouKnowNothing - - DATAHUB_APP_VERSION=1.0 - - DATAHUB_PLAY_MEM_BUFFER_SIZE=10MB - depends_on: - - datahub-gms - - datahub-mae-consumer: - image: linkedin/datahub-mae-consumer:${DATAHUB_VERSION:-latest} - hostname: datahub-mae-consumer - container_name: datahub-mae-consumer - ports: - - "9091:9091" - environment: - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - - NEO4J_HOST=neo4j:7474 - - NEO4J_URI=bolt://neo4j - - NEO4J_USERNAME=neo4j - - NEO4J_PASSWORD=datahub - depends_on: - - kafka-setup - - elasticsearch-setup - - neo4j - command: "sh -c 'while ping -c1 kafka-setup &>/dev/null; do echo waiting for kafka-setup... && sleep 1; done; \ - echo kafka-setup done! && /start.sh'" - - datahub-mce-consumer: - image: linkedin/datahub-mce-consumer:${DATAHUB_VERSION:-latest} - hostname: datahub-mce-consumer - container_name: datahub-mce-consumer - ports: - - "9090:9090" - environment: - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 - - GMS_HOST=datahub-gms - - GMS_PORT=8080 - depends_on: - - kafka-setup - - datahub-gms - command: "sh -c 'while ping -c1 kafka-setup &>/dev/null; do echo waiting for kafka-setup... && sleep 1; done; \ - echo kafka-setup done! 
&& /start.sh'" - -networks: - default: - name: datahub_network - -volumes: - mysqldata: - esdata: - neo4jdata: - zkdata: diff --git a/docker/quickstart/quickstart.sh b/docker/quickstart/quickstart.sh deleted file mode 100755 index 7f4798e777..0000000000 --- a/docker/quickstart/quickstart.sh +++ /dev/null @@ -1,4 +0,0 @@ -#!/bin/bash - -DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" -cd $DIR && docker-compose pull && docker-compose -p datahub up --build \ No newline at end of file diff --git a/docker/rebuild-all/docker-compose.yml b/docker/rebuild-all/docker-compose.yml deleted file mode 100644 index be0762c1c7..0000000000 --- a/docker/rebuild-all/docker-compose.yml +++ /dev/null @@ -1,268 +0,0 @@ ---- -version: '3.5' -services: - mysql: - container_name: mysql - hostname: mysql - image: mysql:5.7 - restart: always - command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci - environment: - MYSQL_DATABASE: 'datahub' - MYSQL_USER: 'datahub' - MYSQL_PASSWORD: 'datahub' - MYSQL_ROOT_PASSWORD: 'datahub' - ports: - - "3306:3306" - volumes: - - ../mysql/init.sql:/docker-entrypoint-initdb.d/init.sql - - mysqldata:/var/lib/mysql - - zookeeper: - image: confluentinc/cp-zookeeper:5.4.0 - hostname: zookeeper - container_name: zookeeper - ports: - - "2181:2181" - environment: - ZOOKEEPER_CLIENT_PORT: 2181 - ZOOKEEPER_TICK_TIME: 2000 - volumes: - - zkdata:/var/opt/zookeeper - - broker: - image: confluentinc/cp-kafka:5.4.0 - hostname: broker - container_name: broker - depends_on: - - zookeeper - ports: - - "29092:29092" - - "9092:9092" - environment: - KAFKA_BROKER_ID: 1 - KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181' - KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT - KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092 - KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1 - KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0 - - kafka-rest-proxy: - image: confluentinc/cp-kafka-rest:5.4.0 - hostname: kafka-rest-proxy - container_name: kafka-rest-proxy - ports: - - "8082:8082" - environment: - KAFKA_REST_LISTENERS: http://0.0.0.0:8082/ - KAFKA_REST_SCHEMA_REGISTRY_URL: http://schema-registry:8081/ - KAFKA_REST_HOST_NAME: kafka-rest-proxy - KAFKA_REST_BOOTSTRAP_SERVERS: PLAINTEXT://broker:29092 - depends_on: - - zookeeper - - broker - - schema-registry - - kafka-topics-ui: - image: landoop/kafka-topics-ui:0.9.4 - hostname: kafka-topics-ui - container_name: kafka-topics-ui - ports: - - "18000:8000" - environment: - KAFKA_REST_PROXY_URL: "http://kafka-rest-proxy:8082/" - PROXY: "true" - depends_on: - - zookeeper - - broker - - schema-registry - - kafka-rest-proxy - - # This "container" is a workaround to pre-create topics - kafka-setup: - build: - context: ../kafka - hostname: kafka-setup - container_name: kafka-setup - depends_on: - - broker - - schema-registry - environment: - - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - schema-registry: - image: confluentinc/cp-schema-registry:5.4.0 - hostname: schema-registry - container_name: schema-registry - depends_on: - - zookeeper - - broker - ports: - - "8081:8081" - environment: - SCHEMA_REGISTRY_HOST_NAME: schema-registry - SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: 'zookeeper:2181' - - schema-registry-ui: - image: landoop/schema-registry-ui:latest - container_name: schema-registry-ui - hostname: schema-registry-ui - ports: - - "8000:8000" - environment: - SCHEMAREGISTRY_URL: 'http://schema-registry:8081' - ALLOW_GLOBAL: 'true' - 
ALLOW_TRANSITIVE: 'true' - ALLOW_DELETION: 'true' - READONLY_MODE: 'true' - PROXY: 'true' - depends_on: - - schema-registry - - elasticsearch: - image: docker.elastic.co/elasticsearch/elasticsearch:5.6.8 - container_name: elasticsearch - hostname: elasticsearch - ports: - - "9200:9200" - environment: - - discovery.type=single-node - - xpack.security.enabled=false - - "ES_JAVA_OPTS=-Xms1g -Xmx1g" - volumes: - - esdata:/usr/share/elasticsearch/data - - kibana: - image: docker.elastic.co/kibana/kibana:5.6.8 - container_name: kibana - hostname: kibana - ports: - - "5601:5601" - environment: - - SERVER_HOST=0.0.0.0 - - ELASTICSEARCH_URL=http://elasticsearch:9200 - depends_on: - - elasticsearch - - neo4j: - image: neo4j:3.5.7 - hostname: neo4j - container_name: neo4j - environment: - NEO4J_AUTH: 'neo4j/datahub' - ports: - - "7474:7474" - - "7687:7687" - volumes: - - neo4jdata:/data - - # This "container" is a workaround to pre-create search indices - elasticsearch-setup: - build: - context: ../elasticsearch - hostname: elasticsearch-setup - container_name: elasticsearch-setup - depends_on: - - elasticsearch - environment: - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - - datahub-gms: - build: - context: ../../ - dockerfile: docker/gms/Dockerfile - hostname: datahub-gms - container_name: datahub-gms - ports: - - "8080:8080" - environment: - - EBEAN_DATASOURCE_USERNAME=datahub - - EBEAN_DATASOURCE_PASSWORD=datahub - - EBEAN_DATASOURCE_HOST=mysql:3306 - - EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8 - - EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - - NEO4J_HOST=neo4j:7474 - - NEO4J_URI=bolt://neo4j - - NEO4J_USERNAME=neo4j - - NEO4J_PASSWORD=datahub - depends_on: - - elasticsearch-setup - - kafka-setup - - mysql - - neo4j - - datahub-frontend: - build: - context: ../../ - dockerfile: docker/frontend/Dockerfile - hostname: datahub-frontend - container_name: datahub-frontend - ports: - - "9001:9001" - environment: - - DATAHUB_GMS_HOST=datahub-gms - - DATAHUB_GMS_PORT=8080 - - DATAHUB_SECRET=YouKnowNothing - - DATAHUB_APP_VERSION=1.0 - - DATAHUB_PLAY_MEM_BUFFER_SIZE=10MB - depends_on: - - datahub-gms - - datahub-mae-consumer: - build: - context: ../../ - dockerfile: docker/mae-consumer/Dockerfile - hostname: datahub-mae-consumer - container_name: datahub-mae-consumer - ports: - - "9091:9091" - environment: - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - - NEO4J_HOST=neo4j:7474 - - NEO4J_URI=bolt://neo4j - - NEO4J_USERNAME=neo4j - - NEO4J_PASSWORD=datahub - depends_on: - - kafka-setup - - elasticsearch-setup - - neo4j - command: "sh -c 'while ping -c1 kafka-setup &>/dev/null; do echo waiting for kafka-setup... && sleep 1; done; \ - echo kafka-setup done! 
&& /start.sh'"
-
-  datahub-mce-consumer:
-    build:
-      context: ../../
-      dockerfile: docker/mce-consumer/Dockerfile
-    hostname: datahub-mce-consumer
-    container_name: datahub-mce-consumer
-    ports:
-      - "9090:9090"
-    environment:
-      - KAFKA_BOOTSTRAP_SERVER=broker:29092
-      - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
-      - GMS_HOST=datahub-gms
-      - GMS_PORT=8080
-    depends_on:
-      - kafka-setup
-      - datahub-gms
-    command: "sh -c 'while ping -c1 kafka-setup &>/dev/null; do echo waiting for kafka-setup... && sleep 1; done; \
-        echo kafka-setup done! && /start.sh'"
-
-networks:
-  default:
-    name: datahub_network
-
-volumes:
-  mysqldata:
-  esdata:
-  neo4jdata:
-  zkdata:
diff --git a/docker/rebuild-all/rebuild-all.sh b/docker/rebuild-all/rebuild-all.sh
deleted file mode 100755
index 552c9d76fc..0000000000
--- a/docker/rebuild-all/rebuild-all.sh
+++ /dev/null
@@ -1,4 +0,0 @@
-#!/bin/bash
-
-DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-cd $DIR && docker-compose pull && docker-compose -p datahub up
\ No newline at end of file
diff --git a/docker/schema-registry-ui/env/docker.env b/docker/schema-registry-ui/env/docker.env
new file mode 100644
index 0000000000..6e40a79fe7
--- /dev/null
+++ b/docker/schema-registry-ui/env/docker.env
@@ -0,0 +1,6 @@
+SCHEMAREGISTRY_URL=http://schema-registry:8081
+ALLOW_GLOBAL=true
+ALLOW_TRANSITIVE=true
+ALLOW_DELETION=true
+READONLY_MODE=true
+PROXY=true
diff --git a/docker/schema-registry/env/docker.env b/docker/schema-registry/env/docker.env
new file mode 100644
index 0000000000..166c551ac1
--- /dev/null
+++ b/docker/schema-registry/env/docker.env
@@ -0,0 +1,2 @@
+SCHEMA_REGISTRY_HOST_NAME=schemaregistry
+SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=zookeeper:2181
diff --git a/docker/zookeeper/env/docker.env b/docker/zookeeper/env/docker.env
new file mode 100644
index 0000000000..1b8f605f98
--- /dev/null
+++ b/docker/zookeeper/env/docker.env
@@ -0,0 +1,2 @@
+ZOOKEEPER_CLIENT_PORT=2181
+ZOOKEEPER_TICK_TIME=2000
diff --git a/docs/docker/README.md b/docs/docker/README.md
new file mode 100644
index 0000000000..4a99490a2b
--- /dev/null
+++ b/docs/docker/README.md
@@ -0,0 +1 @@
+See [docker/README.md](../../docker/README.md).
\ No newline at end of file
diff --git a/docs/docker/development.md b/docs/docker/development.md
new file mode 100644
index 0000000000..2122a8e37f
--- /dev/null
+++ b/docs/docker/development.md
@@ -0,0 +1,89 @@
+# Using Docker Images During Development
+
+We've created a special `docker-compose.dev.yml` override file that configures the docker images to be easier to use
+during development.
+
+Normally, you'd rebuild your images from scratch with `docker-compose build` (or `docker-compose up --build`). However,
+this takes way too long for development. It has to copy the entire repo to each image and rebuild it there.
+
+The `docker-compose.dev.yml` file bypasses this problem by mounting binaries, startup scripts, and other data to
+special, slimmed down images (whose Dockerfile is usually defined in the service's `debug/Dockerfile`). Mounts work
+both ways, so these images also mount the container's log directories, so that logs are easy to read on your local
+machine without needing to inspect the running container (especially if the app crashes and the container stops!).
+
+We highly recommend you just invoke the `docker/dev.sh` script we've included. It is small enough to read if you want
+to see exactly what it does; under the hood it launches docker-compose with our `docker-compose.dev.yml` override.
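+
+For example, a typical session might look like this (a sketch; `./gradlew build` is an assumption about which Gradle
+task you need, since the dev images mount your locally built binaries rather than building them):
+
+```
+# From the repo root: build locally, then launch the dev images with the binaries mounted.
+./gradlew build
+./docker/dev.sh
+```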
+
+## Debugging
+
+The default dev images, while set up to use your local code, do not enable debugging by default. To enable debugging,
+you need to make two small edits (don't check these changes in!).
+
+- Add the JVM debug flags to the environment file for the service.
+- Assign the port in the docker-compose file.
+
+For example, to debug `datahub-gms`:
+
+```
+# Add this line to docker/datahub-gms/env/dev.env. You can change the port and/or change suspend=n to y.
+JAVA_TOOL_OPTIONS=-agentlib:jdwp=transport=dt_socket,address=5005,server=y,suspend=n
+```
+
+```
+# Change the definition in docker/docker-compose.dev.yml to this
+  datahub-gms:
+    image: linkedin/datahub-gms:debug
+    build:
+      context: datahub-gms/debug
+      dockerfile: Dockerfile
+    ports: # <--- Add this line
+      - "5005:5005" # <--- And this line. Must match port from environment file.
+    volumes:
+      - ./datahub-gms/start.sh:/datahub/datahub-gms/scripts/start.sh
+      - ../gms/war/build/libs/:/datahub/datahub-gms/bin
+```
+
+## Tips for People New to Docker
+
+### Conflicting containers
+
+If you ran `docker/quickstart.sh` before, your machine may already have containers for DataHub. If you want to run
+`docker/dev.sh` instead, ensure the old containers are removed by running `docker container prune`. The opposite also
+applies.
+
+> Note this only removes containers, not images. It should still be fast to switch between the two once you've launched
+> both at least once.
+
+### Running a specific service
+
+`docker-compose up` will launch all services in the configuration, including dependencies, unless they're already
+running. If you wish to change this behavior, the following example commands may help.
+
+```
+docker-compose -p datahub -f docker-compose.yml -f docker-compose.override.yml -f docker-compose.dev.yml up datahub-gms
+```
+This starts only `datahub-gms` and its dependencies.
+
+```
+docker-compose -p datahub -f docker-compose.yml -f docker-compose.override.yml -f docker-compose.dev.yml up --no-deps datahub-gms
+```
+This starts only `datahub-gms`, without its dependencies.
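+
+### Stopping everything
+
+When you're done, a minimal sketch for tearing the stack down (assuming the `datahub` project name used by the scripts
+above, and run from the `docker` directory):
+
+```
+# Stops and removes the containers and default network; images and named volumes are kept.
+docker-compose -p datahub down
+```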