diff --git a/.github/workflows/docker-frontend.yml b/.github/workflows/docker-frontend.yml index fdc7114d75..ab9c4fdd73 100644 --- a/.github/workflows/docker-frontend.yml +++ b/.github/workflows/docker-frontend.yml @@ -21,6 +21,8 @@ jobs: echo "tag=$TAG" echo "::set-output name=tag::$TAG" - uses: docker/build-push-action@v1 + env: + DOCKER_BUILDKIT: 1 with: dockerfile: ./docker/frontend/Dockerfile username: ${{ secrets.DOCKER_USERNAME }} diff --git a/.github/workflows/docker-gms.yml b/.github/workflows/docker-gms.yml index 6f4e8de8ad..dd3e1bdcaf 100644 --- a/.github/workflows/docker-gms.yml +++ b/.github/workflows/docker-gms.yml @@ -21,6 +21,8 @@ jobs: echo "tag=$TAG" echo "::set-output name=tag::$TAG" - uses: docker/build-push-action@v1 + env: + DOCKER_BUILDKIT: 1 with: dockerfile: ./docker/gms/Dockerfile username: ${{ secrets.DOCKER_USERNAME }} diff --git a/.github/workflows/docker-mae-consumer.yml b/.github/workflows/docker-mae-consumer.yml index e5321a448c..c45b7085e7 100644 --- a/.github/workflows/docker-mae-consumer.yml +++ b/.github/workflows/docker-mae-consumer.yml @@ -21,6 +21,8 @@ jobs: echo "tag=$TAG" echo "::set-output name=tag::$TAG" - uses: docker/build-push-action@v1 + env: + DOCKER_BUILDKIT: 1 with: dockerfile: ./docker/mae-consumer/Dockerfile username: ${{ secrets.DOCKER_USERNAME }} diff --git a/.github/workflows/docker-mce-consumer.yml b/.github/workflows/docker-mce-consumer.yml index b9fc5d7832..24f342bca8 100644 --- a/.github/workflows/docker-mce-consumer.yml +++ b/.github/workflows/docker-mce-consumer.yml @@ -21,6 +21,8 @@ jobs: echo "tag=$TAG" echo "::set-output name=tag::$TAG" - uses: docker/build-push-action@v1 + env: + DOCKER_BUILDKIT: 1 with: dockerfile: ./docker/mce-consumer/Dockerfile username: ${{ secrets.DOCKER_USERNAME }} diff --git a/.gitignore b/.gitignore index 40d10a8e1c..723f38b219 100644 --- a/.gitignore +++ b/.gitignore @@ -17,11 +17,8 @@ metadata-events/mxe-registration/src/main/resources/**/*.avsc .java-version # Python -.env .venv -env/ venv/ -ENV/ env.bak/ venv.bak/ .mypy_cache/ diff --git a/docker/README.md b/docker/README.md index 4e192a14c5..544f80a26e 100644 --- a/docker/README.md +++ b/docker/README.md @@ -1,27 +1,56 @@ # Docker Images + +## Prerequisites +You need to install [docker](https://docs.docker.com/install/) and +[docker-compose](https://docs.docker.com/compose/install/) (if using Linux; on Windows and Mac, compose is included with +Docker Desktop). + +Make sure to allocate enough hardware resources for the Docker engine. Tested & confirmed config: 2 CPUs, 8GB RAM, 2GB Swap +area. + +## Quickstart + The easiest way to bring up and test DataHub is using DataHub [Docker](https://www.docker.com) images which are continuously deployed to [Docker Hub](https://hub.docker.com/u/linkedin) with every commit to repository. +You can easily download and run all these images and their dependencies with our +[quick start guide](../docs/quickstart.md). + +DataHub Docker Images: + * [linkedin/datahub-gms](https://cloud.docker.com/repository/docker/linkedin/datahub-gms/) * [linkedin/datahub-frontend](https://cloud.docker.com/repository/docker/linkedin/datahub-frontend/) * [linkedin/datahub-mae-consumer](https://cloud.docker.com/repository/docker/linkedin/datahub-mae-consumer/) * [linkedin/datahub-mce-consumer](https://cloud.docker.com/repository/docker/linkedin/datahub-mce-consumer/) -Above Docker images are created for DataHub specific use. 
You can check subdirectories to check how those images are -generated via [Dockerbuild](https://docs.docker.com/engine/reference/commandline/build/) files or -how to start each container using [Docker Compose](https://docs.docker.com/compose/). Other than these, DataHub depends -on below Docker images to be able to run: +Dependencies: * [**Kafka and Schema Registry**](kafka) -* [**Elasticsearch**](elasticsearch) +* [**Elasticsearch**](elasticsearch-setup) * [**MySQL**](mysql) -Local-built ingestion image allows you to create on an ad-hoc basis `metadatachangeevent` with Python script. -The pipeline depends on all the above images composing up. -* [**Ingestion**](ingestion) +### Ingesting demo data -## Prerequisites -You need to install [docker](https://docs.docker.com/install/) and [docker-compose](https://docs.docker.com/compose/install/). +If you want to test ingesting some data once DataHub is up, see [**Ingestion**](ingestion/README.md). -## Quickstart -If you want to quickly try and evaluate DataHub by running all necessary Docker containers, you can check -[Quickstart Guide](quickstart). \ No newline at end of file +## Using Docker Images During Development + +See [Using Docker Images During Development](../docs/docker/development.md). + +## Building And Deploying Docker Images + +We use GitHub Actions to build and continuously deploy our images. There should be no need to do this manually; a +successful release on GitHub will automatically publish the images. + +### Building images + +To build the full images (that we are going to publish), you need to run the following: + +``` +COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub build +``` + +This is because we rely on BuildKit for multistage builds. It also does not hurt to set `DATAHUB_VERSION` to +something unique. + +This is not our recommended development flow and most developers should be following the +[Using Docker Images During Development](#using-docker-images-during-development) guide. \ No newline at end of file diff --git a/docker/broker/env/docker.env b/docker/broker/env/docker.env new file mode 100644 index 0000000000..4d8b2b1088 --- /dev/null +++ b/docker/broker/env/docker.env @@ -0,0 +1,6 @@ +KAFKA_BROKER_ID=1 +KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 +KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT +KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092 +KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 +KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=0 diff --git a/docker/frontend/Dockerfile b/docker/datahub-frontend/Dockerfile similarity index 100% rename from docker/frontend/Dockerfile rename to docker/datahub-frontend/Dockerfile diff --git a/docker/datahub-frontend/README.md b/docker/datahub-frontend/README.md new file mode 100644 index 0000000000..d04ab880ef --- /dev/null +++ b/docker/datahub-frontend/README.md @@ -0,0 +1,16 @@ +# DataHub Frontend Docker Image + +[![datahub-frontend docker](https://github.com/linkedin/datahub/workflows/datahub-frontend%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-frontend+docker%22) + +Refer to [DataHub Frontend Service](../../datahub-frontend) for a quick understanding of the architecture and +responsibility of this service within DataHub. 
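+
+As a quick sanity check that the frontend container came up, the commands below are a minimal sketch (they assume the
+`datahub-frontend` container name and the 9001 port mapping defined by the compose files in this repository):
+
+```
+docker logs datahub-frontend 2>&1 | tail -n 20
+curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9001
+```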
+ +## Checking out DataHub UI + +After starting your Docker containers, you can connect to the DataHub UI by entering the URL below into your web browser: + +``` +http://localhost:9001 +``` + +You can sign in with `datahub` as both the username and password. diff --git a/docker/datahub-frontend/env/docker.env b/docker/datahub-frontend/env/docker.env new file mode 100644 index 0000000000..c132d9a0ac --- /dev/null +++ b/docker/datahub-frontend/env/docker.env @@ -0,0 +1,5 @@ +DATAHUB_GMS_HOST=datahub-gms +DATAHUB_GMS_PORT=8080 +DATAHUB_SECRET=YouKnowNothing +DATAHUB_APP_VERSION=1.0 +DATAHUB_PLAY_MEM_BUFFER_SIZE=10MB diff --git a/docker/datahub-gms/Dockerfile b/docker/datahub-gms/Dockerfile new file mode 100644 index 0000000000..4cfa5c08e2 --- /dev/null +++ b/docker/datahub-gms/Dockerfile @@ -0,0 +1,28 @@ +# Defining environment +ARG APP_ENV=prod + +FROM openjdk:8-jre-alpine as base +ENV DOCKERIZE_VERSION v0.6.1 +RUN apk --no-cache add curl tar \ + && curl https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-runner/9.4.20.v20190813/jetty-runner-9.4.20.v20190813.jar --output jetty-runner.jar \ + && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv + +FROM openjdk:8 as prod-build +COPY . /datahub-src +RUN cd /datahub-src && ./gradlew :gms:war:build +RUN cp /datahub-src/gms/war/build/libs/war.war /war.war + +FROM base as prod-install +COPY --from=prod-build /war.war /datahub/datahub-gms/bin/war.war +COPY --from=prod-build /datahub-src/docker/datahub-gms/start.sh /datahub/datahub-gms/scripts/start.sh +RUN chmod +x /datahub/datahub-gms/scripts/start.sh + +FROM base as dev-install +# Dummy stage for development. Assumes code is built on your machine and mounted to this image. +# See this excellent thread https://github.com/docker/cli/issues/1134 + +FROM ${APP_ENV}-install as final + +EXPOSE 8080 + +CMD /datahub/datahub-gms/scripts/start.sh \ No newline at end of file diff --git a/docker/datahub-gms/README.md b/docker/datahub-gms/README.md new file mode 100644 index 0000000000..78a3159c36 --- /dev/null +++ b/docker/datahub-gms/README.md @@ -0,0 +1,22 @@ +# DataHub Generalized Metadata Store (GMS) Docker Image +[![datahub-gms docker](https://github.com/linkedin/datahub/workflows/datahub-gms%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-gms+docker%22) + +Refer to [DataHub GMS Service](../../gms) for a quick understanding of the architecture and +responsibility of this service within DataHub. + +## Other Database Platforms + +While GMS defaults to using MySQL as its storage backend, it is possible to switch to any of the +[database platforms](https://ebean.io/docs/database/) supported by Ebean. + +For example, you can run the following command to start a GMS that connects to a PostgreSQL backend: 
+ +``` +(cd docker/ && docker-compose -f docker-compose.yml -f docker-compose.postgres.yml -p datahub up) +``` + +or a MariaDB backend: + +``` +(cd docker/ && docker-compose -f docker-compose.yml -f docker-compose.mariadb.yml -p datahub up) +``` diff --git a/docker/datahub-gms/env/docker.env b/docker/datahub-gms/env/docker.env new file mode 100644 index 0000000000..34bb9df0e8 --- /dev/null +++ b/docker/datahub-gms/env/docker.env @@ -0,0 +1,13 @@ +EBEAN_DATASOURCE_USERNAME=datahub +EBEAN_DATASOURCE_PASSWORD=datahub +EBEAN_DATASOURCE_HOST=mysql:3306 +EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8 +EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver +KAFKA_BOOTSTRAP_SERVER=broker:29092 +KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 +ELASTICSEARCH_HOST=elasticsearch +ELASTICSEARCH_PORT=9200 +NEO4J_HOST=neo4j:7474 +NEO4J_URI=bolt://neo4j +NEO4J_USERNAME=neo4j +NEO4J_PASSWORD=datahub diff --git a/docker/datahub-gms/env/docker.mariadb.env b/docker/datahub-gms/env/docker.mariadb.env new file mode 100644 index 0000000000..5abd67a74f --- /dev/null +++ b/docker/datahub-gms/env/docker.mariadb.env @@ -0,0 +1,13 @@ +EBEAN_DATASOURCE_USERNAME=datahub +EBEAN_DATASOURCE_PASSWORD=datahub +EBEAN_DATASOURCE_HOST=mariadb:3306 +EBEAN_DATASOURCE_URL=jdbc:mariadb://mariadb:3306/datahub +EBEAN_DATASOURCE_DRIVER=org.mariadb.jdbc.Driver +KAFKA_BOOTSTRAP_SERVER=broker:29092 +KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 +ELASTICSEARCH_HOST=elasticsearch +ELASTICSEARCH_PORT=9200 +NEO4J_HOST=neo4j:7474 +NEO4J_URI=bolt://neo4j +NEO4J_USERNAME=neo4j +NEO4J_PASSWORD=datahub diff --git a/docker/datahub-gms/env/docker.postgres.env b/docker/datahub-gms/env/docker.postgres.env new file mode 100644 index 0000000000..bff43aaa01 --- /dev/null +++ b/docker/datahub-gms/env/docker.postgres.env @@ -0,0 +1,13 @@ +EBEAN_DATASOURCE_USERNAME=datahub +EBEAN_DATASOURCE_PASSWORD=datahub +EBEAN_DATASOURCE_HOST=postgres:5432 +EBEAN_DATASOURCE_URL=jdbc:postgresql://postgres:5432/datahub +EBEAN_DATASOURCE_DRIVER=org.postgresql.Driver +KAFKA_BOOTSTRAP_SERVER=broker:29092 +KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 +ELASTICSEARCH_HOST=elasticsearch +ELASTICSEARCH_PORT=9200 +NEO4J_HOST=neo4j:7474 +NEO4J_URI=bolt://neo4j +NEO4J_USERNAME=neo4j +NEO4J_PASSWORD=datahub diff --git a/docker/gms/start.sh b/docker/datahub-gms/start.sh old mode 100644 new mode 100755 similarity index 76% rename from docker/gms/start.sh rename to docker/datahub-gms/start.sh index c6edab4551..37e81941f8 --- a/docker/gms/start.sh +++ b/docker/datahub-gms/start.sh @@ -6,4 +6,4 @@ dockerize \ -wait http://$ELASTICSEARCH_HOST:$ELASTICSEARCH_PORT \ -wait http://$NEO4J_HOST \ -timeout 240s \ - java -jar jetty-runner.jar gms.war \ No newline at end of file + java -jar /jetty-runner.jar /datahub/datahub-gms/bin/war.war \ No newline at end of file diff --git a/docker/datahub-mae-consumer/Dockerfile b/docker/datahub-mae-consumer/Dockerfile new file mode 100644 index 0000000000..33558e7d57 --- /dev/null +++ b/docker/datahub-mae-consumer/Dockerfile @@ -0,0 +1,27 @@ +# Defining environment +ARG APP_ENV=prod + +FROM openjdk:8-jre-alpine as base +ENV DOCKERIZE_VERSION v0.6.1 +RUN apk --no-cache add curl tar \ + && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv + +FROM openjdk:8 as prod-build +COPY . 
datahub-src +RUN cd datahub-src && ./gradlew :metadata-jobs:mae-consumer-job:build +RUN cd datahub-src && cp metadata-jobs/mae-consumer-job/build/libs/mae-consumer-job.jar ../mae-consumer-job.jar + +FROM base as prod-install +COPY --from=prod-build /mae-consumer-job.jar /datahub/datahub-mae-consumer/bin/ +COPY --from=prod-build /datahub-src/docker/datahub-mae-consumer/start.sh /datahub/datahub-mae-consumer/scripts/ +RUN chmod +x /datahub/datahub-mae-consumer/scripts/start.sh + +FROM base as dev-install +# Dummy stage for development. Assumes code is built on your machine and mounted to this image. +# See this excellent thread https://github.com/docker/cli/issues/1134 + +FROM ${APP_ENV}-install as final + +EXPOSE 9091 + +CMD /datahub/datahub-mae-consumer/scripts/start.sh \ No newline at end of file diff --git a/docker/datahub-mae-consumer/README.md b/docker/datahub-mae-consumer/README.md new file mode 100644 index 0000000000..e95e875440 --- /dev/null +++ b/docker/datahub-mae-consumer/README.md @@ -0,0 +1,5 @@ +# DataHub MetadataAuditEvent (MAE) Consumer Docker Image +[![datahub-mae-consumer docker](https://github.com/linkedin/datahub/workflows/datahub-mae-consumer%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-mae-consumer+docker%22) + +Refer to [DataHub MAE Consumer Job](../../metadata-jobs/mae-consumer-job) for a quick understanding of the architecture and +responsibility of this service within DataHub. diff --git a/docker/datahub-mae-consumer/env/docker.env b/docker/datahub-mae-consumer/env/docker.env new file mode 100644 index 0000000000..5788020a95 --- /dev/null +++ b/docker/datahub-mae-consumer/env/docker.env @@ -0,0 +1,8 @@ +KAFKA_BOOTSTRAP_SERVER=broker:29092 +KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 +ELASTICSEARCH_HOST=elasticsearch +ELASTICSEARCH_PORT=9200 +NEO4J_HOST=neo4j:7474 +NEO4J_URI=bolt://neo4j +NEO4J_USERNAME=neo4j +NEO4J_PASSWORD=datahub diff --git a/docker/mae-consumer/start.sh b/docker/datahub-mae-consumer/start.sh old mode 100644 new mode 100755 similarity index 71% rename from docker/mae-consumer/start.sh rename to docker/datahub-mae-consumer/start.sh index e06a8b759d..4f7f2837a3 100644 --- a/docker/mae-consumer/start.sh +++ b/docker/datahub-mae-consumer/start.sh @@ -5,4 +5,4 @@ dockerize \ -wait http://$ELASTICSEARCH_HOST:$ELASTICSEARCH_PORT \ -wait http://$NEO4J_HOST \ -timeout 240s \ - java -jar mae-consumer-job.jar \ No newline at end of file + java -jar /datahub/datahub-mae-consumer/bin/mae-consumer-job.jar \ No newline at end of file diff --git a/docker/datahub-mce-consumer/Dockerfile b/docker/datahub-mce-consumer/Dockerfile new file mode 100644 index 0000000000..266d9a684c --- /dev/null +++ b/docker/datahub-mce-consumer/Dockerfile @@ -0,0 +1,27 @@ +# Defining environment +ARG APP_ENV=prod + +FROM openjdk:8-jre-alpine as base +ENV DOCKERIZE_VERSION v0.6.1 +RUN apk --no-cache add curl tar \ + && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv + +FROM openjdk:8 as prod-build +COPY . 
datahub-src +RUN cd datahub-src && ./gradlew :metadata-jobs:mce-consumer-job:build +RUN cd datahub-src && cp metadata-jobs/mce-consumer-job/build/libs/mce-consumer-job.jar ../mce-consumer-job.jar + +FROM base as prod-install +COPY --from=prod-build /mce-consumer-job.jar /datahub/datahub-mce-consumer/bin/ +COPY --from=prod-build /datahub-src/docker/datahub-mce-consumer/start.sh /datahub/datahub-mce-consumer/scripts/ +RUN chmod +x /datahub/datahub-mce-consumer/scripts/start.sh + +FROM base as dev-install +# Dummy stage for development. Assumes code is built on your machine and mounted to this image. +# See this excellent thread https://github.com/docker/cli/issues/1134 + +FROM ${APP_ENV}-install as final + +EXPOSE 9090 + +CMD /datahub/datahub-mce-consumer/scripts/start.sh \ No newline at end of file diff --git a/docker/datahub-mce-consumer/README.md b/docker/datahub-mce-consumer/README.md new file mode 100644 index 0000000000..e4a05fe851 --- /dev/null +++ b/docker/datahub-mce-consumer/README.md @@ -0,0 +1,5 @@ +# DataHub MetadataChangeEvent (MCE) Consumer Docker Image +[![datahub-mce-consumer docker](https://github.com/linkedin/datahub/workflows/datahub-mce-consumer%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-mce-consumer+docker%22) + +Refer to [DataHub MCE Consumer Job](../../metadata-jobs/mce-consumer-job) for a quick understanding of the architecture and +responsibility of this service within DataHub. diff --git a/docker/datahub-mce-consumer/env/docker.env b/docker/datahub-mce-consumer/env/docker.env new file mode 100644 index 0000000000..59907e8278 --- /dev/null +++ b/docker/datahub-mce-consumer/env/docker.env @@ -0,0 +1,4 @@ +KAFKA_BOOTSTRAP_SERVER=broker:29092 +KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 +GMS_HOST=datahub-gms +GMS_PORT=8080 diff --git a/docker/mce-consumer/start.sh b/docker/datahub-mce-consumer/start.sh old mode 100644 new mode 100755 similarity index 64% rename from docker/mce-consumer/start.sh rename to docker/datahub-mce-consumer/start.sh index 7835873711..302df31d10 --- a/docker/mce-consumer/start.sh +++ b/docker/datahub-mce-consumer/start.sh @@ -4,4 +4,4 @@ dockerize \ -wait tcp://$KAFKA_BOOTSTRAP_SERVER \ -timeout 240s \ - java -jar mce-consumer-job.jar \ No newline at end of file + java -jar /datahub/datahub-mce-consumer/bin/mce-consumer-job.jar \ No newline at end of file diff --git a/docker/dev.sh b/docker/dev.sh new file mode 100755 index 0000000000..1199621f3e --- /dev/null +++ b/docker/dev.sh @@ -0,0 +1,17 @@ +#!/bin/bash + +# Launches dev instances of DataHub images. See documentation for more details. +# YOU MUST BUILD VIA GRADLE BEFORE RUNNING THIS. +DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" +cd $DIR && \ + COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose \ + -f docker-compose.yml \ + -f docker-compose.override.yml \ + -f docker-compose.dev.yml \ + pull \ +&& \ + COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub \ + -f docker-compose.yml \ + -f docker-compose.override.yml \ + -f docker-compose.dev.yml \ + up \ No newline at end of file diff --git a/docker/docker-compose.dev.yml b/docker/docker-compose.dev.yml new file mode 100644 index 0000000000..bf6e9c13f2 --- /dev/null +++ b/docker/docker-compose.dev.yml @@ -0,0 +1,45 @@ +# Default overrides for running local development. 
+ +# Images here are made as "development" images by following the general pattern of defining a multistage build with +# separate prod/dev steps; using APP_ENV to specify which to use. The dev steps should avoid building and instead assume +# that binaries and scripts will be mounted to the image, as also set up by this file. Also see this excellent +# thread https://github.com/docker/cli/issues/1134. + +# To make a JVM app debuggable via IntelliJ, go to its env file and add JVM debug flags, and then add the JVM debug +# port to this file. +--- +# TODO mount + debug docker file for frontend +version: '3.8' +services: + datahub-gms: + image: linkedin/datahub-gms:debug + build: + context: datahub-gms + dockerfile: Dockerfile + args: + APP_ENV: dev + volumes: + - ./datahub-gms/start.sh:/datahub/datahub-gms/scripts/start.sh + - ../gms/war/build/libs/:/datahub/datahub-gms/bin + + datahub-mae-consumer: + image: linkedin/datahub-mae-consumer:debug + build: + context: datahub-mae-consumer + dockerfile: Dockerfile + args: + APP_ENV: dev + volumes: + - ./datahub-mae-consumer/start.sh:/datahub/datahub-mae-consumer/scripts/start.sh + - ../metadata-jobs/mae-consumer-job/build/libs/:/datahub/datahub-mae-consumer/bin/ + + datahub-mce-consumer: + image: linkedin/datahub-mce-consumer:debug + build: + context: datahub-mce-consumer + dockerfile: Dockerfile + args: + APP_ENV: dev + volumes: + - ./datahub-mce-consumer/start.sh:/datahub/datahub-mce-consumer/scripts/start.sh + - ../metadata-jobs/mce-consumer-job/build/libs/:/datahub/datahub-mce-consumer/bin diff --git a/docker/docker-compose.override.yml b/docker/docker-compose.override.yml new file mode 100644 index 0000000000..28cab46816 --- /dev/null +++ b/docker/docker-compose.override.yml @@ -0,0 +1,24 @@ +# Default override to use MySQL as a backing store for datahub-gms (same as docker-compose.mysql.yml). +--- +version: '3.8' +services: + mysql: + container_name: mysql + hostname: mysql + image: mysql:5.7 + env_file: mysql/env/docker.env + restart: always + command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci + ports: + - "3306:3306" + volumes: + - ./mysql/init.sql:/docker-entrypoint-initdb.d/init.sql + - mysqldata:/var/lib/mysql + + datahub-gms: + env_file: datahub-gms/env/docker.env + depends_on: + - mysql + +volumes: + mysqldata: diff --git a/docker/docker-compose.yml b/docker/docker-compose.yml new file mode 100644 index 0000000000..1a152a7efb --- /dev/null +++ b/docker/docker-compose.yml @@ -0,0 +1,192 @@ +# Docker compose file covering DataHub's default configuration, which is to run all containers on a single host. + +# Please see the README.md for instructions as to how to use and customize. + +# NOTE: This file will not build on its own! See the README.md in this directory. 
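+
+# A minimal usage sketch (assuming it is run from this docker/ directory; images are pulled from Docker Hub, and an
+# optional DATAHUB_VERSION environment variable selects the image tag, defaulting to "latest"):
+#   docker-compose -p datahub up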
+--- +version: '3.8' +services: + zookeeper: + image: confluentinc/cp-zookeeper:5.4.0 + env_file: zookeeper/env/docker.env + hostname: zookeeper + container_name: zookeeper + ports: + - "2181:2181" + volumes: + - zkdata:/var/opt/zookeeper + + broker: + image: confluentinc/cp-kafka:5.4.0 + env_file: broker/env/docker.env + hostname: broker + container_name: broker + depends_on: + - zookeeper + ports: + - "29092:29092" + - "9092:9092" + + kafka-rest-proxy: + image: confluentinc/cp-kafka-rest:5.4.0 + env_file: kafka-rest-proxy/env/docker.env + hostname: kafka-rest-proxy + container_name: kafka-rest-proxy + ports: + - "8082:8082" + depends_on: + - zookeeper + - broker + - schema-registry + + kafka-topics-ui: + image: landoop/kafka-topics-ui:0.9.4 + env_file: kafka-topics-ui/env/docker.env + hostname: kafka-topics-ui + container_name: kafka-topics-ui + ports: + - "18000:8000" + depends_on: + - zookeeper + - broker + - schema-registry + - kafka-rest-proxy + + # This "container" is a workaround to pre-create topics + kafka-setup: + build: + context: kafka-setup + env_file: kafka-setup/env/docker.env + hostname: kafka-setup + container_name: kafka-setup + depends_on: + - broker + - schema-registry + + schema-registry: + image: confluentinc/cp-schema-registry:5.4.0 + env_file: schema-registry/env/docker.env + hostname: schema-registry + container_name: schema-registry + depends_on: + - zookeeper + - broker + ports: + - "8081:8081" + + schema-registry-ui: + image: landoop/schema-registry-ui:latest + env_file: schema-registry-ui/env/docker.env + container_name: schema-registry-ui + hostname: schema-registry-ui + ports: + - "8000:8000" + depends_on: + - schema-registry + + elasticsearch: + image: docker.elastic.co/elasticsearch/elasticsearch:5.6.8 + env_file: elasticsearch/env/docker.env + container_name: elasticsearch + hostname: elasticsearch + ports: + - "9200:9200" + volumes: + - esdata:/usr/share/elasticsearch/data + + kibana: + image: docker.elastic.co/kibana/kibana:5.6.8 + env_file: kibana/env/docker.env + container_name: kibana + hostname: kibana + ports: + - "5601:5601" + depends_on: + - elasticsearch + + neo4j: + image: neo4j:3.5.7 + env_file: neo4j/env/docker.env + hostname: neo4j + container_name: neo4j + ports: + - "7474:7474" + - "7687:7687" + volumes: + - neo4jdata:/data + + # This "container" is a workaround to pre-create search indices + elasticsearch-setup: + build: + context: elasticsearch-setup + env_file: elasticsearch-setup/env/docker.env + hostname: elasticsearch-setup + container_name: elasticsearch-setup + depends_on: + - elasticsearch + + datahub-gms: + build: + context: ../ + dockerfile: docker/datahub-gms/Dockerfile + image: linkedin/datahub-gms:${DATAHUB_VERSION:-latest} + hostname: datahub-gms + container_name: datahub-gms + ports: + - "8080:8080" + depends_on: + - elasticsearch-setup + - kafka-setup + - mysql + - neo4j + + datahub-frontend: + build: + context: ../ + dockerfile: docker/datahub-frontend/Dockerfile + image: linkedin/datahub-frontend:${DATAHUB_VERSION:-latest} + env_file: datahub-frontend/env/docker.env + hostname: datahub-frontend + container_name: datahub-frontend + ports: + - "9001:9001" + depends_on: + - datahub-gms + + datahub-mae-consumer: + build: + context: ../ + dockerfile: docker/datahub-mae-consumer/Dockerfile + image: linkedin/datahub-mae-consumer:${DATAHUB_VERSION:-latest} + env_file: datahub-mae-consumer/env/docker.env + hostname: datahub-mae-consumer + container_name: datahub-mae-consumer + ports: + - "9091:9091" + depends_on: + - kafka-setup 
+ - elasticsearch-setup + - neo4j + + datahub-mce-consumer: + build: + context: ../ + dockerfile: docker/datahub-mce-consumer/Dockerfile + image: linkedin/datahub-mce-consumer:${DATAHUB_VERSION:-latest} + env_file: datahub-mce-consumer/env/docker.env + hostname: datahub-mce-consumer + container_name: datahub-mce-consumer + ports: + - "9090:9090" + depends_on: + - kafka-setup + - datahub-gms + +networks: + default: + name: datahub_network + +volumes: + esdata: + neo4jdata: + zkdata: diff --git a/docker/elasticsearch/Dockerfile b/docker/elasticsearch-setup/Dockerfile similarity index 100% rename from docker/elasticsearch/Dockerfile rename to docker/elasticsearch-setup/Dockerfile diff --git a/docker/elasticsearch-setup/README.md b/docker/elasticsearch-setup/README.md new file mode 100644 index 0000000000..47c4fc5c45 --- /dev/null +++ b/docker/elasticsearch-setup/README.md @@ -0,0 +1,5 @@ +# Elasticsearch & Kibana + +DataHub uses Elasticsearch as a search engine. Elasticsearch powers search, typeahead and browse functions for DataHub. +[Official Elasticsearch Docker image](https://hub.docker.com/_/elasticsearch) found in Docker Hub is used without +any modification. \ No newline at end of file diff --git a/docker/elasticsearch/corpuser-index-config.json b/docker/elasticsearch-setup/corpuser-index-config.json similarity index 100% rename from docker/elasticsearch/corpuser-index-config.json rename to docker/elasticsearch-setup/corpuser-index-config.json diff --git a/docker/elasticsearch/dataprocess-index-config.json b/docker/elasticsearch-setup/dataprocess-index-config.json similarity index 100% rename from docker/elasticsearch/dataprocess-index-config.json rename to docker/elasticsearch-setup/dataprocess-index-config.json diff --git a/docker/elasticsearch/dataset-index-config.json b/docker/elasticsearch-setup/dataset-index-config.json similarity index 100% rename from docker/elasticsearch/dataset-index-config.json rename to docker/elasticsearch-setup/dataset-index-config.json diff --git a/docker/elasticsearch-setup/env/docker.env b/docker/elasticsearch-setup/env/docker.env new file mode 100644 index 0000000000..04fe050539 --- /dev/null +++ b/docker/elasticsearch-setup/env/docker.env @@ -0,0 +1,2 @@ +ELASTICSEARCH_HOST=elasticsearch +ELASTICSEARCH_PORT=9200 diff --git a/docker/elasticsearch/README.md b/docker/elasticsearch/README.md deleted file mode 100644 index 963e88bbd3..0000000000 --- a/docker/elasticsearch/README.md +++ /dev/null @@ -1,35 +0,0 @@ -# Elasticsearch & Kibana - -DataHub uses Elasticsearch as a search engine. Elasticsearch powers search, typeahead and browse functions for DataHub. -[Official Elasticsearch Docker image](https://hub.docker.com/_/elasticsearch) found in Docker Hub is used without -any modification. - -## Run Docker container -Below command will start the Elasticsearch and Kibana containers. `DataHub` uses Elasticsearch release `5.6.8`. Newer -versions of Elasticsearch are not tested and you might experience compatibility issues. -``` -cd docker/elasticsearch && docker-compose pull && docker-compose up --build -``` -You can connect to Kibana on your web browser to monitor Elasticsearch via below link: -``` -http://localhost:5601 -``` - -## Container configuration -### External Port -If you need to configure default configurations for your container such as the exposed port, you will do that in -`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand -how to change your exposed port settings. 
-``` -ports: - - "9200:9200" -``` - -### Docker Network -All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`. -If you change this, you will need to change this for all other Docker containers as well. -``` -networks: - default: - name: datahub_network -``` \ No newline at end of file diff --git a/docker/elasticsearch/docker-compose.yml b/docker/elasticsearch/docker-compose.yml deleted file mode 100644 index 317bd44766..0000000000 --- a/docker/elasticsearch/docker-compose.yml +++ /dev/null @@ -1,38 +0,0 @@ ---- -version: '3.5' -services: - elasticsearch: - image: docker.elastic.co/elasticsearch/elasticsearch:5.6.8 - container_name: elasticsearch - hostname: elasticsearch - ports: - - "9200:9200" - environment: - - discovery.type=single-node - - xpack.security.enabled=false - - "ES_JAVA_OPTS=-Xms1g -Xmx1g" - - kibana: - image: docker.elastic.co/kibana/kibana:5.6.8 - container_name: kibana - hostname: kibana - ports: - - "5601:5601" - depends_on: - - elasticsearch - - # This "container" is a workaround to pre-create search indices - elasticsearch-setup: - build: - context: . - hostname: elasticsearch-setup - container_name: elasticsearch-setup - depends_on: - - elasticsearch - environment: - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - -networks: - default: - name: datahub_network \ No newline at end of file diff --git a/docker/elasticsearch/env/docker.env b/docker/elasticsearch/env/docker.env new file mode 100644 index 0000000000..75b52bd983 --- /dev/null +++ b/docker/elasticsearch/env/docker.env @@ -0,0 +1,3 @@ +discovery.type=single-node +xpack.security.enabled=false +ES_JAVA_OPTS=-Xms1g -Xmx1g diff --git a/docker/frontend/README.md b/docker/frontend/README.md deleted file mode 100644 index 0cfcc99275..0000000000 --- a/docker/frontend/README.md +++ /dev/null @@ -1,50 +0,0 @@ -# DataHub Frontend Docker Image -[![datahub-frontend docker](https://github.com/linkedin/datahub/workflows/datahub-frontend%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-frontend+docker%22) - -Refer to [DataHub Frontend Service](../../datahub-frontend) to have a quick understanding of the architecture and -responsibility of this service for the DataHub. - -## Build & Run -``` -cd docker/frontend && docker-compose up --build -``` -This command will rebuild the docker image and start a container based on the image. - -To start a container using an existing image, run the same command without the `--build` flag. - -### Container configuration -#### External Port -If you need to configure default configurations for your container such as the exposed port, you will do that in -`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand -how to change your exposed port settings. -``` -ports: - - "9001:9001" -``` - -#### Docker Network -All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`. -If you change this, you will need to change this for all other Docker containers as well. -``` -networks: - default: - name: datahub_network -``` - -#### datahub-gms Container -Before starting `datahub-frontend` container, `datahub-gms` container should already be up and running. 
-`datahub-frontend` service creates a connection to `datahub-gms` service and this is configured with environment -variables in `docker-compose.yml`: -``` -environment: - - DATAHUB_GMS_HOST=datahub-gms - - DATAHUB_GMS_PORT=8080 -``` -The value of `DATAHUB_GMS_HOST` variable should be set to the host name of the `datahub-gms` container within the Docker network. - -## Checking out DataHub UI -After starting your Docker container, you can connect to it by typing below into your favorite web browser: -``` -http://localhost:9001 -``` -You can sign in with `datahub` as username and password. diff --git a/docker/frontend/docker-compose.yml b/docker/frontend/docker-compose.yml deleted file mode 100644 index 6ccae92b8a..0000000000 --- a/docker/frontend/docker-compose.yml +++ /dev/null @@ -1,22 +0,0 @@ ---- -version: '3.5' -services: - datahub-frontend: - image: linkedin/datahub-frontend:${DATAHUB_VERSION:-latest} - build: - context: ../../ - dockerfile: docker/frontend/Dockerfile - hostname: datahub-frontend - container_name: datahub-frontend - ports: - - "9001:9001" - environment: - - DATAHUB_GMS_HOST=datahub-gms - - DATAHUB_GMS_PORT=8080 - - DATAHUB_SECRET=YouKnowNothing - - DATAHUB_APP_VERSION=1.0 - - DATAHUB_PLAY_MEM_BUFFER_SIZE=10MB - -networks: - default: - name: datahub_network \ No newline at end of file diff --git a/docker/gms/Dockerfile b/docker/gms/Dockerfile deleted file mode 100644 index e795445d77..0000000000 --- a/docker/gms/Dockerfile +++ /dev/null @@ -1,19 +0,0 @@ -FROM openjdk:8 as builder -COPY . /datahub-src -RUN cd /datahub-src && ./gradlew :gms:war:build \ - && cp gms/war/build/libs/war.war /gms.war - - -FROM openjdk:8-jre-alpine -ENV DOCKERIZE_VERSION v0.6.1 -RUN apk --no-cache add curl tar \ - && curl https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-runner/9.4.20.v20190813/jetty-runner-9.4.20.v20190813.jar --output jetty-runner.jar \ - && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv - -COPY --from=builder /gms.war . -COPY docker/gms/start.sh /start.sh -RUN chmod +x /start.sh - -EXPOSE 8080 - -CMD /start.sh \ No newline at end of file diff --git a/docker/gms/README.md b/docker/gms/README.md deleted file mode 100644 index 2620ab6a2e..0000000000 --- a/docker/gms/README.md +++ /dev/null @@ -1,82 +0,0 @@ -# DataHub Generalized Metadata Store (GMS) Docker Image -[![datahub-gms docker](https://github.com/linkedin/datahub/workflows/datahub-gms%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-gms+docker%22) - -Refer to [DataHub GMS Service](../../gms) to have a quick understanding of the architecture and -responsibility of this service for the DataHub. - - -## Build & Run -``` -cd docker/gms && docker-compose up --build -``` -This command will rebuild the local docker image and start a container based on the image. - -To start a container using an existing image, run the same command without the `--build` flag. - -### Container configuration -#### External Port -If you need to configure default configurations for your container such as the exposed port, you will do that in -`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand -how to change your exposed port settings. -``` -ports: - - "8080:8080" -``` - -#### Docker Network -All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`. 
-If you change this, you will need to change this for all other Docker containers as well. -``` -networks: - default: - name: datahub_network -``` - -#### MySQL, Elasticsearch and Kafka Containers -Before starting `datahub-gms` container, `mysql`, `elasticsearch`, `neo4j` and `kafka` containers should already be up and running. -These connections are configured via environment variables in `docker-compose.yml`: -``` -environment: - - EBEAN_DATASOURCE_USERNAME=datahub - - EBEAN_DATASOURCE_PASSWORD=datahub - - EBEAN_DATASOURCE_HOST=mysql:3306 - - EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub - - EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver -``` -The value of `EBEAN_DATASOURCE_HOST` variable should be set to the host name of the `mysql` container within the Docker network. - -``` -environment: - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 -``` -The value of `KAFKA_BOOTSTRAP_SERVER` variable should be set to the host name of the `kafka broker` container within the Docker network. -The value of `KAFKA_SCHEMAREGISTRY_URL` variable should be set to the host name of the `kafka schema registry` container within the Docker network. - -``` -environment: - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 -``` -The value of `ELASTICSEARCH_HOST` variable should be set to the host name of the `elasticsearch` container within the Docker network. - -``` -environment: - - NEO4J_HOST=neo4j:7474 - - NEO4J_URI=bolt://neo4j - - NEO4J_USERNAME=neo4j - - NEO4J_PASSWORD=datahub -``` -The value of `NEO4J_URI` variable should be set to the host name of the `neo4j` container within the Docker network. - -## Other Database Platforms -While GMS defaults to using MySQL as its storage backend, it is possible to switch to any of the -[database platforms](https://ebean.io/docs/database/) supported by Ebean. 
-For example, you can run the following command to start a GMS that connects to a PostgreSQL backend -``` -cd docker/gms && docker-compose -f docker-compose-postgres.yml up --build -``` -or a MariaDB backend -``` -cd docker/gms && docker-compose -f docker-compose-mariadb.yml up --build -``` diff --git a/docker/gms/docker-compose-mariadb.yml b/docker/gms/docker-compose-mariadb.yml deleted file mode 100644 index 435d17ea95..0000000000 --- a/docker/gms/docker-compose-mariadb.yml +++ /dev/null @@ -1,30 +0,0 @@ ---- -version: '3.5' -services: - datahub-gms: - image: linkedin/datahub-gms:${DATAHUB_VERSION:-latest} - build: - context: ../../ - dockerfile: docker/gms/Dockerfile - hostname: datahub-gms - container_name: datahub-gms - ports: - - "8080:8080" - environment: - - EBEAN_DATASOURCE_USERNAME=datahub - - EBEAN_DATASOURCE_PASSWORD=datahub - - EBEAN_DATASOURCE_HOST=mariadb:3306 - - EBEAN_DATASOURCE_URL=jdbc:mariadb://mariadb:3306/datahub - - EBEAN_DATASOURCE_DRIVER=org.mariadb.jdbc.Driver - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - - NEO4J_HOST=neo4j:7474 - - NEO4J_URI=bolt://neo4j - - NEO4J_USERNAME=neo4j - - NEO4J_PASSWORD=datahub - -networks: - default: - name: datahub_network \ No newline at end of file diff --git a/docker/gms/docker-compose-postgres.yml b/docker/gms/docker-compose-postgres.yml deleted file mode 100644 index dcd1608acc..0000000000 --- a/docker/gms/docker-compose-postgres.yml +++ /dev/null @@ -1,30 +0,0 @@ ---- -version: '3.5' -services: - datahub-gms: - image: linkedin/datahub-gms:${DATAHUB_VERSION:-latest} - build: - context: ../../ - dockerfile: docker/gms/Dockerfile - hostname: datahub-gms - container_name: datahub-gms - ports: - - "8080:8080" - environment: - - EBEAN_DATASOURCE_USERNAME=datahub - - EBEAN_DATASOURCE_PASSWORD=datahub - - EBEAN_DATASOURCE_HOST=postgres:5432 - - EBEAN_DATASOURCE_URL=jdbc:postgresql://postgres:5432/datahub - - EBEAN_DATASOURCE_DRIVER=org.postgresql.Driver - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - - NEO4J_HOST=neo4j:7474 - - NEO4J_URI=bolt://neo4j - - NEO4J_USERNAME=neo4j - - NEO4J_PASSWORD=datahub - -networks: - default: - name: datahub_network \ No newline at end of file diff --git a/docker/gms/docker-compose.yml b/docker/gms/docker-compose.yml deleted file mode 100644 index 2600dba1e3..0000000000 --- a/docker/gms/docker-compose.yml +++ /dev/null @@ -1,30 +0,0 @@ ---- -version: '3.5' -services: - datahub-gms: - image: linkedin/datahub-gms:${DATAHUB_VERSION:-latest} - build: - context: ../../ - dockerfile: docker/gms/Dockerfile - hostname: datahub-gms - container_name: datahub-gms - ports: - - "8080:8080" - environment: - - EBEAN_DATASOURCE_USERNAME=datahub - - EBEAN_DATASOURCE_PASSWORD=datahub - - EBEAN_DATASOURCE_HOST=mysql:3306 - - EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true - - EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - - NEO4J_HOST=neo4j:7474 - - NEO4J_URI=bolt://neo4j - - NEO4J_USERNAME=neo4j - - NEO4J_PASSWORD=datahub - -networks: - default: - name: datahub_network \ No newline at end of file diff --git a/docker/ingestion/README.md b/docker/ingestion/README.md index 
5eaf1e9e85..7db93d41bb 100644 --- a/docker/ingestion/README.md +++ b/docker/ingestion/README.md @@ -2,16 +2,3 @@ Refer to [DataHub Metadata Ingestion](../../metadata-ingestion/mce-cli) to have a quick understanding of the architecture and responsibility of this service for the DataHub. - -## Build & Run -``` -cd docker/ingestion && docker-compose up --build -``` -This command will rebuild the docker image and start a container based on the image. - -To start a container using an existing image, run the same command without the `--build` flag. - -### Container configuration - -#### Prerequisite Containers -Before starting `ingestion` container, `kafka`, `datahub-gms`, `mysql` and `datahub-mce-consumer` containers should already be up and running. \ No newline at end of file diff --git a/docker/kafka-rest-proxy/env/docker.env b/docker/kafka-rest-proxy/env/docker.env new file mode 100644 index 0000000000..1e96b3dde1 --- /dev/null +++ b/docker/kafka-rest-proxy/env/docker.env @@ -0,0 +1,4 @@ +KAFKA_REST_LISTENERS=http://0.0.0.0:8082/ +KAFKA_REST_SCHEMA_REGISTRY_URL=http://schema-registry:8081/ +KAFKA_REST_HOST_NAME=kafka-rest-proxy +KAFKA_REST_BOOTSTRAP_SERVERS=PLAINTEXT://broker:29092 diff --git a/docker/kafka/Dockerfile b/docker/kafka-setup/Dockerfile similarity index 100% rename from docker/kafka/Dockerfile rename to docker/kafka-setup/Dockerfile diff --git a/docker/kafka-setup/README.md b/docker/kafka-setup/README.md new file mode 100644 index 0000000000..485218abb5 --- /dev/null +++ b/docker/kafka-setup/README.md @@ -0,0 +1,14 @@ +# Kafka, Zookeeper and Schema Registry + +DataHub uses Kafka as the pub-sub message queue in the backend. +[Official Confluent Kafka Docker images](https://hub.docker.com/u/confluentinc) found in Docker Hub are used without +any modification. + +## Debugging Kafka +You can install [kafkacat](https://github.com/edenhill/kafkacat) to consume and produce messages to Kafka topics. +For example, to consume messages on the MetadataAuditEvent topic, you can run the command below. +``` +kafkacat -b localhost:9092 -t MetadataAuditEvent +``` +However, `kafkacat` doesn't currently support Avro deserialization, +but there is ongoing [work](https://github.com/edenhill/kafkacat/pull/151) to add it. \ No newline at end of file diff --git a/docker/kafka-setup/env/docker.env b/docker/kafka-setup/env/docker.env new file mode 100644 index 0000000000..91f64e1cac --- /dev/null +++ b/docker/kafka-setup/env/docker.env @@ -0,0 +1,2 @@ +KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 +KAFKA_BOOTSTRAP_SERVER=broker:29092 diff --git a/docker/kafka-topics-ui/env/docker.env b/docker/kafka-topics-ui/env/docker.env new file mode 100644 index 0000000000..bc4b8ea797 --- /dev/null +++ b/docker/kafka-topics-ui/env/docker.env @@ -0,0 +1,2 @@ +KAFKA_REST_PROXY_URL="http://kafka-rest-proxy:8082/" +PROXY="true" diff --git a/docker/kafka/README.md b/docker/kafka/README.md deleted file mode 100644 index dc1556c867..0000000000 --- a/docker/kafka/README.md +++ /dev/null @@ -1,47 +0,0 @@ -# Kafka, Zookeeper and Schema Registry - -DataHub uses Kafka as the pub-sub message queue in the backend. -[Official Confluent Kafka Docker images](https://hub.docker.com/u/confluentinc) found in Docker Hub is used without -any modification. - -## Run Docker container -Below command will start all Kafka related containers. 
-``` -cd docker/kafka && docker-compose pull && docker-compose up -``` -As part of `docker-compose`, we also initialize a container called `kafka-setup` to create `MetadataAuditEvent` and -`MetadataChangeEvent` & `FailedMetadataChangeEvent` topics. The only thing this container does is creating Kafka topics after Kafka broker is ready. - -There is also a container which provides visual schema registry interface which you can register/unregister schemas. -You can connect to `schema-registry-ui` on your web browser to monitor Kafka Schema Registry via below link: -``` -http://localhost:8000 -``` - -## Container configuration -### External Port -If you need to configure default configurations for your container such as the exposed port, you will do that in -`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand -how to change your exposed port settings. -``` -ports: - - "9092:9092" -``` - -### Docker Network -All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`. -If you change this, you will need to change this for all other Docker containers as well. -``` -networks: - default: - name: datahub_network -``` - -## Debugging Kafka -You can install [kafkacat](https://github.com/edenhill/kafkacat) to consume and produce messaged to Kafka topics. -For example, to consume messages on MetadataAuditEvent topic, you can run below command. -``` -kafkacat -b localhost:9092 -t MetadataAuditEvent -``` -However, `kafkacat` currently doesn't support Avro deserialization at this point, -but they have an ongoing [work](https://github.com/edenhill/kafkacat/pull/151) for that. \ No newline at end of file diff --git a/docker/kafka/docker-compose.yml b/docker/kafka/docker-compose.yml deleted file mode 100644 index b8db356ef8..0000000000 --- a/docker/kafka/docker-compose.yml +++ /dev/null @@ -1,104 +0,0 @@ ---- -version: '3.5' -services: - zookeeper: - image: confluentinc/cp-zookeeper:5.4.0 - hostname: zookeeper - container_name: zookeeper - ports: - - "2181:2181" - environment: - ZOOKEEPER_CLIENT_PORT: 2181 - ZOOKEEPER_TICK_TIME: 2000 - - broker: - image: confluentinc/cp-kafka:5.4.0 - hostname: broker - container_name: broker - depends_on: - - zookeeper - ports: - - "29092:29092" - - "9092:9092" - environment: - KAFKA_BROKER_ID: 1 - KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181' - KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT - KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092 - KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1 - KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0 - - # This "container" is a workaround to pre-create topics - kafka-setup: - build: - context: . 
- hostname: kafka-setup - container_name: kafka-setup - depends_on: - - broker - - schema-registry - environment: - - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - kafka-rest-proxy: - image: confluentinc/cp-kafka-rest:5.4.0 - hostname: kafka-rest-proxy - ports: - - "8082:8082" - environment: - KAFKA_REST_LISTENERS: http://0.0.0.0:8082/ - KAFKA_REST_SCHEMA_REGISTRY_URL: http://schema-registry:8081/ - KAFKA_REST_HOST_NAME: kafka-rest-proxy - KAFKA_REST_BOOTSTRAP_SERVERS: PLAINTEXT://broker:29092 - depends_on: - - zookeeper - - broker - - schema-registry - - kafka-topics-ui: - image: landoop/kafka-topics-ui:0.9.4 - hostname: kafka-topics-ui - ports: - - "18000:8000" - environment: - KAFKA_REST_PROXY_URL: "http://kafka-rest-proxy:8082/" - PROXY: "true" - depends_on: - - zookeeper - - broker - - schema-registry - - kafka-rest-proxy - - schema-registry: - image: confluentinc/cp-schema-registry:5.4.0 - hostname: schema-registry - container_name: schema-registry - depends_on: - - zookeeper - - broker - ports: - - "8081:8081" - environment: - SCHEMA_REGISTRY_HOST_NAME: schema-registry - SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: 'zookeeper:2181' - - schema-registry-ui: - image: landoop/schema-registry-ui:latest - container_name: schema-registry-ui - hostname: schema-registry-ui - ports: - - "8000:8000" - environment: - SCHEMAREGISTRY_URL: 'http://schema-registry:8081' - ALLOW_GLOBAL: 'true' - ALLOW_TRANSITIVE: 'true' - ALLOW_DELETION: 'true' - READONLY_MODE: 'true' - PROXY: 'true' - depends_on: - - schema-registry - -networks: - default: - name: datahub_network diff --git a/docker/kibana/env/docker.env b/docker/kibana/env/docker.env new file mode 100644 index 0000000000..bbafd8a3e1 --- /dev/null +++ b/docker/kibana/env/docker.env @@ -0,0 +1,2 @@ +SERVER_HOST=0.0.0.0 +ELASTICSEARCH_URL=http://elasticsearch:9200 diff --git a/docker/mae-consumer/Dockerfile b/docker/mae-consumer/Dockerfile deleted file mode 100644 index 63541e2a35..0000000000 --- a/docker/mae-consumer/Dockerfile +++ /dev/null @@ -1,19 +0,0 @@ -FROM openjdk:8 as builder - -COPY . datahub-src -RUN cd datahub-src && ./gradlew :metadata-jobs:mae-consumer-job:build \ - && cp metadata-jobs/mae-consumer-job/build/libs/mae-consumer-job.jar ../mae-consumer-job.jar \ - && cd .. && rm -rf datahub-src - -FROM openjdk:8-jre-alpine -ENV DOCKERIZE_VERSION v0.6.1 -RUN apk --no-cache add curl tar \ - && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv - -COPY --from=builder /mae-consumer-job.jar /mae-consumer-job.jar -COPY docker/mae-consumer/start.sh /start.sh -RUN chmod +x /start.sh - -EXPOSE 9091 - -CMD /start.sh \ No newline at end of file diff --git a/docker/mae-consumer/README.md b/docker/mae-consumer/README.md deleted file mode 100644 index c37978290d..0000000000 --- a/docker/mae-consumer/README.md +++ /dev/null @@ -1,42 +0,0 @@ -# DataHub MetadataAuditEvent (MAE) Consumer Docker Image -[![datahub-mae-consumer docker](https://github.com/linkedin/datahub/workflows/datahub-mae-consumer%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-mae-consumer+docker%22) - -Refer to [DataHub MAE Consumer Job](../../metadata-jobs/mae-consumer-job) to have a quick understanding of the architecture and -responsibility of this service for the DataHub. 
- -## Build & Run -``` -cd docker/mae-consumer && docker-compose up --build -``` -This command will rebuild the docker image and start a container based on the image. - -To start a container using a previously built image, run the same command without the `--build` flag. - -### Container configuration - -#### Docker Network -All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`. -If you change this, you will need to change this for all other Docker containers as well. -``` -networks: - default: - name: datahub_network -``` - -#### Elasticsearch and Kafka Containers -Before starting `datahub-mae-consumer` container, `elasticsearch` and `kafka` containers should already be up and running. -These connections are configured via environment variables in `docker-compose.yml`: -``` -environment: - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 -``` -The value of `KAFKA_BOOTSTRAP_SERVER` variable should be set to the host name of the `kafka broker` container within the Docker network. -The value of `KAFKA_SCHEMAREGISTRY_URL` variable should be set to the host name of the `kafka schema registry` container within the Docker network. - -``` -environment: - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 -``` -The value of `ELASTICSEARCH_HOST` variable should be set to the host name of the `elasticsearch` container within the Docker network. \ No newline at end of file diff --git a/docker/mae-consumer/docker-compose.yml b/docker/mae-consumer/docker-compose.yml deleted file mode 100644 index aa5f7342a1..0000000000 --- a/docker/mae-consumer/docker-compose.yml +++ /dev/null @@ -1,25 +0,0 @@ ---- -version: '3.5' -services: - datahub-mae-consumer: - image: linkedin/datahub-mae-consumer:${DATAHUB_VERSION:-latest} - build: - context: ../../ - dockerfile: docker/mae-consumer/Dockerfile - hostname: datahub-mae-consumer - container_name: datahub-mae-consumer - ports: - - "9091:9091" - environment: - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - - NEO4J_HOST=neo4j:7474 - - NEO4J_URI=bolt://neo4j - - NEO4J_USERNAME=neo4j - - NEO4J_PASSWORD=datahub - -networks: - default: - name: datahub_network diff --git a/docker/mariadb/README.md b/docker/mariadb/README.md index 10ae6bcab7..efdc10a18d 100644 --- a/docker/mariadb/README.md +++ b/docker/mariadb/README.md @@ -4,36 +4,3 @@ DataHub GMS can use MariaDB as an alternate storage backend. [Official MariaDB Docker image](https://hub.docker.com/_/mariadb) found in Docker Hub is used without any modification. - -## Run Docker container -Below command will start the MariaDB container. -``` -cd docker/mariadb && docker-compose pull && docker-compose up -``` - -An initialization script [init.sql](init.sql) is provided to container. This script initializes `metadata-aspect` table -which is basically the Key-Value store of the DataHub GMS. - -To connect to MariaDB container, you can type below command: -``` -docker exec -it mariadb mysql -u datahub -pdatahub datahub -``` - -## Container configuration -### External Port -If you need to configure default configurations for your container such as the exposed port, you will do that in -`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand -how to change your exposed port settings. 
-``` -ports: - - '3306:3306' -``` - -### Docker Network -All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`. -If you change this, you will need to change this for all other Docker containers as well. -``` -networks: - default: - name: datahub_network -``` \ No newline at end of file diff --git a/docker/mariadb/docker-compose.yml b/docker/mariadb/docker-compose.mariadb.yml similarity index 54% rename from docker/mariadb/docker-compose.yml rename to docker/mariadb/docker-compose.mariadb.yml index 179b565cc4..6c14759107 100644 --- a/docker/mariadb/docker-compose.yml +++ b/docker/mariadb/docker-compose.mariadb.yml @@ -1,21 +1,23 @@ +# Override to use MariaDB as a backing store for datahub-gms. --- -version: '3.5' +version: '3.8' services: - mysql: + mariadb: container_name: mariadb hostname: mariadb image: mariadb:10.5 + env_file: env/docker.env restart: always - environment: - MYSQL_DATABASE: 'datahub' - MYSQL_USER: 'datahub' - MYSQL_PASSWORD: 'datahub' - MYSQL_ROOT_PASSWORD: 'datahub' ports: - '3306:3306' volumes: - ./init.sql:/docker-entrypoint-initdb.d/init.sql + datahub-gms: + env_file: ../datahub-gms/env/docker.mariadb.env + depends_on: + - mariadb + networks: default: name: datahub_network \ No newline at end of file diff --git a/docker/mce-consumer/Dockerfile b/docker/mce-consumer/Dockerfile deleted file mode 100644 index 6a9de0b663..0000000000 --- a/docker/mce-consumer/Dockerfile +++ /dev/null @@ -1,19 +0,0 @@ -FROM openjdk:8 as builder - -COPY . datahub-src -RUN cd datahub-src && ./gradlew :metadata-jobs:mce-consumer-job:build \ - && cp metadata-jobs/mce-consumer-job/build/libs/mce-consumer-job.jar ../mce-consumer-job.jar \ - && cd .. && rm -rf datahub-src - -FROM openjdk:8-jre-alpine -ENV DOCKERIZE_VERSION v0.6.1 -RUN apk --no-cache add curl tar \ - && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv - -COPY --from=builder /mce-consumer-job.jar /mce-consumer-job.jar -COPY docker/mce-consumer/start.sh /start.sh -RUN chmod +x /start.sh - -EXPOSE 9090 - -CMD /start.sh \ No newline at end of file diff --git a/docker/mce-consumer/README.md b/docker/mce-consumer/README.md deleted file mode 100644 index 7eebc7280a..0000000000 --- a/docker/mce-consumer/README.md +++ /dev/null @@ -1,42 +0,0 @@ -# DataHub MetadataChangeEvent (MCE) Consumer Docker Image -[![datahub-mce-consumer docker](https://github.com/linkedin/datahub/workflows/datahub-mce-consumer%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-mce-consumer+docker%22) - -Refer to [DataHub MCE Consumer Job](../../metadata-jobs/mce-consumer-job) to have a quick understanding of the architecture and -responsibility of this service for the DataHub. - -## Build & Run -``` -cd docker/mce-consumer && docker-compose up --build -``` -This command will rebuild the docker image and start a container based on the image. - -To start a container using a previously built image, run the same command without the `--build` flag. - -### Container configuration - -#### Docker Network -All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`. -If you change this, you will need to change this for all other Docker containers as well. 
-``` -networks: - default: - name: datahub_network -``` - -#### Kafka and DataHub GMS Containers -Before starting `datahub-mce-consumer` container, `datahub-gms` and `kafka` containers should already be up and running. -These connections are configured via environment variables in `docker-compose.yml`: -``` -environment: - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 -``` -The value of `KAFKA_BOOTSTRAP_SERVER` variable should be set to the host name of the `kafka broker` container within the Docker network. -The value of `KAFKA_SCHEMAREGISTRY_URL` variable should be set to the host name of the `kafka schema registry` container within the Docker network. - -``` -environment: - - GMS_HOST=datahub-gms - - GMS_PORT=8080 -``` -The value of `GMS_HOST` variable should be set to the host name of the `datahub-gms` container within the Docker network. \ No newline at end of file diff --git a/docker/mce-consumer/docker-compose.yml b/docker/mce-consumer/docker-compose.yml deleted file mode 100644 index 07c83d6ccb..0000000000 --- a/docker/mce-consumer/docker-compose.yml +++ /dev/null @@ -1,23 +0,0 @@ ---- -version: '3.5' -services: - datahub-mce-consumer: - image: linkedin/datahub-mce-consumer:${DATAHUB_VERSION:-latest} - build: - context: ../../ - dockerfile: docker/mce-consumer/Dockerfile - hostname: datahub-mce-consumer - container_name: datahub-mce-consumer - ports: - - "9090:9090" - environment: - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 - - GMS_HOST=datahub-gms - - GMS_PORT=8080 - - KAFKA_MCE_TOPIC_NAME=MetadataChangeEvent - - KAFKA_FMCE_TOPIC_NAME=FailedMetadataChangeEvent - -networks: - default: - name: datahub_network diff --git a/docker/mysql/README.md b/docker/mysql/README.md index 69eef38c0a..9b6d7088ff 100644 --- a/docker/mysql/README.md +++ b/docker/mysql/README.md @@ -4,36 +4,3 @@ DataHub GMS uses MySQL as the storage backend. [Official MySQL Docker image](https://hub.docker.com/_/mysql) found in Docker Hub is used without any modification. - -## Run Docker container -Below command will start the MySQL container. -``` -cd docker/mysql && docker-compose pull && docker-compose up -``` - -An initialization script [init.sql](init.sql) is provided to container. This script initializes `metadata-aspect` table -which is basically the Key-Value store of the DataHub GMS. - -To connect to MySQL container, you can type below command: -``` -docker exec -it mysql mysql -u datahub -pdatahub datahub -``` - -## Container configuration -### External Port -If you need to configure default configurations for your container such as the exposed port, you will do that in -`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand -how to change your exposed port settings. -``` -ports: - - '3306:3306' -``` - -### Docker Network -All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`. -If you change this, you will need to change this for all other Docker containers as well. -``` -networks: - default: - name: datahub_network -``` \ No newline at end of file diff --git a/docker/mysql/docker-compose.mysql.yml b/docker/mysql/docker-compose.mysql.yml new file mode 100644 index 0000000000..68c72d50e3 --- /dev/null +++ b/docker/mysql/docker-compose.mysql.yml @@ -0,0 +1,24 @@ +# Override to use MySQL as a backing store for datahub-gms. 
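+# Example usage (a sketch; paths are assumed and may need adjusting to your layout):
+#   docker-compose -f docker-compose.yml -f mysql/docker-compose.mysql.yml -p datahub up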
+--- +version: '3.8' +services: + mysql: + container_name: mysql + hostname: mysql + image: mysql:5.7 + env_file: env/docker.env + restart: always + command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci + ports: + - "3306:3306" + volumes: + - ./init.sql:/docker-entrypoint-initdb.d/init.sql + - mysqldata:/var/lib/mysql + + datahub-gms: + env_file: ../datahub-gms/env/docker.env + depends_on: + - mysql + +volumes: + mysqldata: diff --git a/docker/mysql/docker-compose.yml b/docker/mysql/docker-compose.yml deleted file mode 100644 index 9b64f84041..0000000000 --- a/docker/mysql/docker-compose.yml +++ /dev/null @@ -1,21 +0,0 @@ ---- -version: '3.5' -services: - mysql: - container_name: mysql - hostname: mysql - image: mysql:5.7 - restart: always - environment: - MYSQL_DATABASE: 'datahub' - MYSQL_USER: 'datahub' - MYSQL_PASSWORD: 'datahub' - MYSQL_ROOT_PASSWORD: 'datahub' - ports: - - '3306:3306' - volumes: - - ./init.sql:/docker-entrypoint-initdb.d/init.sql - -networks: - default: - name: datahub_network \ No newline at end of file diff --git a/docker/mysql/env/docker.env b/docker/mysql/env/docker.env new file mode 100644 index 0000000000..72e3e2155d --- /dev/null +++ b/docker/mysql/env/docker.env @@ -0,0 +1,4 @@ +MYSQL_DATABASE=datahub +MYSQL_USER=datahub +MYSQL_PASSWORD=datahub +MYSQL_ROOT_PASSWORD=datahub diff --git a/docker/neo4j/README.md b/docker/neo4j/README.md index 560f3e522b..b0b9f486d9 100644 --- a/docker/neo4j/README.md +++ b/docker/neo4j/README.md @@ -4,32 +4,6 @@ DataHub uses Neo4j as graph db in the backend to serve graph queries. [Official Neo4j image](https://hub.docker.com/_/neo4j) found in Docker Hub is used without any modification. -## Run Docker container -Below command will start all Neo4j container. -``` -cd docker/neo4j && docker-compose pull && docker-compose up -``` - -## Container configuration -### External Port -If you need to configure default configurations for your container such as the exposed port, you will do that in -`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand -how to change your exposed port settings. -``` -ports: - - "7474:7474" - - "7687:7687" -``` - -### Docker Network -All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`. -If you change this, you will need to change it for all other Docker containers as well. -``` -networks: - default: - name: datahub_network -``` - ## Neo4j Browser To be able to debug and run Cypher queries against your Neo4j image, you can open up `Neo4j Browser` which is running at [http://localhost:7474/browser/](http://localhost:7474/browser/). Default username is `neo4j` and password is `datahub`. 
\ No newline at end of file
diff --git a/docker/neo4j/docker-compose.yml b/docker/neo4j/docker-compose.yml
deleted file mode 100644
index 23c4810749..0000000000
--- a/docker/neo4j/docker-compose.yml
+++ /dev/null
@@ -1,16 +0,0 @@
----
-version: '3.5'
-services:
-  neo4j:
-    image: neo4j:3.5.7
-    hostname: neo4j
-    container_name: neo4j
-    environment:
-      NEO4J_AUTH: 'neo4j/datahub'
-    ports:
-      - "7474:7474"
-      - "7687:7687"
-
-networks:
-  default:
-    name: datahub_network
\ No newline at end of file
diff --git a/docker/neo4j/env/docker.env b/docker/neo4j/env/docker.env
new file mode 100644
index 0000000000..375035f620
--- /dev/null
+++ b/docker/neo4j/env/docker.env
@@ -0,0 +1 @@
+NEO4J_AUTH=neo4j/datahub
diff --git a/docker/postgres/README.md b/docker/postgres/README.md
new file mode 100644
index 0000000000..3cb187af1e
--- /dev/null
+++ b/docker/postgres/README.md
@@ -0,0 +1,6 @@
+# PostgreSQL
+
+DataHub GMS can use PostgreSQL as an alternate storage backend.
+
+[Official PostgreSQL Docker image](https://hub.docker.com/_/postgres) found in Docker Hub is used without
+any modification.
diff --git a/docker/postgresql/docker-compose.yml b/docker/postgres/docker-compose.postgre.yml
similarity index 56%
rename from docker/postgresql/docker-compose.yml
rename to docker/postgres/docker-compose.postgre.yml
index db4a814942..b980139e20
--- a/docker/postgresql/docker-compose.yml
+++ b/docker/postgres/docker-compose.postgre.yml
@@ -1,19 +1,23 @@
+# Override to use PostgreSQL as a backing store for datahub-gms.
 ---
-version: '3.5'
+version: '3.8'
 services:
   postgres:
     container_name: postgres
     hostname: postgres
     image: postgres:12.3
+    env_file: env/docker.env
     restart: always
-    environment:
-      POSTGRES_USER: datahub
-      POSTGRES_PASSWORD: datahub
     ports:
       - '5432:5432'
     volumes:
       - ./init.sql:/docker-entrypoint-initdb.d/init.sql
 
+  datahub-gms:
+    env_file: ../datahub-gms/env/dev.postgres.env
+    depends_on:
+      - postgres
+
 networks:
   default:
     name: datahub_network
\ No newline at end of file
diff --git a/docker/postgres/env/docker.env b/docker/postgres/env/docker.env
new file mode 100644
index 0000000000..f84a2b5635
--- /dev/null
+++ b/docker/postgres/env/docker.env
@@ -0,0 +1,2 @@
+POSTGRES_USER=datahub
+POSTGRES_PASSWORD=datahub
diff --git a/docker/postgresql/init.sql b/docker/postgres/init.sql
similarity index 100%
rename from docker/postgresql/init.sql
rename to docker/postgres/init.sql
diff --git a/docker/postgresql/README.md b/docker/postgresql/README.md
deleted file mode 100644
index c0a5056914..0000000000
--- a/docker/postgresql/README.md
+++ /dev/null
@@ -1,39 +0,0 @@
-# MySQL
-
-DataHub GMS can use PostgreSQL as an alternate storage backend.
-
-[Official PostgreSQL Docker image](https://hub.docker.com/_/postgres) found in Docker Hub is used without
-any modification.
-
-## Run Docker container
-Below command will start the MySQL container.
-```
-cd docker/postgres && docker-compose pull && docker-compose up
-```
-
-An initialization script [init.sql](init.sql) is provided to container. This script initializes `metadata-aspect` table
-which is basically the Key-Value store of the DataHub GMS.
-
-To connect to PostgreSQL container, you can type below command:
-```
-docker exec -it postgres psql -U datahub
-```
-
-## Container configuration
-### External Port
-If you need to configure default configurations for your container such as the exposed port, you will do that in
-`docker-compose.yml` file.
Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand
-how to change your exposed port settings.
-```
-ports:
-  - '5432:5432'
-```
-
-### Docker Network
-All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
-If you change this, you will need to change this for all other Docker containers as well.
-```
-networks:
-  default:
-    name: datahub_network
-```
\ No newline at end of file
diff --git a/docker/quickstart.sh b/docker/quickstart.sh
new file mode 100755
index 0000000000..7eb3cac649
--- /dev/null
+++ b/docker/quickstart.sh
@@ -0,0 +1,7 @@
+#!/bin/bash
+
+# Quickstarts DataHub by pulling all images from Docker Hub and then running the containers locally. No images are
+# built locally. Note: by default this pulls the latest version; you can change this to a specific version by setting
+# the DATAHUB_VERSION environment variable.
+DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
+cd "$DIR" && docker-compose pull && docker-compose -p datahub up
\ No newline at end of file
diff --git a/docker/quickstart/README.md b/docker/quickstart/README.md
deleted file mode 100644
index 8dabaf0035..0000000000
--- a/docker/quickstart/README.md
+++ /dev/null
@@ -1,30 +0,0 @@
-# DataHub Quickstart
-To start all Docker containers at once, please run below command from project root directory:
-```bash
-./docker/quickstart/quickstart.sh
-```
-
-At this point, all containers are ready and DataHub can be considered up and running. Check specific containers guide
-for details:
-* [Elasticsearch & Kibana](../elasticsearch)
-* [DataHub Frontend](../frontend)
-* [DataHub GMS](../gms)
-* [Kafka, Schema Registry & Zookeeper](../kafka)
-* [DataHub MAE Consumer](../mae-consumer)
-* [DataHub MCE Consumer](../mce-consumer)
-* [MySQL](../mysql)
-
-From this point on, if you want to be able to sign in to DataHub and see some sample data, please see
-[Metadata Ingestion Guide](../../metadata-ingestion) for `bootstrapping DataHub`.
-
-You can also choose to use a specific versin of DataHub docker images instead of the `latest` by specifying `DATAHUB_VERSION` environment variable.
- -## Debugging Containers -If you want to debug containers, you can check container logs: -``` -docker logs <> -``` -Also, you can connect to container shell for further debugging: -``` -docker exec -it <> bash -``` diff --git a/docker/quickstart/docker-compose.yml b/docker/quickstart/docker-compose.yml deleted file mode 100644 index 13f75555a2..0000000000 --- a/docker/quickstart/docker-compose.yml +++ /dev/null @@ -1,260 +0,0 @@ ---- -version: '3.5' -services: - mysql: - container_name: mysql - hostname: mysql - image: mysql:5.7 - restart: always - command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci - environment: - MYSQL_DATABASE: 'datahub' - MYSQL_USER: 'datahub' - MYSQL_PASSWORD: 'datahub' - MYSQL_ROOT_PASSWORD: 'datahub' - ports: - - "3306:3306" - volumes: - - ../mysql/init.sql:/docker-entrypoint-initdb.d/init.sql - - mysqldata:/var/lib/mysql - - zookeeper: - image: confluentinc/cp-zookeeper:5.4.0 - hostname: zookeeper - container_name: zookeeper - ports: - - "2181:2181" - environment: - ZOOKEEPER_CLIENT_PORT: 2181 - ZOOKEEPER_TICK_TIME: 2000 - volumes: - - zkdata:/var/opt/zookeeper - - broker: - image: confluentinc/cp-kafka:5.4.0 - hostname: broker - container_name: broker - depends_on: - - zookeeper - ports: - - "29092:29092" - - "9092:9092" - environment: - KAFKA_BROKER_ID: 1 - KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181' - KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT - KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092 - KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1 - KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0 - - kafka-rest-proxy: - image: confluentinc/cp-kafka-rest:5.4.0 - hostname: kafka-rest-proxy - container_name: kafka-rest-proxy - ports: - - "8082:8082" - environment: - KAFKA_REST_LISTENERS: http://0.0.0.0:8082/ - KAFKA_REST_SCHEMA_REGISTRY_URL: http://schema-registry:8081/ - KAFKA_REST_HOST_NAME: kafka-rest-proxy - KAFKA_REST_BOOTSTRAP_SERVERS: PLAINTEXT://broker:29092 - depends_on: - - zookeeper - - broker - - schema-registry - - kafka-topics-ui: - image: landoop/kafka-topics-ui:0.9.4 - hostname: kafka-topics-ui - container_name: kafka-topics-ui - ports: - - "18000:8000" - environment: - KAFKA_REST_PROXY_URL: "http://kafka-rest-proxy:8082/" - PROXY: "true" - depends_on: - - zookeeper - - broker - - schema-registry - - kafka-rest-proxy - - # This "container" is a workaround to pre-create topics - kafka-setup: - build: - context: ../kafka - hostname: kafka-setup - container_name: kafka-setup - depends_on: - - broker - - schema-registry - environment: - - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - schema-registry: - image: confluentinc/cp-schema-registry:5.4.0 - hostname: schema-registry - container_name: schema-registry - depends_on: - - zookeeper - - broker - ports: - - "8081:8081" - environment: - SCHEMA_REGISTRY_HOST_NAME: schema-registry - SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: 'zookeeper:2181' - - schema-registry-ui: - image: landoop/schema-registry-ui:latest - container_name: schema-registry-ui - hostname: schema-registry-ui - ports: - - "8000:8000" - environment: - SCHEMAREGISTRY_URL: 'http://schema-registry:8081' - ALLOW_GLOBAL: 'true' - ALLOW_TRANSITIVE: 'true' - ALLOW_DELETION: 'true' - READONLY_MODE: 'true' - PROXY: 'true' - depends_on: - - schema-registry - - elasticsearch: - image: docker.elastic.co/elasticsearch/elasticsearch:5.6.8 - container_name: elasticsearch - hostname: elasticsearch - ports: - - "9200:9200" - environment: 
- - discovery.type=single-node - - xpack.security.enabled=false - - "ES_JAVA_OPTS=-Xms1g -Xmx1g" - volumes: - - esdata:/usr/share/elasticsearch/data - - kibana: - image: docker.elastic.co/kibana/kibana:5.6.8 - container_name: kibana - hostname: kibana - ports: - - "5601:5601" - environment: - - SERVER_HOST=0.0.0.0 - - ELASTICSEARCH_URL=http://elasticsearch:9200 - depends_on: - - elasticsearch - - neo4j: - image: neo4j:3.5.7 - hostname: neo4j - container_name: neo4j - environment: - NEO4J_AUTH: 'neo4j/datahub' - ports: - - "7474:7474" - - "7687:7687" - volumes: - - neo4jdata:/data - - # This "container" is a workaround to pre-create search indices - elasticsearch-setup: - build: - context: ../elasticsearch - hostname: elasticsearch-setup - container_name: elasticsearch-setup - depends_on: - - elasticsearch - environment: - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - - datahub-gms: - image: linkedin/datahub-gms:${DATAHUB_VERSION:-latest} - hostname: datahub-gms - container_name: datahub-gms - ports: - - "8080:8080" - environment: - - EBEAN_DATASOURCE_USERNAME=datahub - - EBEAN_DATASOURCE_PASSWORD=datahub - - EBEAN_DATASOURCE_HOST=mysql:3306 - - EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8 - - EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - - NEO4J_HOST=neo4j:7474 - - NEO4J_URI=bolt://neo4j - - NEO4J_USERNAME=neo4j - - NEO4J_PASSWORD=datahub - depends_on: - - elasticsearch-setup - - kafka-setup - - mysql - - neo4j - - datahub-frontend: - image: linkedin/datahub-frontend:${DATAHUB_VERSION:-latest} - hostname: datahub-frontend - container_name: datahub-frontend - ports: - - "9001:9001" - environment: - - DATAHUB_GMS_HOST=datahub-gms - - DATAHUB_GMS_PORT=8080 - - DATAHUB_SECRET=YouKnowNothing - - DATAHUB_APP_VERSION=1.0 - - DATAHUB_PLAY_MEM_BUFFER_SIZE=10MB - depends_on: - - datahub-gms - - datahub-mae-consumer: - image: linkedin/datahub-mae-consumer:${DATAHUB_VERSION:-latest} - hostname: datahub-mae-consumer - container_name: datahub-mae-consumer - ports: - - "9091:9091" - environment: - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - - NEO4J_HOST=neo4j:7474 - - NEO4J_URI=bolt://neo4j - - NEO4J_USERNAME=neo4j - - NEO4J_PASSWORD=datahub - depends_on: - - kafka-setup - - elasticsearch-setup - - neo4j - command: "sh -c 'while ping -c1 kafka-setup &>/dev/null; do echo waiting for kafka-setup... && sleep 1; done; \ - echo kafka-setup done! && /start.sh'" - - datahub-mce-consumer: - image: linkedin/datahub-mce-consumer:${DATAHUB_VERSION:-latest} - hostname: datahub-mce-consumer - container_name: datahub-mce-consumer - ports: - - "9090:9090" - environment: - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 - - GMS_HOST=datahub-gms - - GMS_PORT=8080 - depends_on: - - kafka-setup - - datahub-gms - command: "sh -c 'while ping -c1 kafka-setup &>/dev/null; do echo waiting for kafka-setup... && sleep 1; done; \ - echo kafka-setup done! 
&& /start.sh'" - -networks: - default: - name: datahub_network - -volumes: - mysqldata: - esdata: - neo4jdata: - zkdata: diff --git a/docker/quickstart/quickstart.sh b/docker/quickstart/quickstart.sh deleted file mode 100755 index 7f4798e777..0000000000 --- a/docker/quickstart/quickstart.sh +++ /dev/null @@ -1,4 +0,0 @@ -#!/bin/bash - -DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" -cd $DIR && docker-compose pull && docker-compose -p datahub up --build \ No newline at end of file diff --git a/docker/rebuild-all/docker-compose.yml b/docker/rebuild-all/docker-compose.yml deleted file mode 100644 index be0762c1c7..0000000000 --- a/docker/rebuild-all/docker-compose.yml +++ /dev/null @@ -1,268 +0,0 @@ ---- -version: '3.5' -services: - mysql: - container_name: mysql - hostname: mysql - image: mysql:5.7 - restart: always - command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci - environment: - MYSQL_DATABASE: 'datahub' - MYSQL_USER: 'datahub' - MYSQL_PASSWORD: 'datahub' - MYSQL_ROOT_PASSWORD: 'datahub' - ports: - - "3306:3306" - volumes: - - ../mysql/init.sql:/docker-entrypoint-initdb.d/init.sql - - mysqldata:/var/lib/mysql - - zookeeper: - image: confluentinc/cp-zookeeper:5.4.0 - hostname: zookeeper - container_name: zookeeper - ports: - - "2181:2181" - environment: - ZOOKEEPER_CLIENT_PORT: 2181 - ZOOKEEPER_TICK_TIME: 2000 - volumes: - - zkdata:/var/opt/zookeeper - - broker: - image: confluentinc/cp-kafka:5.4.0 - hostname: broker - container_name: broker - depends_on: - - zookeeper - ports: - - "29092:29092" - - "9092:9092" - environment: - KAFKA_BROKER_ID: 1 - KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181' - KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT - KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092 - KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1 - KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0 - - kafka-rest-proxy: - image: confluentinc/cp-kafka-rest:5.4.0 - hostname: kafka-rest-proxy - container_name: kafka-rest-proxy - ports: - - "8082:8082" - environment: - KAFKA_REST_LISTENERS: http://0.0.0.0:8082/ - KAFKA_REST_SCHEMA_REGISTRY_URL: http://schema-registry:8081/ - KAFKA_REST_HOST_NAME: kafka-rest-proxy - KAFKA_REST_BOOTSTRAP_SERVERS: PLAINTEXT://broker:29092 - depends_on: - - zookeeper - - broker - - schema-registry - - kafka-topics-ui: - image: landoop/kafka-topics-ui:0.9.4 - hostname: kafka-topics-ui - container_name: kafka-topics-ui - ports: - - "18000:8000" - environment: - KAFKA_REST_PROXY_URL: "http://kafka-rest-proxy:8082/" - PROXY: "true" - depends_on: - - zookeeper - - broker - - schema-registry - - kafka-rest-proxy - - # This "container" is a workaround to pre-create topics - kafka-setup: - build: - context: ../kafka - hostname: kafka-setup - container_name: kafka-setup - depends_on: - - broker - - schema-registry - environment: - - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - schema-registry: - image: confluentinc/cp-schema-registry:5.4.0 - hostname: schema-registry - container_name: schema-registry - depends_on: - - zookeeper - - broker - ports: - - "8081:8081" - environment: - SCHEMA_REGISTRY_HOST_NAME: schema-registry - SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: 'zookeeper:2181' - - schema-registry-ui: - image: landoop/schema-registry-ui:latest - container_name: schema-registry-ui - hostname: schema-registry-ui - ports: - - "8000:8000" - environment: - SCHEMAREGISTRY_URL: 'http://schema-registry:8081' - ALLOW_GLOBAL: 'true' - 
ALLOW_TRANSITIVE: 'true' - ALLOW_DELETION: 'true' - READONLY_MODE: 'true' - PROXY: 'true' - depends_on: - - schema-registry - - elasticsearch: - image: docker.elastic.co/elasticsearch/elasticsearch:5.6.8 - container_name: elasticsearch - hostname: elasticsearch - ports: - - "9200:9200" - environment: - - discovery.type=single-node - - xpack.security.enabled=false - - "ES_JAVA_OPTS=-Xms1g -Xmx1g" - volumes: - - esdata:/usr/share/elasticsearch/data - - kibana: - image: docker.elastic.co/kibana/kibana:5.6.8 - container_name: kibana - hostname: kibana - ports: - - "5601:5601" - environment: - - SERVER_HOST=0.0.0.0 - - ELASTICSEARCH_URL=http://elasticsearch:9200 - depends_on: - - elasticsearch - - neo4j: - image: neo4j:3.5.7 - hostname: neo4j - container_name: neo4j - environment: - NEO4J_AUTH: 'neo4j/datahub' - ports: - - "7474:7474" - - "7687:7687" - volumes: - - neo4jdata:/data - - # This "container" is a workaround to pre-create search indices - elasticsearch-setup: - build: - context: ../elasticsearch - hostname: elasticsearch-setup - container_name: elasticsearch-setup - depends_on: - - elasticsearch - environment: - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - - datahub-gms: - build: - context: ../../ - dockerfile: docker/gms/Dockerfile - hostname: datahub-gms - container_name: datahub-gms - ports: - - "8080:8080" - environment: - - EBEAN_DATASOURCE_USERNAME=datahub - - EBEAN_DATASOURCE_PASSWORD=datahub - - EBEAN_DATASOURCE_HOST=mysql:3306 - - EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8 - - EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - - NEO4J_HOST=neo4j:7474 - - NEO4J_URI=bolt://neo4j - - NEO4J_USERNAME=neo4j - - NEO4J_PASSWORD=datahub - depends_on: - - elasticsearch-setup - - kafka-setup - - mysql - - neo4j - - datahub-frontend: - build: - context: ../../ - dockerfile: docker/frontend/Dockerfile - hostname: datahub-frontend - container_name: datahub-frontend - ports: - - "9001:9001" - environment: - - DATAHUB_GMS_HOST=datahub-gms - - DATAHUB_GMS_PORT=8080 - - DATAHUB_SECRET=YouKnowNothing - - DATAHUB_APP_VERSION=1.0 - - DATAHUB_PLAY_MEM_BUFFER_SIZE=10MB - depends_on: - - datahub-gms - - datahub-mae-consumer: - build: - context: ../../ - dockerfile: docker/mae-consumer/Dockerfile - hostname: datahub-mae-consumer - container_name: datahub-mae-consumer - ports: - - "9091:9091" - environment: - - KAFKA_BOOTSTRAP_SERVER=broker:29092 - - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081 - - ELASTICSEARCH_HOST=elasticsearch - - ELASTICSEARCH_PORT=9200 - - NEO4J_HOST=neo4j:7474 - - NEO4J_URI=bolt://neo4j - - NEO4J_USERNAME=neo4j - - NEO4J_PASSWORD=datahub - depends_on: - - kafka-setup - - elasticsearch-setup - - neo4j - command: "sh -c 'while ping -c1 kafka-setup &>/dev/null; do echo waiting for kafka-setup... && sleep 1; done; \ - echo kafka-setup done! 
&& /start.sh'"
-
-  datahub-mce-consumer:
-    build:
-      context: ../../
-      dockerfile: docker/mce-consumer/Dockerfile
-    hostname: datahub-mce-consumer
-    container_name: datahub-mce-consumer
-    ports:
-      - "9090:9090"
-    environment:
-      - KAFKA_BOOTSTRAP_SERVER=broker:29092
-      - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
-      - GMS_HOST=datahub-gms
-      - GMS_PORT=8080
-    depends_on:
-      - kafka-setup
-      - datahub-gms
-    command: "sh -c 'while ping -c1 kafka-setup &>/dev/null; do echo waiting for kafka-setup... && sleep 1; done; \
-             echo kafka-setup done! && /start.sh'"
-
-networks:
-  default:
-    name: datahub_network
-
-volumes:
-  mysqldata:
-  esdata:
-  neo4jdata:
-  zkdata:
diff --git a/docker/rebuild-all/rebuild-all.sh b/docker/rebuild-all/rebuild-all.sh
deleted file mode 100755
index 552c9d76fc..0000000000
--- a/docker/rebuild-all/rebuild-all.sh
+++ /dev/null
@@ -1,4 +0,0 @@
-#!/bin/bash
-
-DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-cd $DIR && docker-compose pull && docker-compose -p datahub up
\ No newline at end of file
diff --git a/docker/schema-registry-ui/env/docker.env b/docker/schema-registry-ui/env/docker.env
new file mode 100644
index 0000000000..6e40a79fe7
--- /dev/null
+++ b/docker/schema-registry-ui/env/docker.env
@@ -0,0 +1,6 @@
+SCHEMAREGISTRY_URL=http://schema-registry:8081
+ALLOW_GLOBAL=true
+ALLOW_TRANSITIVE=true
+ALLOW_DELETION=true
+READONLY_MODE=true
+PROXY=true
diff --git a/docker/schema-registry/env/docker.env b/docker/schema-registry/env/docker.env
new file mode 100644
index 0000000000..166c551ac1
--- /dev/null
+++ b/docker/schema-registry/env/docker.env
@@ -0,0 +1,2 @@
+SCHEMA_REGISTRY_HOST_NAME=schemaregistry
+SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=zookeeper:2181
diff --git a/docker/zookeeper/env/docker.env b/docker/zookeeper/env/docker.env
new file mode 100644
index 0000000000..1b8f605f98
--- /dev/null
+++ b/docker/zookeeper/env/docker.env
@@ -0,0 +1,2 @@
+ZOOKEEPER_CLIENT_PORT=2181
+ZOOKEEPER_TICK_TIME=2000
diff --git a/docs/docker/README.md b/docs/docker/README.md
new file mode 100644
index 0000000000..4a99490a2b
--- /dev/null
+++ b/docs/docker/README.md
@@ -0,0 +1 @@
+See [docker/README.md](../../docker/README.md).
\ No newline at end of file
diff --git a/docs/docker/development.md b/docs/docker/development.md
new file mode 100644
index 0000000000..2122a8e37f
--- /dev/null
+++ b/docs/docker/development.md
@@ -0,0 +1,70 @@
+# Using Docker Images During Development
+
+We've created a special `docker-compose.dev.yml` override file that configures the Docker images to be easier to use
+during development.
+
+Normally, you'd rebuild your images from scratch with `docker-compose build` (or `docker-compose up --build`). However,
+this takes far too long for development: it has to copy the entire repo to each image and rebuild it there.
+
+The `docker-compose.dev.yml` file bypasses this problem by mounting binaries, startup scripts, and other data into
+special, slimmed-down images (whose Dockerfile is usually defined in `/debug/Dockerfile`). Mounts work both ways,
+so the debug images also mount the container's log directories where possible, so that logs are easy to read on your
+local machine without needing to inspect the running container (especially if the app crashes and the container stops!).
+
+We highly recommend you simply invoke the `docker/dev.sh` script we've included. It is short enough to read if you
+want to see exactly what it does; ultimately it launches Docker Compose with our `docker-compose.dev.yml` file.
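+
+For reference, a minimal sketch of roughly what `dev.sh` runs (the exact file list here is an assumption based on the
+commands later in this doc; check the script itself for the authoritative flags):
+
+```
+docker-compose -p datahub -f docker-compose.yml -f docker-compose.overrides.yml -f docker-compose.dev.yml up --build
+```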
+
+## Debugging
+
+The dev images, while set up to use your local code, do not enable debugging by default. To enable it,
+you need to make two small edits (don't check these changes in!).
+
+- Add the JVM debug flags to the environment file for the service.
+- Assign the port in the docker-compose file.
+
+For example, to debug `datahub-gms`:
+
+```
+# Add this line to docker/datahub-gms/env/dev.env. You can change the port and/or change suspend=n to y.
+JAVA_TOOL_OPTIONS=-agentlib:jdwp=transport=dt_socket,address=5005,server=y,suspend=n
+```
+
+```
+# Change the definition in docker/docker-compose.dev.yml to this
+  datahub-gms:
+    image: linkedin/datahub-gms:debug
+    build:
+      context: datahub-gms/debug
+      dockerfile: Dockerfile
+    ports:           # <--- Add this line
+      - "5005:5005"  # <--- And this line. Must match the port from the environment file.
+    volumes:
+      - ./datahub-gms/start.sh:/datahub/datahub-gms/scripts/start.sh
+      - ../gms/war/build/libs/:/datahub/datahub-gms/bin
+```
+
+## Tips for People New To Docker
+
+### Conflicting containers
+
+If you ran `docker/quickstart.sh` before, your machine may already have a container for DataHub. If you want to run
+`docker/dev.sh` instead, ensure that the old container is removed by running `docker container prune`. The opposite also
+applies.
+
+> Note this only removes containers, not images. Switching between the two should still be fast once you've launched
+> both at least once.
+
+### Running a specific service
+
+`docker-compose up` will launch all services in the configuration, including dependencies, unless they're already
+running. If, for some reason, you wish to change this behavior, check out these example commands.
+
+```
+docker-compose -p datahub -f docker-compose.yml -f docker-compose.overrides.yml -f docker-compose.dev.yml up datahub-gms
+```
+This will only start `datahub-gms` and its dependencies.
+
+```
+docker-compose -p datahub -f docker-compose.yml -f docker-compose.overrides.yml -f docker-compose.dev.yml up --no-deps datahub-gms
+```
+This will only start `datahub-gms`, without dependencies.
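+
+Once a service is up with the debug flags above, attach any JDWP client to `localhost:5005`, such as your IDE's
+remote-debug configuration. As a quick sanity check from the command line, `jdb` from the JDK can attach directly
+(port number assumed to match the environment file):
+
+```
+jdb -attach localhost:5005
+```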