refactor(docker): make docker files easier to use during development. (#1777)

* Make docker files easier to use during development.

During development it is quite nice to have Docker work with locally built code. This allows you to launch all services very quickly, with your changes, and optionally with debugging support.

Changes made to docker files:
- Remove all redundant docker-compose files. We now have one giant file, plus smaller files to use as overrides.
- Remove redundant README files that provided little information.
- Rename docker/<dir> to match the service name in the docker-compose file for clarity.
- Move environment variables to .env files. We only provide dev / the default environment for quickstart.
- Add debug options to docker files using multistage build to build minimal images with the idea that built files will be mounted instead.
- Add a docker/dev.sh script + compose file to easily use the dev override images (separate tag; images never published; uses debug docker files; mounts binaries into the image). See the usage sketch below.
- Add docs/docker documentation for this.
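
A minimal sketch of the intended dev loop (the Gradle target shown is the GMS one from its new Dockerfile; other services follow the same pattern):

```
# Build the service artifacts locally first, e.g. for GMS:
./gradlew :gms:war:build
# Then bring everything up with the dev overrides; locally built binaries are mounted into the images.
./docker/dev.sh
```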
John Plaisted 2020-08-06 16:38:53 -07:00 committed by GitHub
parent 43dfce8b2f
commit b8e18b0b5d
84 changed files with 699 additions and 1434 deletions

View File

@ -21,6 +21,8 @@ jobs:
echo "tag=$TAG"
echo "::set-output name=tag::$TAG"
- uses: docker/build-push-action@v1
env:
DOCKER_BUILDKIT: 1
with:
dockerfile: ./docker/frontend/Dockerfile
username: ${{ secrets.DOCKER_USERNAME }}

View File

@ -21,6 +21,8 @@ jobs:
echo "tag=$TAG"
echo "::set-output name=tag::$TAG"
- uses: docker/build-push-action@v1
env:
DOCKER_BUILDKIT: 1
with:
dockerfile: ./docker/gms/Dockerfile
username: ${{ secrets.DOCKER_USERNAME }}

View File

@ -21,6 +21,8 @@ jobs:
echo "tag=$TAG"
echo "::set-output name=tag::$TAG"
- uses: docker/build-push-action@v1
env:
DOCKER_BUILDKIT: 1
with:
dockerfile: ./docker/mae-consumer/Dockerfile
username: ${{ secrets.DOCKER_USERNAME }}

View File

@ -21,6 +21,8 @@ jobs:
echo "tag=$TAG"
echo "::set-output name=tag::$TAG"
- uses: docker/build-push-action@v1
env:
DOCKER_BUILDKIT: 1
with:
dockerfile: ./docker/mce-consumer/Dockerfile
username: ${{ secrets.DOCKER_USERNAME }}

3
.gitignore vendored
View File

@ -17,11 +17,8 @@ metadata-events/mxe-registration/src/main/resources/**/*.avsc
.java-version
# Python
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
.mypy_cache/

View File

@ -1,27 +1,56 @@
# Docker Images
## Prerequisites
You need to install [docker](https://docs.docker.com/install/) and
[docker-compose](https://docs.docker.com/compose/install/) (if using Linux; on Windows and Mac compose is included with
Docker Desktop).
Make sure to allocate enough hardware resources for the Docker engine. Tested & confirmed config: 2 CPUs, 8GB RAM, 2GB Swap
area.
## Quickstart
The easiest way to bring up and test DataHub is using DataHub [Docker](https://www.docker.com) images
which are continuously deployed to [Docker Hub](https://hub.docker.com/u/linkedin) with every commit to the repository.
You can easily download and run all these images and their dependencies with our
[quick start guide](../docs/quickstart.md).
DataHub Docker Images:
* [linkedin/datahub-gms](https://cloud.docker.com/repository/docker/linkedin/datahub-gms/)
* [linkedin/datahub-frontend](https://cloud.docker.com/repository/docker/linkedin/datahub-frontend/)
* [linkedin/datahub-mae-consumer](https://cloud.docker.com/repository/docker/linkedin/datahub-mae-consumer/)
* [linkedin/datahub-mce-consumer](https://cloud.docker.com/repository/docker/linkedin/datahub-mce-consumer/)
The Docker images above are created specifically for DataHub. You can check the subdirectories to see how those images are
generated via [Docker build](https://docs.docker.com/engine/reference/commandline/build/) files and
how to start each container using [Docker Compose](https://docs.docker.com/compose/). Other than these, DataHub depends
on the Docker images below to be able to run:
Dependencies:
* [**Kafka and Schema Registry**](kafka)
* [**Elasticsearch**](elasticsearch)
* [**Elasticsearch Setup**](elasticsearch-setup)
* [**MySQL**](mysql)
The locally built ingestion image allows you to create `metadatachangeevent`s on an ad-hoc basis with a Python script.
The pipeline depends on all of the above images being composed up.
* [**Ingestion**](ingestion)
### Ingesting demo data.
## Prerequisites
You need to install [docker](https://docs.docker.com/install/) and [docker-compose](https://docs.docker.com/compose/install/).
If you want to test ingesting some data once DataHub is up, see [**Ingestion**](ingestion/README.md).
## Quickstart
If you want to quickly try and evaluate DataHub by running all necessary Docker containers, you can check
[Quickstart Guide](quickstart).
## Using Docker Images During Development
See [Using Docker Images During Development](../docs/docker/development.md).
## Building And Deploying Docker Images
We use GitHub Actions to build and continuously deploy our images. There should be no need to do this manually; a
successful release on GitHub will automatically publish the images.
### Building images
To build the full images (that we are going to publish), you need to run the following:
```
COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub build
```
This is because we're relying on BuildKit for multistage builds. It also does not hurt to set `DATAHUB_VERSION` to
something unique.
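For example, a possible invocation (the tag value here is purely illustrative):
```
COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 DATAHUB_VERSION=local-test docker-compose -p datahub build
```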
This is not our recommended development flow and most developers should be following the
[Using Docker Images During Development](#using-docker-images-during-development) guide.

6
docker/broker/env/docker.env vendored Normal file
View File

@ -0,0 +1,6 @@
KAFKA_BROKER_ID=1
KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1
KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=0

View File

@ -0,0 +1,16 @@
# DataHub Frontend Docker Image
[![datahub-frontend docker](https://github.com/linkedin/datahub/workflows/datahub-frontend%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-frontend+docker%22)
Refer to [DataHub Frontend Service](../../datahub-frontend) for a quick understanding of the architecture and
responsibility of this service for DataHub.
## Checking out DataHub UI
After starting your Docker container, you can connect to it by typing the address below into your favorite web browser:
```
http://localhost:9001
```
You can sign in with `datahub` as username and password.

View File

@ -0,0 +1,5 @@
DATAHUB_GMS_HOST=datahub-gms
DATAHUB_GMS_PORT=8080
DATAHUB_SECRET=YouKnowNothing
DATAHUB_APP_VERSION=1.0
DATAHUB_PLAY_MEM_BUFFER_SIZE=10MB

View File

@ -0,0 +1,28 @@
# Defining environment
ARG APP_ENV=prod
FROM openjdk:8-jre-alpine as base
ENV DOCKERIZE_VERSION v0.6.1
RUN apk --no-cache add curl tar \
&& curl https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-runner/9.4.20.v20190813/jetty-runner-9.4.20.v20190813.jar --output jetty-runner.jar \
&& curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv
FROM openjdk:8 as prod-build
COPY . /datahub-src
RUN cd /datahub-src && ./gradlew :gms:war:build
RUN cp /datahub-src/gms/war/build/libs/war.war /war.war
FROM base as prod-install
COPY --from=prod-build /war.war /datahub/datahub-gms/bin/war.war
COPY --from=prod-build /datahub-src/docker/datahub-gms/start.sh /datahub/datahub-gms/scripts/start.sh
RUN chmod +x /datahub/datahub-gms/scripts/start.sh
FROM base as dev-install
# Dummy stage for development. Assumes code is built on your machine and mounted to this image.
# See this excellent thread https://github.com/docker/cli/issues/1134
FROM ${APP_ENV}-install as final
EXPOSE 8080
CMD /datahub/datahub-gms/scripts/start.sh

View File

@ -0,0 +1,22 @@
# DataHub Generalized Metadata Store (GMS) Docker Image
[![datahub-gms docker](https://github.com/linkedin/datahub/workflows/datahub-gms%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-gms+docker%22)
Refer to [DataHub GMS Service](../../gms) for a quick understanding of the architecture and
responsibility of this service for DataHub.
## Other Database Platforms
While GMS defaults to using MySQL as its storage backend, it is possible to switch to any of the
[database platforms](https://ebean.io/docs/database/) supported by Ebean.
For example, you can run the following command to start a GMS that connects to a PostgreSQL backend.
```
(cd docker/ && docker-compose -f docker-compose.yml -f docker-compose.postgre.yml -p datahub up)
```
or a MariaDB backend
```
(cd docker/ && docker-compose -f docker-compose.yml -f docker-compose.mariadb.yml -p datahub up)
```

13
docker/datahub-gms/env/docker.env vendored Normal file
View File

@ -0,0 +1,13 @@
EBEAN_DATASOURCE_USERNAME=datahub
EBEAN_DATASOURCE_PASSWORD=datahub
EBEAN_DATASOURCE_HOST=mysql:3306
EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8
EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver
KAFKA_BOOTSTRAP_SERVER=broker:29092
KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
ELASTICSEARCH_HOST=elasticsearch
ELASTICSEARCH_PORT=9200
NEO4J_HOST=neo4j:7474
NEO4J_URI=bolt://neo4j
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=datahub

View File

@ -0,0 +1,13 @@
EBEAN_DATASOURCE_USERNAME=datahub
EBEAN_DATASOURCE_PASSWORD=datahub
EBEAN_DATASOURCE_HOST=mariadb:3306
EBEAN_DATASOURCE_URL=jdbc:mariadb://mariadb:3306/datahub
EBEAN_DATASOURCE_DRIVER=org.mariadb.jdbc.Driver
KAFKA_BOOTSTRAP_SERVER=broker:29092
KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
ELASTICSEARCH_HOST=elasticsearch
ELASTICSEARCH_PORT=9200
NEO4J_HOST=neo4j:7474
NEO4J_URI=bolt://neo4j
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=datahub

View File

@ -0,0 +1,13 @@
EBEAN_DATASOURCE_USERNAME=datahub
EBEAN_DATASOURCE_PASSWORD=datahub
EBEAN_DATASOURCE_HOST=postgres:5432
EBEAN_DATASOURCE_URL=jdbc:postgresql://postgres:5432/datahub
EBEAN_DATASOURCE_DRIVER=org.postgresql.Driver
KAFKA_BOOTSTRAP_SERVER=broker:29092
KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
ELASTICSEARCH_HOST=elasticsearch
ELASTICSEARCH_PORT=9200
NEO4J_HOST=neo4j:7474
NEO4J_URI=bolt://neo4j
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=datahub

2
docker/gms/start.sh → docker/datahub-gms/start.sh Normal file → Executable file
View File

@ -6,4 +6,4 @@ dockerize \
-wait http://$ELASTICSEARCH_HOST:$ELASTICSEARCH_PORT \
-wait http://$NEO4J_HOST \
-timeout 240s \
java -jar jetty-runner.jar gms.war
java -jar /jetty-runner.jar /datahub/datahub-gms/bin/war.war

View File

@ -0,0 +1,27 @@
# Defining environment
ARG APP_ENV=prod
FROM openjdk:8-jre-alpine as base
ENV DOCKERIZE_VERSION v0.6.1
RUN apk --no-cache add curl tar \
&& curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv
FROM openjdk:8 as prod-build
COPY . datahub-src
RUN cd datahub-src && ./gradlew :metadata-jobs:mae-consumer-job:build
RUN cd datahub-src && cp metadata-jobs/mae-consumer-job/build/libs/mae-consumer-job.jar ../mae-consumer-job.jar
FROM base as prod-install
COPY --from=prod-build /mae-consumer-job.jar /datahub/datahub-mae-consumer/bin/
COPY --from=prod-build /datahub-src/docker/datahub-mae-consumer/start.sh /datahub/datahub-mae-consumer/scripts/
RUN chmod +x /datahub/datahub-mae-consumer/scripts/start.sh
FROM base as dev-install
# Dummy stage for development. Assumes code is built on your machine and mounted to this image.
# See this excellent thread https://github.com/docker/cli/issues/1134
FROM ${APP_ENV}-install as final
EXPOSE 9090
CMD /datahub/datahub-mae-consumer/scripts/start.sh

View File

@ -0,0 +1,5 @@
# DataHub MetadataAuditEvent (MAE) Consumer Docker Image
[![datahub-mae-consumer docker](https://github.com/linkedin/datahub/workflows/datahub-mae-consumer%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-mae-consumer+docker%22)
Refer to [DataHub MAE Consumer Job](../../metadata-jobs/mae-consumer-job) for a quick understanding of the architecture
and responsibility of this service for DataHub.

View File

@ -0,0 +1,8 @@
KAFKA_BOOTSTRAP_SERVER=broker:29092
KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
ELASTICSEARCH_HOST=elasticsearch
ELASTICSEARCH_PORT=9200
NEO4J_HOST=neo4j:7474
NEO4J_URI=bolt://neo4j
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=datahub

View File

@ -5,4 +5,4 @@ dockerize \
-wait http://$ELASTICSEARCH_HOST:$ELASTICSEARCH_PORT \
-wait http://$NEO4J_HOST \
-timeout 240s \
java -jar mae-consumer-job.jar
java -jar /datahub/datahub-mae-consumer/bin/mae-consumer-job.jar

View File

@ -0,0 +1,27 @@
# Defining environment
ARG APP_ENV=prod
FROM openjdk:8-jre-alpine as base
ENV DOCKERIZE_VERSION v0.6.1
RUN apk --no-cache add curl tar \
&& curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv
FROM openjdk:8 as prod-build
COPY . datahub-src
RUN cd datahub-src && ./gradlew :metadata-jobs:mce-consumer-job:build
RUN cd datahub-src && cp metadata-jobs/mce-consumer-job/build/libs/mce-consumer-job.jar ../mce-consumer-job.jar
FROM base as prod-install
COPY --from=prod-build /mce-consumer-job.jar /datahub/datahub-mce-consumer/bin/
COPY --from=prod-build /datahub-src/docker/datahub-mce-consumer/start.sh /datahub/datahub-mce-consumer/scripts/
RUN chmod +x /datahub/datahub-mce-consumer/scripts/start.sh
FROM base as dev-install
# Dummy stage for development. Assumes code is built on your machine and mounted to this image.
# See this excellent thread https://github.com/docker/cli/issues/1134
FROM ${APP_ENV}-install as final
EXPOSE 9090
CMD /datahub/datahub-mce-consumer/scripts/start.sh

View File

@ -0,0 +1,5 @@
# DataHub MetadataChangeEvent (MCE) Consumer Docker Image
[![datahub-mce-consumer docker](https://github.com/linkedin/datahub/workflows/datahub-mce-consumer%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-mce-consumer+docker%22)
Refer to [DataHub MCE Consumer Job](../../metadata-jobs/mce-consumer-job) for a quick understanding of the architecture
and responsibility of this service for DataHub.

View File

@ -0,0 +1,4 @@
KAFKA_BOOTSTRAP_SERVER=broker:29092
KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
GMS_HOST=datahub-gms
GMS_PORT=8080

View File

@ -4,4 +4,4 @@
dockerize \
-wait tcp://$KAFKA_BOOTSTRAP_SERVER \
-timeout 240s \
java -jar mce-consumer-job.jar
java -jar /datahub/datahub-mce-consumer/bin/mce-consumer-job.jar

17
docker/dev.sh Executable file
View File

@ -0,0 +1,17 @@
#!/bin/bash
# Launches dev instances of DataHub images. See documentation for more details.
# YOU MUST BUILD VIA GRADLE BEFORE RUNNING THIS.
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd $DIR && \
COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose \
-f docker-compose.yml \
-f docker-compose.override.yml \
-f docker-compose.dev.yml \
pull \
&& \
COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub \
-f docker-compose.yml \
-f docker-compose.override.yml \
-f docker-compose.dev.yml \
up

View File

@ -0,0 +1,45 @@
# Default overrides for running local development.
# Images here are made as "development" images by following the general pattern of defining a multistage build with
# separate prod/dev steps; using APP_ENV to specify which to use. The dev steps should avoid building and instead assume
# that binaries and scripts will be mounted to the image, as also set up by this file. Also see this excellent
# thread https://github.com/docker/cli/issues/1134.
# To make a JVM app debuggable via IntelliJ, go to its env file and add JVM debug flags, and then add the JVM debug
# port to this file.
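# As an illustrative sketch only (these exact values are not part of the checked-in defaults): one could add
#   JAVA_TOOL_OPTIONS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
# to datahub-gms/env/docker.env and then expose that port on the datahub-gms service below:
#   ports:
#     - "5005:5005"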
---
# TODO mount + debug docker file for frontend
version: '3.8'
services:
datahub-gms:
image: linkedin/datahub-gms:debug
build:
context: datahub-gms
dockerfile: Dockerfile
args:
APP_ENV: dev
volumes:
- ./datahub-gms/start.sh:/datahub/datahub-gms/scripts/start.sh
- ../gms/war/build/libs/:/datahub/datahub-gms/bin
datahub-mae-consumer:
image: linkedin/datahub-mae-consumer:debug
build:
context: datahub-mae-consumer
dockerfile: Dockerfile
args:
APP_ENV: dev
volumes:
- ./datahub-mae-consumer/start.sh:/datahub/datahub-mae-consumer/scripts/start.sh
- ../metadata-jobs/mae-consumer-job/build/libs/:/datahub/datahub-mae-consumer/bin/
datahub-mce-consumer:
image: linkedin/datahub-mce-consumer:debug
build:
context: datahub-mce-consumer
dockerfile: Dockerfile
args:
APP_ENV: dev
volumes:
- ./datahub-mce-consumer/start.sh:/datahub/datahub-mce-consumer/scripts/start.sh
- ../metadata-jobs/mce-consumer-job/build/libs/:/datahub/datahub-mce-consumer/bin

View File

@ -0,0 +1,24 @@
# Default override to use MySQL as a backing store for datahub-gms (same as docker-compose.mysql.yml).
---
version: '3.8'
services:
mysql:
container_name: mysql
hostname: mysql
image: mysql:5.7
env_file: mysql/env/docker.env
restart: always
command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
ports:
- "3306:3306"
volumes:
- ./mysql/init.sql:/docker-entrypoint-initdb.d/init.sql
- mysqldata:/var/lib/mysql
datahub-gms:
env_file: datahub-gms/env/docker.env
depends_on:
- mysql
volumes:
mysqldata:

192
docker/docker-compose.yml Normal file
View File

@ -0,0 +1,192 @@
# Docker compose file covering DataHub's default configuration, which is to run all containers on a single host.
# Please see the README.md for instructions as to how to use and customize.
# NOTE: Building images from this file requires BuildKit; see the README.md in this directory for build instructions.
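# A typical way to bring this configuration up is via the helper scripts in this directory, for example:
#   ./quickstart.sh   # pull the published images and start everything
#   ./dev.sh          # start with the dev overrides, mounting locally built binaries into the images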
---
version: '3.8'
services:
zookeeper:
image: confluentinc/cp-zookeeper:5.4.0
env_file: zookeeper/env/docker.env
hostname: zookeeper
container_name: zookeeper
ports:
- "2181:2181"
volumes:
- zkdata:/var/opt/zookeeper
broker:
image: confluentinc/cp-kafka:5.4.0
env_file: broker/env/docker.env
hostname: broker
container_name: broker
depends_on:
- zookeeper
ports:
- "29092:29092"
- "9092:9092"
kafka-rest-proxy:
image: confluentinc/cp-kafka-rest:5.4.0
env_file: kafka-rest-proxy/env/docker.env
hostname: kafka-rest-proxy
container_name: kafka-rest-proxy
ports:
- "8082:8082"
depends_on:
- zookeeper
- broker
- schema-registry
kafka-topics-ui:
image: landoop/kafka-topics-ui:0.9.4
env_file: kafka-topics-ui/env/docker.env
hostname: kafka-topics-ui
container_name: kafka-topics-ui
ports:
- "18000:8000"
depends_on:
- zookeeper
- broker
- schema-registry
- kafka-rest-proxy
# This "container" is a workaround to pre-create topics
kafka-setup:
build:
context: kafka-setup
env_file: kafka-setup/env/docker.env
hostname: kafka-setup
container_name: kafka-setup
depends_on:
- broker
- schema-registry
schema-registry:
image: confluentinc/cp-schema-registry:5.4.0
env_file: schema-registry/env/docker.env
hostname: schema-registry
container_name: schema-registry
depends_on:
- zookeeper
- broker
ports:
- "8081:8081"
schema-registry-ui:
image: landoop/schema-registry-ui:latest
env_file: schema-registry-ui/env/docker.env
container_name: schema-registry-ui
hostname: schema-registry-ui
ports:
- "8000:8000"
depends_on:
- schema-registry
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:5.6.8
env_file: elasticsearch/env/docker.env
container_name: elasticsearch
hostname: elasticsearch
ports:
- "9200:9200"
volumes:
- esdata:/usr/share/elasticsearch/data
kibana:
image: docker.elastic.co/kibana/kibana:5.6.8
env_file: kibana/env/docker.env
container_name: kibana
hostname: kibana
ports:
- "5601:5601"
depends_on:
- elasticsearch
neo4j:
image: neo4j:3.5.7
env_file: neo4j/env/docker.env
hostname: neo4j
container_name: neo4j
ports:
- "7474:7474"
- "7687:7687"
volumes:
- neo4jdata:/data
# This "container" is a workaround to pre-create search indices
elasticsearch-setup:
build:
context: elasticsearch-setup
env_file: elasticsearch-setup/env/docker.env
hostname: elasticsearch-setup
container_name: elasticsearch-setup
depends_on:
- elasticsearch
datahub-gms:
build:
context: ../
dockerfile: docker/datahub-gms/Dockerfile
image: linkedin/datahub-gms:${DATAHUB_VERSION:-latest}
hostname: datahub-gms
container_name: datahub-gms
ports:
- "8080:8080"
depends_on:
- elasticsearch-setup
- kafka-setup
- mysql
- neo4j
datahub-frontend:
build:
context: ../
dockerfile: docker/datahub-frontend/Dockerfile
image: linkedin/datahub-frontend:${DATAHUB_VERSION:-latest}
env_file: datahub-frontend/env/docker.env
hostname: datahub-frontend
container_name: datahub-frontend
ports:
- "9001:9001"
depends_on:
- datahub-gms
datahub-mae-consumer:
build:
context: ../
dockerfile: docker/datahub-mae-consumer/Dockerfile
image: linkedin/datahub-mae-consumer:${DATAHUB_VERSION:-latest}
env_file: datahub-mae-consumer/env/docker.env
hostname: datahub-mae-consumer
container_name: datahub-mae-consumer
ports:
- "9091:9091"
depends_on:
- kafka-setup
- elasticsearch-setup
- neo4j
datahub-mce-consumer:
build:
context: ../
dockerfile: docker/datahub-mce-consumer/Dockerfile
image: linkedin/datahub-mce-consumer:${DATAHUB_VERSION:-latest}
env_file: datahub-mce-consumer/env/docker.env
hostname: datahub-mce-consumer
container_name: datahub-mce-consumer
ports:
- "9090:9090"
depends_on:
- kafka-setup
- datahub-gms
networks:
default:
name: datahub_network
volumes:
esdata:
neo4jdata:
zkdata:

View File

@ -0,0 +1,5 @@
# Elasticsearch & Kibana
DataHub uses Elasticsearch as a search engine. Elasticsearch powers search, typeahead and browse functions for DataHub.
[Official Elasticsearch Docker image](https://hub.docker.com/_/elasticsearch) found in Docker Hub is used without
any modification.

View File

@ -0,0 +1,2 @@
ELASTICSEARCH_HOST=elasticsearch
ELASTICSEARCH_PORT=9200

View File

@ -1,35 +0,0 @@
# Elasticsearch & Kibana
DataHub uses Elasticsearch as a search engine. Elasticsearch powers search, typeahead and browse functions for DataHub.
[Official Elasticsearch Docker image](https://hub.docker.com/_/elasticsearch) found in Docker Hub is used without
any modification.
## Run Docker container
Below command will start the Elasticsearch and Kibana containers. `DataHub` uses Elasticsearch release `5.6.8`. Newer
versions of Elasticsearch are not tested and you might experience compatibility issues.
```
cd docker/elasticsearch && docker-compose pull && docker-compose up --build
```
You can connect to Kibana on your web browser to monitor Elasticsearch via below link:
```
http://localhost:5601
```
## Container configuration
### External Port
If you need to configure default configurations for your container such as the exposed port, you will do that in
`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand
how to change your exposed port settings.
```
ports:
- "9200:9200"
```
### Docker Network
All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
If you change this, you will need to change this for all other Docker containers as well.
```
networks:
default:
name: datahub_network
```

View File

@ -1,38 +0,0 @@
---
version: '3.5'
services:
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:5.6.8
container_name: elasticsearch
hostname: elasticsearch
ports:
- "9200:9200"
environment:
- discovery.type=single-node
- xpack.security.enabled=false
- "ES_JAVA_OPTS=-Xms1g -Xmx1g"
kibana:
image: docker.elastic.co/kibana/kibana:5.6.8
container_name: kibana
hostname: kibana
ports:
- "5601:5601"
depends_on:
- elasticsearch
# This "container" is a workaround to pre-create search indices
elasticsearch-setup:
build:
context: .
hostname: elasticsearch-setup
container_name: elasticsearch-setup
depends_on:
- elasticsearch
environment:
- ELASTICSEARCH_HOST=elasticsearch
- ELASTICSEARCH_PORT=9200
networks:
default:
name: datahub_network

3
docker/elasticsearch/env/docker.env vendored Normal file
View File

@ -0,0 +1,3 @@
discovery.type=single-node
xpack.security.enabled=false
ES_JAVA_OPTS=-Xms1g -Xmx1g

View File

@ -1,50 +0,0 @@
# DataHub Frontend Docker Image
[![datahub-frontend docker](https://github.com/linkedin/datahub/workflows/datahub-frontend%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-frontend+docker%22)
Refer to [DataHub Frontend Service](../../datahub-frontend) to have a quick understanding of the architecture and
responsibility of this service for the DataHub.
## Build & Run
```
cd docker/frontend && docker-compose up --build
```
This command will rebuild the docker image and start a container based on the image.
To start a container using an existing image, run the same command without the `--build` flag.
### Container configuration
#### External Port
If you need to configure default configurations for your container such as the exposed port, you will do that in
`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand
how to change your exposed port settings.
```
ports:
- "9001:9001"
```
#### Docker Network
All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
If you change this, you will need to change this for all other Docker containers as well.
```
networks:
default:
name: datahub_network
```
#### datahub-gms Container
Before starting `datahub-frontend` container, `datahub-gms` container should already be up and running.
`datahub-frontend` service creates a connection to `datahub-gms` service and this is configured with environment
variables in `docker-compose.yml`:
```
environment:
- DATAHUB_GMS_HOST=datahub-gms
- DATAHUB_GMS_PORT=8080
```
The value of `DATAHUB_GMS_HOST` variable should be set to the host name of the `datahub-gms` container within the Docker network.
## Checking out DataHub UI
After starting your Docker container, you can connect to it by typing below into your favorite web browser:
```
http://localhost:9001
```
You can sign in with `datahub` as username and password.

View File

@ -1,22 +0,0 @@
---
version: '3.5'
services:
datahub-frontend:
image: linkedin/datahub-frontend:${DATAHUB_VERSION:-latest}
build:
context: ../../
dockerfile: docker/frontend/Dockerfile
hostname: datahub-frontend
container_name: datahub-frontend
ports:
- "9001:9001"
environment:
- DATAHUB_GMS_HOST=datahub-gms
- DATAHUB_GMS_PORT=8080
- DATAHUB_SECRET=YouKnowNothing
- DATAHUB_APP_VERSION=1.0
- DATAHUB_PLAY_MEM_BUFFER_SIZE=10MB
networks:
default:
name: datahub_network

View File

@ -1,19 +0,0 @@
FROM openjdk:8 as builder
COPY . /datahub-src
RUN cd /datahub-src && ./gradlew :gms:war:build \
&& cp gms/war/build/libs/war.war /gms.war
FROM openjdk:8-jre-alpine
ENV DOCKERIZE_VERSION v0.6.1
RUN apk --no-cache add curl tar \
&& curl https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-runner/9.4.20.v20190813/jetty-runner-9.4.20.v20190813.jar --output jetty-runner.jar \
&& curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv
COPY --from=builder /gms.war .
COPY docker/gms/start.sh /start.sh
RUN chmod +x /start.sh
EXPOSE 8080
CMD /start.sh

View File

@ -1,82 +0,0 @@
# DataHub Generalized Metadata Store (GMS) Docker Image
[![datahub-gms docker](https://github.com/linkedin/datahub/workflows/datahub-gms%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-gms+docker%22)
Refer to [DataHub GMS Service](../../gms) to have a quick understanding of the architecture and
responsibility of this service for the DataHub.
## Build & Run
```
cd docker/gms && docker-compose up --build
```
This command will rebuild the local docker image and start a container based on the image.
To start a container using an existing image, run the same command without the `--build` flag.
### Container configuration
#### External Port
If you need to configure default configurations for your container such as the exposed port, you will do that in
`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand
how to change your exposed port settings.
```
ports:
- "8080:8080"
```
#### Docker Network
All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
If you change this, you will need to change this for all other Docker containers as well.
```
networks:
default:
name: datahub_network
```
#### MySQL, Elasticsearch and Kafka Containers
Before starting `datahub-gms` container, `mysql`, `elasticsearch`, `neo4j` and `kafka` containers should already be up and running.
These connections are configured via environment variables in `docker-compose.yml`:
```
environment:
- EBEAN_DATASOURCE_USERNAME=datahub
- EBEAN_DATASOURCE_PASSWORD=datahub
- EBEAN_DATASOURCE_HOST=mysql:3306
- EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub
- EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver
```
The value of `EBEAN_DATASOURCE_HOST` variable should be set to the host name of the `mysql` container within the Docker network.
```
environment:
- KAFKA_BOOTSTRAP_SERVER=broker:29092
- KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
```
The value of `KAFKA_BOOTSTRAP_SERVER` variable should be set to the host name of the `kafka broker` container within the Docker network.
The value of `KAFKA_SCHEMAREGISTRY_URL` variable should be set to the host name of the `kafka schema registry` container within the Docker network.
```
environment:
- ELASTICSEARCH_HOST=elasticsearch
- ELASTICSEARCH_PORT=9200
```
The value of `ELASTICSEARCH_HOST` variable should be set to the host name of the `elasticsearch` container within the Docker network.
```
environment:
- NEO4J_HOST=neo4j:7474
- NEO4J_URI=bolt://neo4j
- NEO4J_USERNAME=neo4j
- NEO4J_PASSWORD=datahub
```
The value of `NEO4J_URI` variable should be set to the host name of the `neo4j` container within the Docker network.
## Other Database Platforms
While GMS defaults to using MySQL as its storage backend, it is possible to switch to any of the
[database platforms](https://ebean.io/docs/database/) supported by Ebean.
For example, you can run the following command to start a GMS that connects to a PostgreSQL backend
```
cd docker/gms && docker-compose -f docker-compose-postgres.yml up --build
```
or a MariaDB backend
```
cd docker/gms && docker-compose -f docker-compose-mariadb.yml up --build
```

View File

@ -1,30 +0,0 @@
---
version: '3.5'
services:
datahub-gms:
image: linkedin/datahub-gms:${DATAHUB_VERSION:-latest}
build:
context: ../../
dockerfile: docker/gms/Dockerfile
hostname: datahub-gms
container_name: datahub-gms
ports:
- "8080:8080"
environment:
- EBEAN_DATASOURCE_USERNAME=datahub
- EBEAN_DATASOURCE_PASSWORD=datahub
- EBEAN_DATASOURCE_HOST=mariadb:3306
- EBEAN_DATASOURCE_URL=jdbc:mariadb://mariadb:3306/datahub
- EBEAN_DATASOURCE_DRIVER=org.mariadb.jdbc.Driver
- KAFKA_BOOTSTRAP_SERVER=broker:29092
- KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
- ELASTICSEARCH_HOST=elasticsearch
- ELASTICSEARCH_PORT=9200
- NEO4J_HOST=neo4j:7474
- NEO4J_URI=bolt://neo4j
- NEO4J_USERNAME=neo4j
- NEO4J_PASSWORD=datahub
networks:
default:
name: datahub_network

View File

@ -1,30 +0,0 @@
---
version: '3.5'
services:
datahub-gms:
image: linkedin/datahub-gms:${DATAHUB_VERSION:-latest}
build:
context: ../../
dockerfile: docker/gms/Dockerfile
hostname: datahub-gms
container_name: datahub-gms
ports:
- "8080:8080"
environment:
- EBEAN_DATASOURCE_USERNAME=datahub
- EBEAN_DATASOURCE_PASSWORD=datahub
- EBEAN_DATASOURCE_HOST=postgres:5432
- EBEAN_DATASOURCE_URL=jdbc:postgresql://postgres:5432/datahub
- EBEAN_DATASOURCE_DRIVER=org.postgresql.Driver
- KAFKA_BOOTSTRAP_SERVER=broker:29092
- KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
- ELASTICSEARCH_HOST=elasticsearch
- ELASTICSEARCH_PORT=9200
- NEO4J_HOST=neo4j:7474
- NEO4J_URI=bolt://neo4j
- NEO4J_USERNAME=neo4j
- NEO4J_PASSWORD=datahub
networks:
default:
name: datahub_network

View File

@ -1,30 +0,0 @@
---
version: '3.5'
services:
datahub-gms:
image: linkedin/datahub-gms:${DATAHUB_VERSION:-latest}
build:
context: ../../
dockerfile: docker/gms/Dockerfile
hostname: datahub-gms
container_name: datahub-gms
ports:
- "8080:8080"
environment:
- EBEAN_DATASOURCE_USERNAME=datahub
- EBEAN_DATASOURCE_PASSWORD=datahub
- EBEAN_DATASOURCE_HOST=mysql:3306
- EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true
- EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver
- KAFKA_BOOTSTRAP_SERVER=broker:29092
- KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
- ELASTICSEARCH_HOST=elasticsearch
- ELASTICSEARCH_PORT=9200
- NEO4J_HOST=neo4j:7474
- NEO4J_URI=bolt://neo4j
- NEO4J_USERNAME=neo4j
- NEO4J_PASSWORD=datahub
networks:
default:
name: datahub_network

View File

@ -2,16 +2,3 @@
Refer to [DataHub Metadata Ingestion](../../metadata-ingestion/mce-cli) for a quick understanding of the architecture and
responsibility of this service for DataHub.
## Build & Run
```
cd docker/ingestion && docker-compose up --build
```
This command will rebuild the docker image and start a container based on the image.
To start a container using an existing image, run the same command without the `--build` flag.
### Container configuration
#### Prerequisite Containers
Before starting `ingestion` container, `kafka`, `datahub-gms`, `mysql` and `datahub-mce-consumer` containers should already be up and running.

View File

@ -0,0 +1,4 @@
KAFKA_REST_LISTENERS=http://0.0.0.0:8082/
KAFKA_REST_SCHEMA_REGISTRY_URL=http://schema-registry:8081/
KAFKA_REST_HOST_NAME=kafka-rest-proxy
KAFKA_REST_BOOTSTRAP_SERVERS=PLAINTEXT://broker:29092

View File

@ -0,0 +1,14 @@
# Kafka, Zookeeper and Schema Registry
DataHub uses Kafka as the pub-sub message queue in the backend.
[Official Confluent Kafka Docker images](https://hub.docker.com/u/confluentinc) found in Docker Hub are used without
any modification.
## Debugging Kafka
You can install [kafkacat](https://github.com/edenhill/kafkacat) to consume and produce messages to Kafka topics.
For example, to consume messages on the MetadataAuditEvent topic, you can run the command below.
```
kafkacat -b localhost:9092 -t MetadataAuditEvent
```
However, `kafkacat` does not currently support Avro deserialization,
but there is ongoing [work](https://github.com/edenhill/kafkacat/pull/151) to add it.
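As a possible workaround (a sketch, assuming the `schema-registry` container from our compose file is running), Confluent's `kafka-avro-console-consumer` can decode the Avro payloads:
```
docker exec -it schema-registry kafka-avro-console-consumer \
  --bootstrap-server broker:29092 \
  --topic MetadataAuditEvent \
  --property schema.registry.url=http://schema-registry:8081
```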

2
docker/kafka-setup/env/docker.env vendored Normal file
View File

@ -0,0 +1,2 @@
KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
KAFKA_BOOTSTRAP_SERVER=broker:29092

2
docker/kafka-topics-ui/env/docker.env vendored Normal file
View File

@ -0,0 +1,2 @@
KAFKA_REST_PROXY_URL="http://kafka-rest-proxy:8082/"
PROXY="true"

View File

@ -1,47 +0,0 @@
# Kafka, Zookeeper and Schema Registry
DataHub uses Kafka as the pub-sub message queue in the backend.
[Official Confluent Kafka Docker images](https://hub.docker.com/u/confluentinc) found in Docker Hub are used without
any modification.
## Run Docker container
Below command will start all Kafka related containers.
```
cd docker/kafka && docker-compose pull && docker-compose up
```
As part of `docker-compose`, we also initialize a container called `kafka-setup` to create `MetadataAuditEvent` and
`MetadataChangeEvent` & `FailedMetadataChangeEvent` topics. The only thing this container does is create Kafka topics once the Kafka broker is ready.
There is also a container which provides a visual schema registry interface with which you can register/unregister schemas.
You can connect to `schema-registry-ui` on your web browser to monitor Kafka Schema Registry via below link:
```
http://localhost:8000
```
## Container configuration
### External Port
If you need to configure default configurations for your container such as the exposed port, you will do that in
`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand
how to change your exposed port settings.
```
ports:
- "9092:9092"
```
### Docker Network
All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
If you change this, you will need to change this for all other Docker containers as well.
```
networks:
default:
name: datahub_network
```
## Debugging Kafka
You can install [kafkacat](https://github.com/edenhill/kafkacat) to consume and produce messages to Kafka topics.
For example, to consume messages on the MetadataAuditEvent topic, you can run the command below.
```
kafkacat -b localhost:9092 -t MetadataAuditEvent
```
However, `kafkacat` does not currently support Avro deserialization,
but there is ongoing [work](https://github.com/edenhill/kafkacat/pull/151) to add it.

View File

@ -1,104 +0,0 @@
---
version: '3.5'
services:
zookeeper:
image: confluentinc/cp-zookeeper:5.4.0
hostname: zookeeper
container_name: zookeeper
ports:
- "2181:2181"
environment:
ZOOKEEPER_CLIENT_PORT: 2181
ZOOKEEPER_TICK_TIME: 2000
broker:
image: confluentinc/cp-kafka:5.4.0
hostname: broker
container_name: broker
depends_on:
- zookeeper
ports:
- "29092:29092"
- "9092:9092"
environment:
KAFKA_BROKER_ID: 1
KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
# This "container" is a workaround to pre-create topics
kafka-setup:
build:
context: .
hostname: kafka-setup
container_name: kafka-setup
depends_on:
- broker
- schema-registry
environment:
- KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
- KAFKA_BOOTSTRAP_SERVER=broker:29092
kafka-rest-proxy:
image: confluentinc/cp-kafka-rest:5.4.0
hostname: kafka-rest-proxy
ports:
- "8082:8082"
environment:
KAFKA_REST_LISTENERS: http://0.0.0.0:8082/
KAFKA_REST_SCHEMA_REGISTRY_URL: http://schema-registry:8081/
KAFKA_REST_HOST_NAME: kafka-rest-proxy
KAFKA_REST_BOOTSTRAP_SERVERS: PLAINTEXT://broker:29092
depends_on:
- zookeeper
- broker
- schema-registry
kafka-topics-ui:
image: landoop/kafka-topics-ui:0.9.4
hostname: kafka-topics-ui
ports:
- "18000:8000"
environment:
KAFKA_REST_PROXY_URL: "http://kafka-rest-proxy:8082/"
PROXY: "true"
depends_on:
- zookeeper
- broker
- schema-registry
- kafka-rest-proxy
schema-registry:
image: confluentinc/cp-schema-registry:5.4.0
hostname: schema-registry
container_name: schema-registry
depends_on:
- zookeeper
- broker
ports:
- "8081:8081"
environment:
SCHEMA_REGISTRY_HOST_NAME: schema-registry
SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: 'zookeeper:2181'
schema-registry-ui:
image: landoop/schema-registry-ui:latest
container_name: schema-registry-ui
hostname: schema-registry-ui
ports:
- "8000:8000"
environment:
SCHEMAREGISTRY_URL: 'http://schema-registry:8081'
ALLOW_GLOBAL: 'true'
ALLOW_TRANSITIVE: 'true'
ALLOW_DELETION: 'true'
READONLY_MODE: 'true'
PROXY: 'true'
depends_on:
- schema-registry
networks:
default:
name: datahub_network

2
docker/kibana/env/docker.env vendored Normal file
View File

@ -0,0 +1,2 @@
SERVER_HOST=0.0.0.0
ELASTICSEARCH_URL=http://elasticsearch:9200

View File

@ -1,19 +0,0 @@
FROM openjdk:8 as builder
COPY . datahub-src
RUN cd datahub-src && ./gradlew :metadata-jobs:mae-consumer-job:build \
&& cp metadata-jobs/mae-consumer-job/build/libs/mae-consumer-job.jar ../mae-consumer-job.jar \
&& cd .. && rm -rf datahub-src
FROM openjdk:8-jre-alpine
ENV DOCKERIZE_VERSION v0.6.1
RUN apk --no-cache add curl tar \
&& curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv
COPY --from=builder /mae-consumer-job.jar /mae-consumer-job.jar
COPY docker/mae-consumer/start.sh /start.sh
RUN chmod +x /start.sh
EXPOSE 9091
CMD /start.sh

View File

@ -1,42 +0,0 @@
# DataHub MetadataAuditEvent (MAE) Consumer Docker Image
[![datahub-mae-consumer docker](https://github.com/linkedin/datahub/workflows/datahub-mae-consumer%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-mae-consumer+docker%22)
Refer to [DataHub MAE Consumer Job](../../metadata-jobs/mae-consumer-job) to have a quick understanding of the architecture and
responsibility of this service for the DataHub.
## Build & Run
```
cd docker/mae-consumer && docker-compose up --build
```
This command will rebuild the docker image and start a container based on the image.
To start a container using a previously built image, run the same command without the `--build` flag.
### Container configuration
#### Docker Network
All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
If you change this, you will need to change this for all other Docker containers as well.
```
networks:
default:
name: datahub_network
```
#### Elasticsearch and Kafka Containers
Before starting `datahub-mae-consumer` container, `elasticsearch` and `kafka` containers should already be up and running.
These connections are configured via environment variables in `docker-compose.yml`:
```
environment:
- KAFKA_BOOTSTRAP_SERVER=broker:29092
- KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
```
The value of `KAFKA_BOOTSTRAP_SERVER` variable should be set to the host name of the `kafka broker` container within the Docker network.
The value of `KAFKA_SCHEMAREGISTRY_URL` variable should be set to the host name of the `kafka schema registry` container within the Docker network.
```
environment:
- ELASTICSEARCH_HOST=elasticsearch
- ELASTICSEARCH_PORT=9200
```
The value of `ELASTICSEARCH_HOST` variable should be set to the host name of the `elasticsearch` container within the Docker network.

View File

@ -1,25 +0,0 @@
---
version: '3.5'
services:
datahub-mae-consumer:
image: linkedin/datahub-mae-consumer:${DATAHUB_VERSION:-latest}
build:
context: ../../
dockerfile: docker/mae-consumer/Dockerfile
hostname: datahub-mae-consumer
container_name: datahub-mae-consumer
ports:
- "9091:9091"
environment:
- KAFKA_BOOTSTRAP_SERVER=broker:29092
- KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
- ELASTICSEARCH_HOST=elasticsearch
- ELASTICSEARCH_PORT=9200
- NEO4J_HOST=neo4j:7474
- NEO4J_URI=bolt://neo4j
- NEO4J_USERNAME=neo4j
- NEO4J_PASSWORD=datahub
networks:
default:
name: datahub_network

View File

@ -4,36 +4,3 @@ DataHub GMS can use MariaDB as an alternate storage backend.
[Official MariaDB Docker image](https://hub.docker.com/_/mariadb) found in Docker Hub is used without
any modification.
## Run Docker container
Below command will start the MariaDB container.
```
cd docker/mariadb && docker-compose pull && docker-compose up
```
An initialization script [init.sql](init.sql) is provided to container. This script initializes `metadata-aspect` table
which is basically the Key-Value store of the DataHub GMS.
To connect to MariaDB container, you can type below command:
```
docker exec -it mariadb mysql -u datahub -pdatahub datahub
```
## Container configuration
### External Port
If you need to configure default configurations for your container such as the exposed port, you will do that in
`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand
how to change your exposed port settings.
```
ports:
- '3306:3306'
```
### Docker Network
All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
If you change this, you will need to change this for all other Docker containers as well.
```
networks:
default:
name: datahub_network
```

View File

@ -1,21 +1,23 @@
# Override to use MariaDB as a backing store for datahub-gms.
---
version: '3.5'
version: '3.8'
services:
mysql:
mariadb:
container_name: mariadb
hostname: mariadb
image: mariadb:10.5
env_file: env/docker.env
restart: always
environment:
MYSQL_DATABASE: 'datahub'
MYSQL_USER: 'datahub'
MYSQL_PASSWORD: 'datahub'
MYSQL_ROOT_PASSWORD: 'datahub'
ports:
- '3306:3306'
volumes:
- ./init.sql:/docker-entrypoint-initdb.d/init.sql
datahub-gms:
env_file: ../datahub-gms/env/dev.mariadb.env
depends_on:
- mariadb
networks:
default:
name: datahub_network

View File

@ -1,19 +0,0 @@
FROM openjdk:8 as builder
COPY . datahub-src
RUN cd datahub-src && ./gradlew :metadata-jobs:mce-consumer-job:build \
&& cp metadata-jobs/mce-consumer-job/build/libs/mce-consumer-job.jar ../mce-consumer-job.jar \
&& cd .. && rm -rf datahub-src
FROM openjdk:8-jre-alpine
ENV DOCKERIZE_VERSION v0.6.1
RUN apk --no-cache add curl tar \
&& curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv
COPY --from=builder /mce-consumer-job.jar /mce-consumer-job.jar
COPY docker/mce-consumer/start.sh /start.sh
RUN chmod +x /start.sh
EXPOSE 9090
CMD /start.sh

View File

@ -1,42 +0,0 @@
# DataHub MetadataChangeEvent (MCE) Consumer Docker Image
[![datahub-mce-consumer docker](https://github.com/linkedin/datahub/workflows/datahub-mce-consumer%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-mce-consumer+docker%22)
Refer to [DataHub MCE Consumer Job](../../metadata-jobs/mce-consumer-job) to have a quick understanding of the architecture and
responsibility of this service for the DataHub.
## Build & Run
```
cd docker/mce-consumer && docker-compose up --build
```
This command will rebuild the docker image and start a container based on the image.
To start a container using a previously built image, run the same command without the `--build` flag.
### Container configuration
#### Docker Network
All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
If you change this, you will need to change this for all other Docker containers as well.
```
networks:
default:
name: datahub_network
```
#### Kafka and DataHub GMS Containers
Before starting `datahub-mce-consumer` container, `datahub-gms` and `kafka` containers should already be up and running.
These connections are configured via environment variables in `docker-compose.yml`:
```
environment:
- KAFKA_BOOTSTRAP_SERVER=broker:29092
- KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
```
The value of `KAFKA_BOOTSTRAP_SERVER` variable should be set to the host name of the `kafka broker` container within the Docker network.
The value of `KAFKA_SCHEMAREGISTRY_URL` variable should be set to the host name of the `kafka schema registry` container within the Docker network.
```
environment:
- GMS_HOST=datahub-gms
- GMS_PORT=8080
```
The value of `GMS_HOST` variable should be set to the host name of the `datahub-gms` container within the Docker network.

View File

@ -1,23 +0,0 @@
---
version: '3.5'
services:
datahub-mce-consumer:
image: linkedin/datahub-mce-consumer:${DATAHUB_VERSION:-latest}
build:
context: ../../
dockerfile: docker/mce-consumer/Dockerfile
hostname: datahub-mce-consumer
container_name: datahub-mce-consumer
ports:
- "9090:9090"
environment:
- KAFKA_BOOTSTRAP_SERVER=broker:29092
- KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
- GMS_HOST=datahub-gms
- GMS_PORT=8080
- KAFKA_MCE_TOPIC_NAME=MetadataChangeEvent
- KAFKA_FMCE_TOPIC_NAME=FailedMetadataChangeEvent
networks:
default:
name: datahub_network

View File

@ -4,36 +4,3 @@ DataHub GMS uses MySQL as the storage backend.
[Official MySQL Docker image](https://hub.docker.com/_/mysql) found in Docker Hub is used without
any modification.
## Run Docker container
Below command will start the MySQL container.
```
cd docker/mysql && docker-compose pull && docker-compose up
```
An initialization script [init.sql](init.sql) is provided to container. This script initializes `metadata-aspect` table
which is basically the Key-Value store of the DataHub GMS.
To connect to MySQL container, you can type below command:
```
docker exec -it mysql mysql -u datahub -pdatahub datahub
```
## Container configuration
### External Port
If you need to configure default configurations for your container such as the exposed port, you will do that in
`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand
how to change your exposed port settings.
```
ports:
- '3306:3306'
```
### Docker Network
All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
If you change this, you will need to change this for all other Docker containers as well.
```
networks:
default:
name: datahub_network
```

View File

@ -0,0 +1,24 @@
# Override to use MySQL as a backing store for datahub-gms.
---
version: '3.8'
services:
mysql:
container_name: mysql
hostname: mysql
image: mysql:5.7
env_file: env/docker.env
restart: always
command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
ports:
- "3306:3306"
volumes:
- ./init.sql:/docker-entrypoint-initdb.d/init.sql
- mysqldata:/var/lib/mysql
datahub-gms:
env_file: ../datahub-gms/env/docker.env
depends_on:
- mysql
volumes:
mysqldata:

View File

@ -1,21 +0,0 @@
---
version: '3.5'
services:
mysql:
container_name: mysql
hostname: mysql
image: mysql:5.7
restart: always
environment:
MYSQL_DATABASE: 'datahub'
MYSQL_USER: 'datahub'
MYSQL_PASSWORD: 'datahub'
MYSQL_ROOT_PASSWORD: 'datahub'
ports:
- '3306:3306'
volumes:
- ./init.sql:/docker-entrypoint-initdb.d/init.sql
networks:
default:
name: datahub_network

4
docker/mysql/env/docker.env vendored Normal file
View File

@ -0,0 +1,4 @@
MYSQL_DATABASE=datahub
MYSQL_USER=datahub
MYSQL_PASSWORD=datahub
MYSQL_ROOT_PASSWORD=datahub

View File

@ -4,32 +4,6 @@ DataHub uses Neo4j as graph db in the backend to serve graph queries.
[Official Neo4j image](https://hub.docker.com/_/neo4j) found in Docker Hub is used without
any modification.
## Run Docker container
Below command will start the Neo4j container.
```
cd docker/neo4j && docker-compose pull && docker-compose up
```
## Container configuration
### External Port
If you need to configure default configurations for your container such as the exposed port, you will do that in
`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand
how to change your exposed port settings.
```
ports:
- "7474:7474"
- "7687:7687"
```
### Docker Network
All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
If you change this, you will need to change it for all other Docker containers as well.
```
networks:
default:
name: datahub_network
```
## Neo4j Browser
To be able to debug and run Cypher queries against your Neo4j image, you can open up `Neo4j Browser` which is running at
[http://localhost:7474/browser/](http://localhost:7474/browser/). Default username is `neo4j` and password is `datahub`.

View File

@ -1,16 +0,0 @@
---
version: '3.5'
services:
neo4j:
image: neo4j:3.5.7
hostname: neo4j
container_name: neo4j
environment:
NEO4J_AUTH: 'neo4j/datahub'
ports:
- "7474:7474"
- "7687:7687"
networks:
default:
name: datahub_network

1
docker/neo4j/env/docker.env vendored Normal file
View File

@ -0,0 +1 @@
NEO4J_AUTH=neo4j/datahub

View File

@ -0,0 +1,6 @@
# PostgreSQL
DataHub GMS can use PostgreSQL as an alternate storage backend.
[Official PostgreSQL Docker image](https://hub.docker.com/_/postgres) found in Docker Hub is used without
any modification.

View File

@ -1,19 +1,23 @@
# Override to use PostgreSQL as a backing store for datahub-gms.
---
version: '3.5'
version: '3.8'
services:
postgres:
container_name: postgres
hostname: postgres
image: postgres:12.3
env_file: env/docker.env
restart: always
environment:
POSTGRES_USER: datahub
POSTGRES_PASSWORD: datahub
ports:
- '5432:5432'
volumes:
- ./init.sql:/docker-entrypoint-initdb.d/init.sql
datahub-gms:
env_file: ../datahub-gms/env/dev.postgres.env
depends_on:
- postgres
networks:
default:
name: datahub_network

2
docker/postgres/env/docker.env vendored Normal file
View File

@ -0,0 +1,2 @@
POSTGRES_USER=datahub
POSTGRES_PASSWORD=datahub

View File

@ -1,39 +0,0 @@
# PostgreSQL
DataHub GMS can use PostgreSQL as an alternate storage backend.
[Official PostgreSQL Docker image](https://hub.docker.com/_/postgres) found in Docker Hub is used without
any modification.
## Run Docker container
Below command will start the PostgreSQL container.
```
cd docker/postgres && docker-compose pull && docker-compose up
```
An initialization script [init.sql](init.sql) is provided to container. This script initializes `metadata-aspect` table
which is basically the Key-Value store of the DataHub GMS.
To connect to PostgreSQL container, you can type below command:
```
docker exec -it postgres psql -U datahub
```
## Container configuration
### External Port
If you need to configure default configurations for your container such as the exposed port, you will do that in
`docker-compose.yml` file. Refer to this [link](https://docs.docker.com/compose/compose-file/#ports) to understand
how to change your exposed port settings.
```
ports:
- '5432:5432'
```
### Docker Network
All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
If you change this, you will need to change this for all other Docker containers as well.
```
networks:
default:
name: datahub_network
```

7
docker/quickstart.sh Executable file
View File

@ -0,0 +1,7 @@
#!/bin/bash
# Quickstarts DataHub by pulling all images from Docker Hub and then running the containers locally. No images are
# built locally. Note: by default this pulls the latest version; you can change this to a specific version by setting
# the DATAHUB_VERSION environment variable.
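# Example (the tag shown is purely hypothetical): DATAHUB_VERSION=v0.4.0 ./docker/quickstart.sh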
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd $DIR && docker-compose pull && docker-compose -p datahub up

View File

@ -1,30 +0,0 @@
# DataHub Quickstart
To start all Docker containers at once, please run below command from project root directory:
```bash
./docker/quickstart/quickstart.sh
```
At this point, all containers are ready and DataHub can be considered up and running. Check specific containers guide
for details:
* [Elasticsearch & Kibana](../elasticsearch)
* [DataHub Frontend](../frontend)
* [DataHub GMS](../gms)
* [Kafka, Schema Registry & Zookeeper](../kafka)
* [DataHub MAE Consumer](../mae-consumer)
* [DataHub MCE Consumer](../mce-consumer)
* [MySQL](../mysql)
From this point on, if you want to be able to sign in to DataHub and see some sample data, please see
[Metadata Ingestion Guide](../../metadata-ingestion) for `bootstrapping DataHub`.
You can also choose to use a specific version of the DataHub Docker images instead of `latest` by specifying the `DATAHUB_VERSION` environment variable.
## Debugging Containers
If you want to debug containers, you can check container logs:
```
docker logs <<container_name>>
```
Also, you can connect to container shell for further debugging:
```
docker exec -it <<container_name>> bash
```

View File

@ -1,260 +0,0 @@
---
version: '3.5'
services:
mysql:
container_name: mysql
hostname: mysql
image: mysql:5.7
restart: always
command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
environment:
MYSQL_DATABASE: 'datahub'
MYSQL_USER: 'datahub'
MYSQL_PASSWORD: 'datahub'
MYSQL_ROOT_PASSWORD: 'datahub'
ports:
- "3306:3306"
volumes:
- ../mysql/init.sql:/docker-entrypoint-initdb.d/init.sql
- mysqldata:/var/lib/mysql
zookeeper:
image: confluentinc/cp-zookeeper:5.4.0
hostname: zookeeper
container_name: zookeeper
ports:
- "2181:2181"
environment:
ZOOKEEPER_CLIENT_PORT: 2181
ZOOKEEPER_TICK_TIME: 2000
volumes:
- zkdata:/var/opt/zookeeper
broker:
image: confluentinc/cp-kafka:5.4.0
hostname: broker
container_name: broker
depends_on:
- zookeeper
ports:
- "29092:29092"
- "9092:9092"
environment:
KAFKA_BROKER_ID: 1
KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
kafka-rest-proxy:
image: confluentinc/cp-kafka-rest:5.4.0
hostname: kafka-rest-proxy
container_name: kafka-rest-proxy
ports:
- "8082:8082"
environment:
KAFKA_REST_LISTENERS: http://0.0.0.0:8082/
KAFKA_REST_SCHEMA_REGISTRY_URL: http://schema-registry:8081/
KAFKA_REST_HOST_NAME: kafka-rest-proxy
KAFKA_REST_BOOTSTRAP_SERVERS: PLAINTEXT://broker:29092
depends_on:
- zookeeper
- broker
- schema-registry
kafka-topics-ui:
image: landoop/kafka-topics-ui:0.9.4
hostname: kafka-topics-ui
container_name: kafka-topics-ui
ports:
- "18000:8000"
environment:
KAFKA_REST_PROXY_URL: "http://kafka-rest-proxy:8082/"
PROXY: "true"
depends_on:
- zookeeper
- broker
- schema-registry
- kafka-rest-proxy
# This "container" is a workaround to pre-create topics
kafka-setup:
build:
context: ../kafka
hostname: kafka-setup
container_name: kafka-setup
depends_on:
- broker
- schema-registry
environment:
- KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
- KAFKA_BOOTSTRAP_SERVER=broker:29092
schema-registry:
image: confluentinc/cp-schema-registry:5.4.0
hostname: schema-registry
container_name: schema-registry
depends_on:
- zookeeper
- broker
ports:
- "8081:8081"
environment:
SCHEMA_REGISTRY_HOST_NAME: schema-registry
SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: 'zookeeper:2181'
schema-registry-ui:
image: landoop/schema-registry-ui:latest
container_name: schema-registry-ui
hostname: schema-registry-ui
ports:
- "8000:8000"
environment:
SCHEMAREGISTRY_URL: 'http://schema-registry:8081'
ALLOW_GLOBAL: 'true'
ALLOW_TRANSITIVE: 'true'
ALLOW_DELETION: 'true'
READONLY_MODE: 'true'
PROXY: 'true'
depends_on:
- schema-registry
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:5.6.8
container_name: elasticsearch
hostname: elasticsearch
ports:
- "9200:9200"
environment:
- discovery.type=single-node
- xpack.security.enabled=false
- "ES_JAVA_OPTS=-Xms1g -Xmx1g"
volumes:
- esdata:/usr/share/elasticsearch/data
kibana:
image: docker.elastic.co/kibana/kibana:5.6.8
container_name: kibana
hostname: kibana
ports:
- "5601:5601"
environment:
- SERVER_HOST=0.0.0.0
- ELASTICSEARCH_URL=http://elasticsearch:9200
depends_on:
- elasticsearch
neo4j:
image: neo4j:3.5.7
hostname: neo4j
container_name: neo4j
environment:
NEO4J_AUTH: 'neo4j/datahub'
ports:
- "7474:7474"
- "7687:7687"
volumes:
- neo4jdata:/data
# This "container" is a workaround to pre-create search indices
elasticsearch-setup:
build:
context: ../elasticsearch
hostname: elasticsearch-setup
container_name: elasticsearch-setup
depends_on:
- elasticsearch
environment:
- ELASTICSEARCH_HOST=elasticsearch
- ELASTICSEARCH_PORT=9200
datahub-gms:
image: linkedin/datahub-gms:${DATAHUB_VERSION:-latest}
hostname: datahub-gms
container_name: datahub-gms
ports:
- "8080:8080"
environment:
- EBEAN_DATASOURCE_USERNAME=datahub
- EBEAN_DATASOURCE_PASSWORD=datahub
- EBEAN_DATASOURCE_HOST=mysql:3306
- EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8
- EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver
- KAFKA_BOOTSTRAP_SERVER=broker:29092
- KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
- ELASTICSEARCH_HOST=elasticsearch
- ELASTICSEARCH_PORT=9200
- NEO4J_HOST=neo4j:7474
- NEO4J_URI=bolt://neo4j
- NEO4J_USERNAME=neo4j
- NEO4J_PASSWORD=datahub
depends_on:
- elasticsearch-setup
- kafka-setup
- mysql
- neo4j
datahub-frontend:
image: linkedin/datahub-frontend:${DATAHUB_VERSION:-latest}
hostname: datahub-frontend
container_name: datahub-frontend
ports:
- "9001:9001"
environment:
- DATAHUB_GMS_HOST=datahub-gms
- DATAHUB_GMS_PORT=8080
- DATAHUB_SECRET=YouKnowNothing
- DATAHUB_APP_VERSION=1.0
- DATAHUB_PLAY_MEM_BUFFER_SIZE=10MB
depends_on:
- datahub-gms
datahub-mae-consumer:
image: linkedin/datahub-mae-consumer:${DATAHUB_VERSION:-latest}
hostname: datahub-mae-consumer
container_name: datahub-mae-consumer
ports:
- "9091:9091"
environment:
- KAFKA_BOOTSTRAP_SERVER=broker:29092
- KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
- ELASTICSEARCH_HOST=elasticsearch
- ELASTICSEARCH_PORT=9200
- NEO4J_HOST=neo4j:7474
- NEO4J_URI=bolt://neo4j
- NEO4J_USERNAME=neo4j
- NEO4J_PASSWORD=datahub
depends_on:
- kafka-setup
- elasticsearch-setup
- neo4j
command: "sh -c 'while ping -c1 kafka-setup &>/dev/null; do echo waiting for kafka-setup... && sleep 1; done; \
echo kafka-setup done! && /start.sh'"
datahub-mce-consumer:
image: linkedin/datahub-mce-consumer:${DATAHUB_VERSION:-latest}
hostname: datahub-mce-consumer
container_name: datahub-mce-consumer
ports:
- "9090:9090"
environment:
- KAFKA_BOOTSTRAP_SERVER=broker:29092
- KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
- GMS_HOST=datahub-gms
- GMS_PORT=8080
depends_on:
- kafka-setup
- datahub-gms
command: "sh -c 'while ping -c1 kafka-setup &>/dev/null; do echo waiting for kafka-setup... && sleep 1; done; \
echo kafka-setup done! && /start.sh'"
networks:
default:
name: datahub_network
volumes:
mysqldata:
esdata:
neo4jdata:
zkdata:

View File

@ -1,4 +0,0 @@
#!/bin/bash
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd $DIR && docker-compose pull && docker-compose -p datahub up --build

View File

@ -1,268 +0,0 @@
---
version: '3.5'
services:
mysql:
container_name: mysql
hostname: mysql
image: mysql:5.7
restart: always
command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
environment:
MYSQL_DATABASE: 'datahub'
MYSQL_USER: 'datahub'
MYSQL_PASSWORD: 'datahub'
MYSQL_ROOT_PASSWORD: 'datahub'
ports:
- "3306:3306"
volumes:
- ../mysql/init.sql:/docker-entrypoint-initdb.d/init.sql
- mysqldata:/var/lib/mysql
zookeeper:
image: confluentinc/cp-zookeeper:5.4.0
hostname: zookeeper
container_name: zookeeper
ports:
- "2181:2181"
environment:
ZOOKEEPER_CLIENT_PORT: 2181
ZOOKEEPER_TICK_TIME: 2000
volumes:
- zkdata:/var/opt/zookeeper
broker:
image: confluentinc/cp-kafka:5.4.0
hostname: broker
container_name: broker
depends_on:
- zookeeper
ports:
- "29092:29092"
- "9092:9092"
environment:
KAFKA_BROKER_ID: 1
KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
kafka-rest-proxy:
image: confluentinc/cp-kafka-rest:5.4.0
hostname: kafka-rest-proxy
container_name: kafka-rest-proxy
ports:
- "8082:8082"
environment:
KAFKA_REST_LISTENERS: http://0.0.0.0:8082/
KAFKA_REST_SCHEMA_REGISTRY_URL: http://schema-registry:8081/
KAFKA_REST_HOST_NAME: kafka-rest-proxy
KAFKA_REST_BOOTSTRAP_SERVERS: PLAINTEXT://broker:29092
depends_on:
- zookeeper
- broker
- schema-registry
kafka-topics-ui:
image: landoop/kafka-topics-ui:0.9.4
hostname: kafka-topics-ui
container_name: kafka-topics-ui
ports:
- "18000:8000"
environment:
KAFKA_REST_PROXY_URL: "http://kafka-rest-proxy:8082/"
PROXY: "true"
depends_on:
- zookeeper
- broker
- schema-registry
- kafka-rest-proxy
# This "container" is a workaround to pre-create topics
kafka-setup:
build:
context: ../kafka
hostname: kafka-setup
container_name: kafka-setup
depends_on:
- broker
- schema-registry
environment:
- KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
- KAFKA_BOOTSTRAP_SERVER=broker:29092
schema-registry:
image: confluentinc/cp-schema-registry:5.4.0
hostname: schema-registry
container_name: schema-registry
depends_on:
- zookeeper
- broker
ports:
- "8081:8081"
environment:
SCHEMA_REGISTRY_HOST_NAME: schema-registry
SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: 'zookeeper:2181'
schema-registry-ui:
image: landoop/schema-registry-ui:latest
container_name: schema-registry-ui
hostname: schema-registry-ui
ports:
- "8000:8000"
environment:
SCHEMAREGISTRY_URL: 'http://schema-registry:8081'
ALLOW_GLOBAL: 'true'
ALLOW_TRANSITIVE: 'true'
ALLOW_DELETION: 'true'
READONLY_MODE: 'true'
PROXY: 'true'
depends_on:
- schema-registry
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:5.6.8
container_name: elasticsearch
hostname: elasticsearch
ports:
- "9200:9200"
environment:
- discovery.type=single-node
- xpack.security.enabled=false
- "ES_JAVA_OPTS=-Xms1g -Xmx1g"
volumes:
- esdata:/usr/share/elasticsearch/data
kibana:
image: docker.elastic.co/kibana/kibana:5.6.8
container_name: kibana
hostname: kibana
ports:
- "5601:5601"
environment:
- SERVER_HOST=0.0.0.0
- ELASTICSEARCH_URL=http://elasticsearch:9200
depends_on:
- elasticsearch
neo4j:
image: neo4j:3.5.7
hostname: neo4j
container_name: neo4j
environment:
NEO4J_AUTH: 'neo4j/datahub'
ports:
- "7474:7474"
- "7687:7687"
volumes:
- neo4jdata:/data
# This "container" is a workaround to pre-create search indices
elasticsearch-setup:
build:
context: ../elasticsearch
hostname: elasticsearch-setup
container_name: elasticsearch-setup
depends_on:
- elasticsearch
environment:
- ELASTICSEARCH_HOST=elasticsearch
- ELASTICSEARCH_PORT=9200
datahub-gms:
build:
context: ../../
dockerfile: docker/gms/Dockerfile
hostname: datahub-gms
container_name: datahub-gms
ports:
- "8080:8080"
environment:
- EBEAN_DATASOURCE_USERNAME=datahub
- EBEAN_DATASOURCE_PASSWORD=datahub
- EBEAN_DATASOURCE_HOST=mysql:3306
- EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8
- EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver
- KAFKA_BOOTSTRAP_SERVER=broker:29092
- KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
- ELASTICSEARCH_HOST=elasticsearch
- ELASTICSEARCH_PORT=9200
- NEO4J_HOST=neo4j:7474
- NEO4J_URI=bolt://neo4j
- NEO4J_USERNAME=neo4j
- NEO4J_PASSWORD=datahub
depends_on:
- elasticsearch-setup
- kafka-setup
- mysql
- neo4j
datahub-frontend:
build:
context: ../../
dockerfile: docker/frontend/Dockerfile
hostname: datahub-frontend
container_name: datahub-frontend
ports:
- "9001:9001"
environment:
- DATAHUB_GMS_HOST=datahub-gms
- DATAHUB_GMS_PORT=8080
- DATAHUB_SECRET=YouKnowNothing
- DATAHUB_APP_VERSION=1.0
- DATAHUB_PLAY_MEM_BUFFER_SIZE=10MB
depends_on:
- datahub-gms
datahub-mae-consumer:
build:
context: ../../
dockerfile: docker/mae-consumer/Dockerfile
hostname: datahub-mae-consumer
container_name: datahub-mae-consumer
ports:
- "9091:9091"
environment:
- KAFKA_BOOTSTRAP_SERVER=broker:29092
- KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
- ELASTICSEARCH_HOST=elasticsearch
- ELASTICSEARCH_PORT=9200
- NEO4J_HOST=neo4j:7474
- NEO4J_URI=bolt://neo4j
- NEO4J_USERNAME=neo4j
- NEO4J_PASSWORD=datahub
depends_on:
- kafka-setup
- elasticsearch-setup
- neo4j
command: "sh -c 'while ping -c1 kafka-setup &>/dev/null; do echo waiting for kafka-setup... && sleep 1; done; \
echo kafka-setup done! && /start.sh'"
datahub-mce-consumer:
build:
context: ../../
dockerfile: docker/mce-consumer/Dockerfile
hostname: datahub-mce-consumer
container_name: datahub-mce-consumer
ports:
- "9090:9090"
environment:
- KAFKA_BOOTSTRAP_SERVER=broker:29092
- KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
- GMS_HOST=datahub-gms
- GMS_PORT=8080
depends_on:
- kafka-setup
- datahub-gms
command: "sh -c 'while ping -c1 kafka-setup &>/dev/null; do echo waiting for kafka-setup... && sleep 1; done; \
echo kafka-setup done! && /start.sh'"
networks:
default:
name: datahub_network
volumes:
mysqldata:
esdata:
neo4jdata:
zkdata:

View File

@ -1,4 +0,0 @@
#!/bin/bash
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd $DIR && docker-compose pull && docker-compose -p datahub up

View File

@ -0,0 +1,6 @@
SCHEMAREGISTRY_URL=http://schema-registry:8081
ALLOW_GLOBAL=true
ALLOW_TRANSITIVE=true
ALLOW_DELETION=true
READONLY_MODE=true
PROXY=true

2
docker/schema-registry/env/docker.env vendored Normal file
View File

@ -0,0 +1,2 @@
SCHEMA_REGISTRY_HOST_NAME=schemaregistry
SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=zookeeper:2181

2
docker/zookeeper/env/docker.env vendored Normal file
View File

@ -0,0 +1,2 @@
ZOOKEEPER_CLIENT_PORT=2181
ZOOKEEPER_TICK_TIME=2000

1
docs/docker/README.md Normal file
View File

@ -0,0 +1 @@
See [docker/README.md](../../docker/README.md).

View File

@ -0,0 +1,70 @@
# Using Docker Images During Development
We've created a special `docker-compose.dev.yml` override file that configures the docker images to be easier to use
during development.
Normally, you'd rebuild your images from scratch with `docker-compose build` (or `docker-compose up --build`). However,
this takes way too long for development. It has to copy the entire repo to each image and rebuild it there.
The `docker-compose.dev.yml` file bypasses this problem by mounting binaries, startup scripts, and other data into
special, slimmed-down images (whose Dockerfiles are usually defined at `<service>/debug/Dockerfile`). Mounts work
both ways, so these images should also mount the containers' log directories, making logs easy to read on your
local machine without needing to inspect the running container (especially if the app crashes and the container stops!).
We highly recommend simply invoking the `docker/dev.sh` script we've included. The script is small enough to read if you
want to see exactly what it does; ultimately it launches docker-compose with our `docker-compose.dev.yml` override.
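Under the hood this amounts to layering the dev override on top of the base compose file, roughly as sketched below; read the script itself for the exact flags it passes:
```bash
# A rough sketch of what docker/dev.sh ends up doing (see the script for the real behavior)
cd "$(dirname "$0")" && docker-compose -p datahub \
  -f docker-compose.yml \
  -f docker-compose.overrides.yml \
  -f docker-compose.dev.yml \
  up
```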
## Debugging
The dev images, while set up to use your local code, do not enable debugging by default. To enable it,
you need to make two small edits (don't check these changes in!).
- Add the JVM debug flags to the environment file for the service.
- Assign the port in the docker-compose file.
For example, to debug `datahub-gms`:
```
# Add this line to docker/datahub-gms/env/dev.env. You can change the port and/or change suspend=n to y.
JAVA_TOOL_OPTIONS=-agentlib:jdwp=transport=dt_socket,address=5005,server=y,suspend=n
```
```
# Change the definition in docker/docker-compose.dev.yml to this
datahub-gms:
image: linkedin/datahub-gms:debug
build:
context: datahub-gms/debug
dockerfile: Dockerfile
ports: # <--- Add this line
- "5005:5005" # <--- And this line. Must match port from environment file.
volumes:
- ./datahub-gms/start.sh:/datahub/datahub-gms/scripts/start.sh
- ../gms/war/build/libs/:/datahub/datahub-gms/bin
```
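With those two edits in place and the container restarted, any JDWP-capable debugger can attach to the forwarded port. For example, with the JDK's command-line debugger (assuming port 5005 as above; most IDEs offer an equivalent "remote JVM debug" configuration):
```bash
# Attach jdb to the JVM listening inside the datahub-gms container
jdb -attach localhost:5005
```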
## Tips for People New To Docker
### Conflicting containers
If you ran `docker/quickstart.sh` before, your machine may already have containers for DataHub. If you want to run
`docker/dev.sh` instead, ensure that the old containers are removed by running `docker container prune`. The opposite
also applies.
> Note that this only removes containers, not images. It should still be fast to switch between the two once you've
> launched both at least once.
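If you're not sure what is left over from a previous run, something like the following can help; note that `docker container prune` only removes stopped containers and asks for confirmation:
```bash
# List all containers (running and stopped), then remove the stopped ones
docker ps -a
docker container prune
```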
### Running a specific service
`docker-compose up` will launch all services in the configuration, including dependencies, unless they're already
running. If you, for some reason, wish to change this behavior, check out these example commands.
```
docker-compose -p datahub -f docker-compose.yml -f docker-compose.overrides.yml -f docker-compose.dev.yml up datahub-gms
```
This will only start `datahub-gms` and its dependencies.
```
docker-compose -p datahub -f docker-compose.yml -f docker-compose.overrides.yml -f docker-compose.dev.yml up --no-deps datahub-gms
```
This will only start `datahub-gms`, without its dependencies.