Mars Lan 4f221f9a12
build(docker): refactor docker build scripts (#1687)
* build(docker): refactor docker build scripts

- add "build" option to docker-compose files to simplify rebuilding of images
- create "start.sh" script so it's easier to override "command" in the quickstart's docker-compose file
- use dockerize to wait for requisite services to start up
- add a dedicated Dockerfile for kafka-setup

This fixes https://github.com/linkedin/datahub/issues/1549 & https://github.com/linkedin/datahub/issues/1550
2020-06-08 13:37:14 -07:00
DataHub Generalized Metadata Store (GMS) Docker Image

Refer to DataHub GMS Service for a quick overview of the architecture and responsibilities of this service within DataHub.

Build & Run

cd docker/gms && docker-compose up --build

This command rebuilds the local Docker image and starts a container based on that image.

To start a container from an existing image instead, run the same command without the --build flag.

Container configuration

External Port

To change the default settings of your container, such as the exposed port, edit the docker-compose.yml file. Refer to this link to understand how to change the exposed port settings.

ports:
  - "8080:8080"
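
Each entry under ports uses the Compose short form HOST:CONTAINER, so changing the left-hand number changes the port exposed on the host while the service inside the container keeps its own port. The sketch below only illustrates how such a mapping is read; describe_port_mapping is a hypothetical helper, not part of DataHub, and it assumes the two-part form without an IP prefix.

```shell
# Hypothetical helper: explain a Compose "HOST:CONTAINER" port mapping.
# Assumes the two-part short form (no "IP:HOST:CONTAINER" prefix).
describe_port_mapping() {
  mapping=$1
  host_port=${mapping%%:*}       # text before the first colon
  container_port=${mapping##*:}  # text after the last colon
  printf 'host port %s -> container port %s\n' "$host_port" "$container_port"
}

# Example: expose GMS on host port 9090 while the container still uses 8080.
describe_port_mapping "9090:8080"
```

With a mapping of "9090:8080" in docker-compose.yml, clients on the host would connect to port 9090.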

Docker Network

All Docker containers for DataHub are expected to be on the same Docker network, datahub_network. If you change the network name here, you must change it for all other DataHub containers as well.

networks:
  default:
    name: datahub_network
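
One way to confirm that the containers share a network is to ask Docker which containers are attached to it. The sketch below wraps docker network inspect in a small function; list_network_containers is a hypothetical name, and the default network name is taken from the snippet above.

```shell
# List the names of the containers attached to a Docker network.
# The network name defaults to datahub_network; pass another name to override it.
list_network_containers() {
  network=${1:-datahub_network}
  docker network inspect \
    --format '{{range .Containers}}{{println .Name}}{{end}}' \
    "$network"
}

# Usage (requires a running Docker daemon and an existing network):
#   list_network_containers
#   list_network_containers my_custom_network
```

If a DataHub container is missing from the output, it is probably attached to a different network.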

MySQL, Elasticsearch, Neo4j and Kafka Containers

Before you start the datahub-gms container, the mysql, elasticsearch, neo4j and kafka containers should already be up and running. The connections to these services are configured via environment variables in docker-compose.yml:

environment:
  - EBEAN_DATASOURCE_USERNAME=datahub
  - EBEAN_DATASOURCE_PASSWORD=datahub
  - EBEAN_DATASOURCE_HOST=mysql:3306
  - EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub
  - EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver

The value of the EBEAN_DATASOURCE_HOST variable should be set to the host name and port of the mysql container within the Docker network.
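
In the values above, the host and port appear both in EBEAN_DATASOURCE_HOST and inside EBEAN_DATASOURCE_URL, so the two presumably need to stay consistent when the MySQL host changes. A minimal sketch of deriving one from the other follows; mysql_jdbc_url is a hypothetical helper, and the database name datahub is taken from the URL shown above.

```shell
# Hypothetical helper: build a MySQL JDBC URL from a "host:port" pair
# and a database name, so the two settings cannot drift apart.
mysql_jdbc_url() {
  host_port=$1
  database=$2
  printf 'jdbc:mysql://%s/%s\n' "$host_port" "$database"
}

EBEAN_DATASOURCE_HOST="mysql:3306"
EBEAN_DATASOURCE_URL=$(mysql_jdbc_url "$EBEAN_DATASOURCE_HOST" datahub)
echo "$EBEAN_DATASOURCE_URL"   # prints jdbc:mysql://mysql:3306/datahub
```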

environment:
  - KAFKA_BOOTSTRAP_SERVER=broker:29092
  - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081

The value of the KAFKA_BOOTSTRAP_SERVER variable should be set to the host name and port of the kafka broker container within the Docker network. The value of the KAFKA_SCHEMAREGISTRY_URL variable should be set to the URL of the kafka schema registry container within the Docker network.

environment:
  - ELASTICSEARCH_HOST=elasticsearch
  - ELASTICSEARCH_PORT=9200

The value of the ELASTICSEARCH_HOST variable should be set to the host name of the elasticsearch container within the Docker network, and ELASTICSEARCH_PORT to the port it listens on.
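
To check that these two values point at a reachable Elasticsearch node, one option is to query its cluster health endpoint. The function below is a sketch under a few assumptions: check_elasticsearch is a hypothetical name, curl is available, and the command is run from a container attached to the same Docker network so that the host name resolves.

```shell
# Query the Elasticsearch cluster health endpoint.
# Defaults mirror the ELASTICSEARCH_HOST and ELASTICSEARCH_PORT values above.
check_elasticsearch() {
  host=${1:-elasticsearch}
  port=${2:-9200}
  # -f: fail on HTTP errors, -s: silent, -S: still show errors
  curl -fsS "http://${host}:${port}/_cluster/health"
}

# Usage (from a container on the same network):
#   check_elasticsearch
#   check_elasticsearch my-es-host 9201
```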

environment:
  - NEO4J_HOST=neo4j:7474
  - NEO4J_URI=bolt://neo4j
  - NEO4J_USERNAME=neo4j
  - NEO4J_PASSWORD=datahub

The value of the NEO4J_URI variable should be set to the bolt URI of the neo4j container within the Docker network, and NEO4J_HOST to its host name and port.
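
Because datahub-gms expects all of these services to be running first, it can help to wait until their ports accept connections before starting it. The build scripts are described as using dockerize for this; the sketch below is a plain-shell stand-in, not that mechanism. wait_for_port is a hypothetical helper and assumes a netcat (nc) that supports the -z flag.

```shell
# Wait until a TCP port accepts connections, retrying once per second.
# Arguments: host, port, maximum number of attempts (default 30).
wait_for_port() {
  host=$1
  port=$2
  attempts=${3:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # nc -z only probes the port; it sends no data.
    if nc -z "$host" "$port" >/dev/null 2>&1; then
      echo "$host:$port is reachable"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "timed out waiting for $host:$port" >&2
  return 1
}

# Usage, with the host names and ports from the environment blocks above:
#   wait_for_port mysql 3306 &&
#   wait_for_port broker 29092 &&
#   wait_for_port elasticsearch 9200 &&
#   wait_for_port neo4j 7474
```

The function returns a non-zero status on timeout, so it can gate a later start command with &&.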