
* Make docker files easier to use during development.
During development it is quite nice to have docker work with locally built code. This allows you to launch all services very quickly, with your changes, and optionally with debugging support.
Changes made to docker files:
- Remove all redundant docker-compose files. We now have one large file, plus smaller files to use as overrides.
- Remove redundant README files that provided little information.
- Rename docker/<dir> to match the service name in the docker-compose file for clarity.
- Move environment variables to .env files. We only provide dev / the default environment for quickstart.
- Add debug options to docker files, using multistage builds to produce minimal images, with the idea that built files will be mounted instead.
- Add a docker/dev.sh script + compose file to easily use the dev override images (separate tag; images never published; uses debug docker files; mounts binaries to image).
- Add docs/docker documentation for this.
Docker Images
Prerequisites
You need to install docker and docker-compose (if using Linux; on Windows and Mac compose is included with Docker Desktop).
Make sure to allocate enough hardware resources for Docker engine. Tested & confirmed config: 2 CPUs, 8GB RAM, 2GB Swap area.
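As a quick sanity check of the prerequisites above, the following sketch reports whether `docker` and `docker-compose` are on your `PATH` and, if the Docker daemon is reachable, prints the CPUs and memory it sees. The helper names `check_tool` and `show_docker_resources` are illustrative and not part of DataHub; the swap allocation is not reported here.

```shell
#!/bin/sh
# Report whether a command is available on PATH.
# (check_tool is a hypothetical helper, not part of the DataHub repo.)
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Print the CPU count and total memory visible to the Docker engine.
# Falls back to a message if the daemon cannot be reached.
show_docker_resources() {
  docker info --format '{{.NCPU}} CPUs, {{.MemTotal}} bytes of memory' 2>/dev/null \
    || echo "docker daemon not reachable"
}

check_tool docker
check_tool docker-compose

# Only query the engine when the docker CLI is actually installed.
if command -v docker >/dev/null 2>&1; then
  show_docker_resources
fi
```

Compare the reported values against the tested configuration (2 CPUs, 8GB RAM) and adjust the Docker engine settings if they are lower.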
Quickstart
The easiest way to bring up and test DataHub is to use the DataHub Docker images, which are continuously deployed to Docker Hub with every commit to the repository.
You can easily download and run all these images and their dependencies with our quick start guide.
DataHub Docker Images:
- linkedin/datahub-gms
- linkedin/datahub-frontend
- linkedin/datahub-mae-consumer
- linkedin/datahub-mce-consumer
Dependencies:
Ingesting demo data.
If you want to test ingesting some data once DataHub is up, see Ingestion.
Using Docker Images During Development
See Using Docker Images During Development.
Building And Deploying Docker Images
We use GitHub Actions to build and continuously deploy our images. There should be no need to do this manually; a successful release on GitHub will automatically publish the images.
Building images
To build the full images (that we are going to publish), you need to run the following:
COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub build
The two environment variables are needed because we rely on BuildKit for multistage builds. It does not hurt to also set DATAHUB_VERSION to something unique.
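One way to pick a unique DATAHUB_VERSION is sketched below. It assumes a timestamp-based value is unique enough for local builds and that the compose setup reads DATAHUB_VERSION from the environment; the `dev-` prefix and the function name `build_datahub_images` are illustrative choices, not project conventions.

```shell
#!/bin/sh
# Assumption: a timestamp-based tag is unique enough for local builds.
DATAHUB_VERSION="dev-$(date +%Y%m%d%H%M%S)"
export DATAHUB_VERSION

# Wraps the build command from this guide (hypothetical helper name).
# The exported DATAHUB_VERSION is inherited by docker-compose.
build_datahub_images() {
  COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub build
}

echo "DATAHUB_VERSION=${DATAHUB_VERSION}"

# Uncomment to start the build once the version looks right:
# build_datahub_images
```

Exporting the variable once, rather than prefixing it to a single command, keeps the same version visible to any follow-up docker-compose commands run in the same shell.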
This is not our recommended development flow; most developers should follow the Using Docker Images During Development guide.