# DataHub Quickstart Guide

## Deploying DataHub

To deploy a new instance of DataHub, perform the following steps.

1. Install [docker](https://docs.docker.com/install/) and, if you are using Linux,
   [docker-compose](https://docs.docker.com/compose/install/). Make sure to allocate enough hardware resources to the
   Docker engine. Tested and confirmed configuration: 2 CPUs, 8GB RAM, 2GB swap area, and 10GB disk space.

2. Launch the Docker Engine from the command line or the desktop app.

3. Install the DataHub CLI.

   a. Ensure you have Python 3.6+ installed and configured. (Check using `python3 --version`.)

   b. Run the following commands in your terminal:

   ```
   python3 -m pip install --upgrade pip wheel setuptools
   python3 -m pip uninstall datahub acryl-datahub || true  # sanity check - ok if it fails
   python3 -m pip install --upgrade acryl-datahub
   datahub version
   ```

   If you see "command not found", try running CLI commands with the prefix `python3 -m`
   instead: `python3 -m datahub version`

4. To deploy DataHub, run the following CLI command from your terminal:

   ```
   datahub docker quickstart
   ```

   Upon completion of this step, you should be able to navigate to the DataHub UI
   at [http://localhost:9002](http://localhost:9002) in your browser. You can sign in using `datahub` as both the
   username and password.

5. To ingest the sample metadata, run the following CLI command from your terminal:

   ```
   datahub docker ingest-sample-data
   ```

That's it! To start pushing your company's metadata into DataHub, take a look at
the [Metadata Ingestion Framework](../metadata-ingestion/README.md).
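
If the deployment misbehaves, it can help to compare what Docker reports against the tested configuration from step 1. The following is a minimal sketch, not part of the DataHub CLI: `check_resources`, `MIN_CPUS`, and `MIN_MEM_BYTES` are hypothetical names, and reading "8GB" as decimal gigabytes is an assumption. It relies on the `NCPU` and `MemTotal` fields of `docker info`.

```shell
# Hypothetical preflight helper; names and thresholds are illustrative.
MIN_CPUS=2
MIN_MEM_BYTES=$((8 * 1000 * 1000 * 1000))  # "8GB" read as decimal gigabytes (assumption)

# Usage: check_resources CPUS MEM_BYTES
# Prints "ok" and returns 0 when both values meet the minimums above.
check_resources() {
  if [ "$1" -lt "$MIN_CPUS" ]; then
    echo "warning: $1 CPUs available to Docker, tested configuration uses $MIN_CPUS"
    return 1
  fi
  if [ "$2" -lt "$MIN_MEM_BYTES" ]; then
    echo "warning: $2 bytes of memory available to Docker, tested configuration uses $MIN_MEM_BYTES"
    return 1
  fi
  echo "ok"
}

# Only query Docker when the engine is installed and running.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  check_resources "$(docker info --format '{{.NCPU}}')" \
                  "$(docker info --format '{{.MemTotal}}')" || true
else
  echo "Docker engine not reachable; install and launch it first (steps 1 and 2)."
fi
```

The sketch does not check swap or disk space; those are easier to inspect in the Docker Desktop settings or with your operating system's own tools.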

## Resetting DataHub

To cleanse DataHub of all of its state (e.g. before ingesting your own metadata), you can use the CLI `nuke` command.

```
datahub docker nuke
```
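
If you reset often, the reset and the redeploy can be chained. This is a sketch under the assumption that a fresh `datahub docker quickstart` is needed after `nuke`; `reset_and_redeploy` is a hypothetical helper name, and it uses only the two CLI commands shown in this guide.

```shell
# Hypothetical helper: wipe DataHub's state, then bring up a clean instance.
# The second command runs only if the first one succeeds.
reset_and_redeploy() {
  datahub docker nuke && datahub docker quickstart
}
```

Run `reset_and_redeploy` from the same terminal session in which the function was defined.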

## Troubleshooting

### Command not found: datahub

If running the `datahub` CLI produces "command not found" errors in your terminal, your system may be defaulting to an
older version of Python. Try prefixing your `datahub` commands with `python3 -m`:

```
python3 -m datahub docker quickstart
```
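
To avoid retyping the prefix, one option is a small shell function that falls back to `python3 -m datahub` only when the `datahub` executable is not found. This is a sketch, and `dh` is a hypothetical name rather than anything shipped with DataHub.

```shell
# Hypothetical wrapper: use the `datahub` command when it is on PATH,
# otherwise invoke the same CLI as a Python module.
dh() {
  if command -v datahub >/dev/null 2>&1; then
    datahub "$@"
  else
    python3 -m datahub "$@"
  fi
}
```

With the function defined in your shell, `dh docker quickstart` behaves like `datahub docker quickstart` in either case.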

### Miscellaneous Docker issues

Docker can run into miscellaneous issues, such as conflicting containers and dangling volumes, that can often be resolved
by pruning your Docker state with the following command. Note that by default this command removes all stopped
containers, unused networks, dangling images (all unused images if you add `-a`), and unused build cache. Unused volumes
are removed only if you add the `--volumes` flag.

```
docker system prune
```
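
Because pruning is destructive, it can be worth looking at what Docker is holding before you remove anything. The sketch below wraps two existing Docker commands, `docker system df` and `docker system prune`; the function name `prune_docker` and its `--with-volumes` argument are hypothetical and belong to this helper, not to Docker.

```shell
# Hypothetical helper: show Docker disk usage, then prune.
# Pass --with-volumes (an argument of this helper, not of Docker) to also
# remove unused volumes through Docker's own --volumes flag.
prune_docker() {
  docker system df
  if [ "${1:-}" = "--with-volumes" ]; then
    docker system prune --volumes
  else
    docker system prune
  fi
}
```

`docker system prune` asks for confirmation before deleting, so you can still cancel after reviewing the usage summary.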