# DataHub Quickstart Guide

## Deploying DataHub

To deploy a new instance of DataHub, perform the following steps.
1. Install [docker](https://docs.docker.com/install/), [jq](https://stedolan.github.io/jq/download/) and [docker-compose](https://docs.docker.com/compose/install/) (if using Linux).
Make sure to allocate enough hardware resources to the Docker Engine. Tested and confirmed configuration: 2 CPUs,
8GB RAM, 2GB swap, and 10GB disk space.

2. Launch the Docker Engine from the command line or the desktop app.

3. Install the DataHub CLI:

   a. Ensure you have Python 3.6+ installed and configured (check with `python3 --version`).

   b. Run the following commands in your terminal:
```
python3 -m pip install --upgrade pip wheel setuptools
python3 -m pip uninstall datahub acryl-datahub || true # sanity check - ok if it fails
python3 -m pip install --upgrade acryl-datahub
datahub version
```
If you see "command not found", try running CLI commands with the `python3 -m` prefix
instead, for example `python3 -m datahub version`.

4. To deploy DataHub, run the following CLI command from your terminal:
```
datahub docker quickstart
```
Upon completion of this step, you should be able to navigate to the DataHub UI
at [http://localhost:9002](http://localhost:9002) in your browser. You can sign in using `datahub` as both the
username and password.

5. To ingest the sample metadata, run the following CLI command from your terminal:
```
datahub docker ingest-sample-data
```
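Steps 1 to 4 can be partly scripted. The sketch below assumes bash and `curl`; the helper names (`missing_tools`, `python_ok`, `wait_for_docker`, `wait_for_ui`) are hypothetical and not part of DataHub, and the timeouts are arbitrary defaults. For example, `missing_tools docker jq docker-compose` prints any prerequisite that is not on your PATH (docker-compose is listed for Linux users in step 1), and `wait_for_docker && datahub docker quickstart && wait_for_ui` starts DataHub once the Docker daemon is reachable and then waits until the UI at http://localhost:9002 responds.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers for the quickstart steps above.
# None of these functions are part of the DataHub CLI.

# Print every named command that cannot be found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Succeed only if python3 reports version 3.6 or newer (step 3a).
python_ok() {
  python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 6) else 1)' >/dev/null 2>&1
}

# Poll `docker info` once per second until the daemon answers (step 2).
# Returns non-zero if it is still unreachable after LIMIT seconds (arbitrary default: 60).
wait_for_docker() {
  local limit="${1:-60}" waited=0
  until docker info >/dev/null 2>&1; do
    if [ "$waited" -ge "$limit" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Poll the UI every 5 seconds until curl gets a non-error HTTP response (step 4).
# Returns non-zero after LIMIT seconds (arbitrary default: 300).
wait_for_ui() {
  local url="${1:-http://localhost:9002}" limit="${2:-300}" waited=0
  until curl -fs -o /dev/null "$url" 2>/dev/null; do
    if [ "$waited" -ge "$limit" ]; then
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}
```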
That's it! To start pushing your company's metadata into DataHub, take a look at
the [Metadata Ingestion Framework](../metadata-ingestion/README.md).
## Resetting DataHub
To cleanse DataHub of all of its state (e.g. before ingesting your own metadata), use the CLI `nuke` command:
```
datahub docker nuke
```
If you want to delete the containers but keep the data, add the `--keep-data` flag to the command. You can then run `datahub docker quickstart` to get DataHub running again with the data you ingested earlier.
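The `--keep-data` workflow above amounts to two commands. As a minimal sketch, assuming the `datahub` CLI is on your PATH, the hypothetical helper `restart_keeping_data` removes the containers while keeping the ingested data, then starts DataHub again, and stops early if the first command fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the DataHub CLI): recreate the quickstart
# containers while keeping previously ingested metadata.
restart_keeping_data() {
  # Delete the containers but keep the data, using the --keep-data flag.
  # If this step fails, return its exit status and skip the restart.
  datahub docker nuke --keep-data || return
  # Bring DataHub back up with the data that was ingested earlier.
  datahub docker quickstart
}
```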
## Troubleshooting

### Command not found: datahub
If running the `datahub` CLI produces "command not found" errors in your terminal, your system may be defaulting to an
older version of Python. Try prefixing your `datahub` commands with `python3 -m`:
```
python3 -m datahub docker quickstart
```
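If some of your machines have the `datahub` entry point on PATH and others do not, a small wrapper can choose between the two forms. The function name `dh` is a hypothetical convenience, not part of DataHub: it runs `datahub` when that command is found and otherwise falls back to `python3 -m datahub`, so `dh docker quickstart` works in both cases.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of DataHub): run the `datahub` entry point when it
# is on PATH, otherwise fall back to the `python3 -m datahub` form.
dh() {
  if command -v datahub >/dev/null 2>&1; then
    datahub "$@"
  else
    python3 -m datahub "$@"
  fi
}

# Example: dh docker quickstart
```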
Another possibility is that your system PATH does not include pip's `$HOME/.local/bin` directory. On Linux, you can add the following to your `~/.bashrc` and then open a new terminal (or run `source ~/.bashrc`):
```
if [ -d "$HOME/.local/bin" ] ; then
  PATH="$HOME/.local/bin:$PATH"
fi
```
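To check whether this is the cause, you can compare pip's user script directory with your PATH. This sketch assumes a Linux-style layout where user-installed scripts live in the `bin` directory under Python's user base (as reported by `site.getuserbase()`, typically `$HOME/.local`); `path_contains` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given directory is one of the entries in PATH.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumption: on Linux, `pip install --user` puts console scripts such as
# `datahub` into <user base>/bin, where the user base is usually $HOME/.local.
user_base=$(python3 -c 'import site; print(site.getuserbase())' 2>/dev/null || true)
if [ -z "$user_base" ]; then
  echo "Could not determine the Python user base; is python3 installed?"
elif path_contains "$user_base/bin"; then
  echo "$user_base/bin is already on PATH"
else
  echo "$user_base/bin is not on PATH"
fi
```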
### Miscellaneous Docker issues
Docker can run into miscellaneous issues, such as conflicting containers and dangling volumes, that can often be resolved by
pruning your Docker state with the following command. By default, it removes all stopped containers, networks not used by
any container, dangling images, and dangling build cache; add `-a` to also remove all unused images, and `--volumes` to
also prune volumes.
```
docker system prune
```
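Before pruning, it can help to see what Docker is holding on to. The commands below are standard Docker CLI commands; wrapping them in a function named `show_docker_leftovers` (a hypothetical name) is only for convenience. Volumes listed with the `dangling=true` filter are not referenced by any container, and `docker system prune` leaves volumes alone unless you pass `--volumes`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is illustrative): list Docker leftovers that often
# cause conflicts, before deciding what to prune.
show_docker_leftovers() {
  echo "== Disk usage by images, containers, volumes and build cache =="
  docker system df
  echo "== Stopped containers =="
  docker ps -a --filter status=exited
  echo "== Dangling volumes (not referenced by any container) =="
  docker volume ls --filter dangling=true
}
```

Run `show_docker_leftovers`, review the output, and then run `docker system prune`. Add `--volumes` only if you are sure those volumes can go, since removing volumes can delete data that DataHub has stored in them.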