# DataHub Quickstart Guide
## Deploying DataHub
To deploy a new instance of DataHub, perform the following steps.
1. Install [docker](https://docs.docker.com/install/), [jq](https://stedolan.github.io/jq/download/) and [docker-compose v1](https://github.com/docker/compose/blob/master/INSTALL.md) (if
using Linux). Make sure to allocate enough hardware resources for the Docker engine. Tested and confirmed config: 2 CPUs,
8GB RAM, 2GB swap area, and 10GB disk space.
2. Launch the Docker Engine from command line or the desktop app.
3. Install the DataHub CLI
a. Ensure you have Python 3.6+ installed and configured. (Check using `python3 --version`.)
b. Run the following commands in your terminal
```
python3 -m pip install --upgrade pip wheel setuptools
python3 -m pip uninstall datahub acryl-datahub || true # sanity check - ok if it fails
python3 -m pip install --upgrade acryl-datahub
datahub version
```
:::note
If you see "command not found", try running CLI commands with the prefix `python3 -m` instead, like `python3 -m datahub version`.
Note that the DataHub CLI does not support Python 2.x.
:::
4. To deploy a DataHub instance locally, run the following CLI command from your terminal
```
datahub docker quickstart
```
This will deploy a DataHub instance using [docker-compose](https://docs.docker.com/compose/).
Upon completion of this step, you should be able to navigate to the DataHub UI
at [http://localhost:9002](http://localhost:9002) in your browser. You can sign in using `datahub` as both the
username and password.

If you would like to modify or configure the DataHub installation, download the [docker-compose.yaml](https://raw.githubusercontent.com/datahub-project/datahub/master/docker/quickstart/docker-compose-without-neo4j-m1.quickstart.yml) used by the CLI tool, modify it as necessary, and deploy DataHub by passing the downloaded docker-compose file:
```
datahub docker quickstart --quickstart-compose-file <path to compose file>
```
5. To ingest the sample metadata, run the following CLI command from your terminal
```
datahub docker ingest-sample-data
```
:::note
If you've enabled [Metadata Service Authentication](authentication/introducing-metadata-service-authentication.md), you'll need to provide a Personal Access Token
using the `--token <token>` parameter in the command.
:::
That's it! Now feel free to play around with DataHub!
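
If you want to confirm from a script that the UI from step 4 is reachable, a small polling loop is one option. The following is a minimal sketch and not part of the DataHub CLI: the helper names `retry_until` and `datahub_ui_up` are hypothetical, and it assumes `curl` is installed.

```shell
# Hypothetical helper: run a command until it succeeds or the attempts run out.
# Usage: retry_until <attempts> <delay_seconds> <command> [args...]
retry_until() {
  local attempts delay i
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Hypothetical probe: succeeds once the UI address from step 4 answers.
# Assumes curl is installed; -f makes HTTP errors (status >= 400) exit non-zero.
datahub_ui_up() {
  curl -sf -o /dev/null "http://localhost:9002"
}
```

For example, `retry_until 30 5 datahub_ui_up && echo "DataHub UI is up"` probes up to 30 times with a 5 second pause between attempts. Both numbers are arbitrary choices for illustration.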
## Next Steps
### Ingest Metadata
To start pushing your company's metadata into DataHub, take a look at the [Metadata Ingestion Framework](../metadata-ingestion/README.md).
### Invite Users
To add users to your deployment and share it with your team, check out our [Adding Users to DataHub](authentication/guides/add-users.md) guide.
### Enable Authentication
To enable SSO, check out [Configuring OIDC Authentication](authentication/guides/sso/configure-oidc-react.md) or [Configuring JaaS Authentication](authentication/guides/jaas.md).

To enable backend authentication, check out [authentication in DataHub's backend](authentication/introducing-metadata-service-authentication.md#Configuring Metadata Service Authentication).
### Move to Production
We recommend deploying DataHub to production using Kubernetes. We provide [Helm Charts](https://artifacthub.io/packages/helm/datahub/datahub) to help you get up and running quickly. Check out [Deploying DataHub to Kubernetes](./deploy/kubernetes.md) for a step-by-step walkthrough.
## Resetting DataHub
To clear DataHub of all of its state (e.g. before ingesting your own metadata), you can use the CLI `nuke` command.
```
datahub docker nuke
```
## Updating DataHub locally
If you have been testing DataHub locally and want to try a newly released version, you can use the commands below.
```
datahub docker nuke --keep-data
datahub docker quickstart
```
This will keep the data that you have ingested so far in DataHub and start a new quickstart with the latest version of DataHub.
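
If you run this sequence often, the two commands can be wrapped so that `quickstart` only starts when `nuke` succeeded. This is a rough sketch rather than a DataHub feature: `run`, `upgrade_datahub`, and the `DRY_RUN` variable are hypothetical names introduced here for illustration.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: the two commands from this section, stopping at the
# first failure so quickstart never starts after a failed nuke.
upgrade_datahub() {
  run datahub docker nuke --keep-data || return 1
  run datahub docker quickstart
}
```

Setting `DRY_RUN=1` before calling `upgrade_datahub` prints the two commands without running them, which is a cheap way to review what will happen before touching your local instance.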
## Troubleshooting
### Command not found: datahub
If running the DataHub CLI produces "command not found" errors in your terminal, your system may be defaulting to an
older version of Python. Try prefixing your `datahub` commands with `python3 -m`:
```
python3 -m datahub docker quickstart
```
Another possibility is that your system PATH does not include pip's `$HOME/.local/bin` directory. On Linux, you can add this to your `~/.bashrc`:
```
if [ -d "$HOME/.local/bin" ] ; then
    PATH="$HOME/.local/bin:$PATH"
fi
```
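
Both possible causes described in this section can be checked before changing anything. The helpers below are a minimal sketch: `path_contains` and `version_at_least` are hypothetical names, not part of DataHub or pip, and `version_at_least` assumes version strings with at least a major and a minor component.

```shell
# Hypothetical helper: succeed if directory $1 is an entry in the
# colon-separated, PATH-style list $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: succeed if version $1 (e.g. 3.9.7) is at least the
# major.minor version $2 (e.g. 3.6). Components are compared as numbers,
# so 3.10 correctly counts as newer than 3.6.
version_at_least() {
  local have_major have_rest have_minor want_major want_minor
  have_major="${1%%.*}"
  have_rest="${1#*.}"
  have_minor="${have_rest%%.*}"
  want_major="${2%%.*}"
  want_minor="${2#*.}"
  if [ "$have_major" -gt "$want_major" ]; then
    return 0
  fi
  if [ "$have_major" -lt "$want_major" ]; then
    return 1
  fi
  [ "$have_minor" -ge "$want_minor" ]
}
```

For example, `path_contains "$HOME/.local/bin" "$PATH" || echo "add ~/.local/bin to PATH"` reports a missing PATH entry, and `version_at_least "$(python3 --version 2>&1 | awk '{print $2}')" 3.6 || echo "python3 is older than 3.6"` reports an outdated interpreter.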
### Miscellaneous Docker issues
There can be miscellaneous issues with Docker, such as conflicting containers and dangling volumes, that can often be resolved by
pruning your Docker state with the following command. Note that this command removes all unused containers, networks,
images (both dangling and unreferenced), and optionally, volumes.
```
docker system prune
```