# DataHub Quickstart Guide
					
						
## Deploying DataHub
					
						
To deploy a new instance of DataHub, perform the following steps.
					
						
1. Install [Docker](https://docs.docker.com/install/), [jq](https://stedolan.github.io/jq/download/), and [Docker Compose](https://docs.docker.com/compose/install/) (if
   using Linux). Make sure to allocate enough hardware resources to the Docker engine. Tested and confirmed configuration: 2 CPUs,
   8 GB RAM, 2 GB swap, and 10 GB disk space.
					
						
2. Launch the Docker engine from the command line or the desktop app.
					
						
3. Install the DataHub CLI
					
						
   a. Ensure you have Python 3.6+ installed and configured. (Check using `python3 --version`.)
					
						
   b. Run the following commands in your terminal:
					
						
   ```
   python3 -m pip install --upgrade pip wheel setuptools
   python3 -m pip uninstall datahub acryl-datahub || true  # sanity check - ok if it fails
   python3 -m pip install --upgrade acryl-datahub
   datahub version
   ```
					
						
:::note

   If you see "command not found", try running CLI commands with the prefix `python3 -m` instead, for example `python3 -m datahub version`.
   Note that the DataHub CLI does not support Python 2.x.

:::
					
						
4. To deploy DataHub, run the following CLI command from your terminal:
					
						
   ```
   datahub docker quickstart
   ```
					
						
   Upon completion of this step, you should be able to navigate to the DataHub UI
   at [http://localhost:9002](http://localhost:9002) in your browser. You can sign in using `datahub` as both the
   username and password.
					
						
5. To ingest the sample metadata, run the following CLI command from your terminal:
   ```
   datahub docker ingest-sample-data
   ```
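
Once steps 4 and 5 have finished, it can be handy to confirm from the terminal that the UI at `http://localhost:9002` is answering before opening a browser. The sketch below polls a URL with `curl`. The `wait_for_url` helper, its default of 30 attempts, and the 5-second delay are assumptions for illustration, not part of the DataHub CLI.

```shell
# Poll a URL until it answers or the attempts run out.
# wait_for_url is a hypothetical helper, not a DataHub command.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP 4xx/5xx responses.
    if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
      return 0
    fi
    # Sleep only when another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example usage (uncomment to run against a local quickstart):
# wait_for_url http://localhost:9002 && echo "DataHub UI is reachable"
```

A non-zero exit status only means the URL did not answer in time; the containers may still be starting.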
					
						
That's it! To start pushing your company's metadata into DataHub, take a look at
the [Metadata Ingestion Framework](../metadata-ingestion/README.md).
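
The prerequisites from steps 1 to 3 can also be checked up front. The following is a minimal sketch, assuming a POSIX shell; `missing_tools` and `python_is_36_plus` are hypothetical helper names introduced here, and the tool list simply mirrors the installs named in step 1.

```shell
# Print the name of every listed tool that is not found on PATH.
# missing_tools is a hypothetical helper for illustration only.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Succeed only if python3 reports version 3.6 or newer.
python_is_36_plus() {
  python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 6) else 1)'
}

# Example usage (docker-compose is only needed on Linux, per step 1):
# missing_tools docker jq docker-compose python3
# python_is_36_plus || echo "Python 3.6+ is required"
```

An empty result from `missing_tools` means every listed tool was found. It does not check that the Docker engine is running or that enough resources are allocated.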
					
						
## Resetting DataHub
					
						
To clear DataHub of all of its state (e.g. before ingesting your own metadata), you can use the CLI `nuke` command.
					
						
```
datahub docker nuke
```
					
						
## Updating DataHub locally
					
						
If you have been testing DataHub locally and want to try a newly released version, you can use the commands below.
					
						
```
datahub docker nuke --keep-data
datahub docker quickstart
```
					
						
This keeps the data that you have ingested so far and starts a new quickstart with the latest version of DataHub.
					
						
## Troubleshooting
					
						
### Command not found: datahub
					
						
If running the `datahub` CLI produces "command not found" errors in your terminal, your system may be defaulting to an
older version of Python. Try prefixing your `datahub` commands with `python3 -m`:
					
						
```
python3 -m datahub docker quickstart
```
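
To avoid typing the prefix every time, the fallback can be wrapped in a small shell function. This is only a sketch: `datahub_cmd` and `dh` are hypothetical names introduced here, and the function assumes nothing beyond the two invocation forms shown above.

```shell
# Echo the command to use for the DataHub CLI: the `datahub` entry point
# if it is on PATH, otherwise the `python3 -m datahub` form.
# datahub_cmd and dh are hypothetical helper names.
datahub_cmd() {
  if command -v datahub >/dev/null 2>&1; then
    echo "datahub"
  else
    echo "python3 -m datahub"
  fi
}

# Run the CLI through whichever form is available.
dh() {
  # Left unquoted on purpose so "python3 -m datahub" splits into words.
  $(datahub_cmd) "$@"
}

# Example usage:
# dh version
# dh docker quickstart
```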
					
						
Another possibility is that your system PATH does not include pip's `$HOME/.local/bin` directory. On Linux, you can add this to your `~/.bashrc`:
					
						
```
if [ -d "$HOME/.local/bin" ] ; then
    PATH="$HOME/.local/bin:$PATH"
fi
```
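
Because `~/.bashrc` can be sourced more than once in a session, the snippet above may add the same directory to PATH repeatedly. A variant that skips directories already present could look like the sketch below; `path_prepend` is a hypothetical helper name, not something pip or DataHub provides.

```shell
# Prepend a directory to PATH only if it exists and is not already listed.
# path_prepend is a hypothetical helper name.
path_prepend() {
  dir="$1"
  # Silently ignore directories that do not exist.
  [ -d "$dir" ] || return 0
  case ":$PATH:" in
    *":$dir:"*) ;;            # already on PATH, nothing to do
    *) PATH="$dir:$PATH" ;;
  esac
}

path_prepend "$HOME/.local/bin"
export PATH
```

Wrapping PATH in colons before matching lets the `case` pattern match whole entries only, so `/usr/local/bin` does not count as a match for `/usr/local`.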
					
						
### Miscellaneous Docker issues
					
						
Miscellaneous Docker issues, such as conflicting containers and dangling volumes, can often be resolved by
pruning your Docker state with the following command. Note that this command removes all unused containers, networks,
images (both dangling and unreferenced), and optionally, volumes.
					
						
```
docker system prune
```
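
If a full prune feels too broad, Docker also provides one prune subcommand per resource type, so the cleanup can be done in smaller steps. The sketch below wraps them in a dry-run helper. The `run` and `prune_step_by_step` functions and the `DRY_RUN` variable are assumptions made for this example; only the `docker ... prune` subcommands themselves come from the Docker CLI.

```shell
# Run a command, or just print it when DRY_RUN=1 is set.
# run and prune_step_by_step are hypothetical helper names.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Prune one resource type at a time; each docker command asks for
# confirmation before deleting anything.
prune_step_by_step() {
  run docker container prune   # stopped containers
  run docker network prune     # networks not used by any container
  run docker image prune       # dangling images
  run docker volume prune      # unused volumes (data in them is lost)
}

# Example usage: preview the commands without touching Docker.
# DRY_RUN=1 prune_step_by_step
```

Volumes are left for last because pruning them discards stored data, which may include metadata ingested into a local DataHub instance.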