2021-08-01 14:27:44 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								---
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								description: This guide will help you setup the Ingestion framework and connectors
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								---
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# Setup Ingestion
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-08-05 19:56:33 +05:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								Ingestion is a data ingestion library, which is inspired by [Apache Gobblin ](https://gobblin.apache.org/ ). It could be used in an orchestration framework\(e.g. Apache Airflow\) to build data for OpenMetadata.
							 
						 
					
						
							
								
									
										
										
										
											2021-08-01 14:27:44 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{% hint style="info" %}
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								**Prerequisites**
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								*  Python > = 3.8.x 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{% endhint %}
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								## Install on your Dev
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								### Install Dependencies
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								```text
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								cd ingestion
							 
						 
					
						
							
								
									
										
										
										
											2021-08-05 09:20:59 +05:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								python3 -m venv env
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								source env/bin/activate
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								./ingestion_dependency.sh
							 
						 
					
						
							
								
									
										
										
										
											2021-08-01 14:27:44 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								You only need to run above command once.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								### Known Issues
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								#### Fix MySQL lib
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								```text
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 sudo ln -s /usr/local/mysql/lib/libmysqlclient.21.dylib /usr/local/lib/libmysqlclient.21.dylib
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								### Run Ingestion Connectors
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								#### Generate Redshift Data
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								```text
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								source env/bin/activate
							 
						 
					
						
							
								
									
										
										
										
											2021-08-21 00:43:07 +05:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								metadata ingest -c ./examples/workflows/redshift.json
							 
						 
					
						
							
								
									
										
										
										
											2021-08-01 14:27:44 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								#### Generate Redshift Usage Data
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								```text
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 source env/bin/activate
							 
						 
					
						
							
								
									
										
										
										
											2021-08-21 00:43:07 +05:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 metadata ingest -c ./examples/workflows/redshift_usage.json
							 
						 
					
						
							
								
									
										
										
										
											2021-08-01 14:27:44 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								#### Generate Sample Tables
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								```text
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 source env/bin/activate
							 
						 
					
						
							
								
									
										
										
										
											2021-08-05 09:20:59 +05:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 metadata ingest -c ./pipelines/sample_tables.json
							 
						 
					
						
							
								
									
										
										
										
											2021-08-01 14:27:44 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								```
							 
						 
					
						
							
								
									
										
										
										
											2021-08-21 00:43:07 +05:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								#### Generate Sample Usage
  
						 
					
						
							
								
									
										
										
										
											2021-08-01 14:27:44 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-08-21 00:43:07 +05:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								```text
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 source env/bin/activate
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 metadata ingest -c ./pipelines/sample_usage.json
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								```
							 
						 
					
						
							
								
									
										
										
										
											2021-08-01 14:27:44 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								#### Generate Sample Users
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								```text
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 source env/bin/activate
							 
						 
					
						
							
								
									
										
										
										
											2021-08-05 09:20:59 +05:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 metadata ingest -c ./pipelines/sample_users.json
							 
						 
					
						
							
								
									
										
										
										
											2021-08-01 14:27:44 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								#### Ingest MySQL data to Metadata APIs
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								```text
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 source env/bin/activate
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 metadata ingest -c ./pipelines/mysql.json
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								#### Ingest Bigquery data to Metadata APIs
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								```text
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 source env/bin/activate
							 
						 
					
						
							
								
									
										
										
										
											2021-08-21 00:43:07 +05:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 metadata ingest -c ./examples/workflows/bigquery.json
							 
						 
					
						
							
								
									
										
										
										
											2021-08-01 14:27:44 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								#### Index Metadata into ElasticSearch
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								#### Run ElasticSearch docker
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								```text
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.10.2
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								#### Run ingestion connector
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								```text
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 source env/bin/activate
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 metadata ingest -c ./pipelines/metadata_to_es.json
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								## Install using Docker
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								### Run Ingestion docker
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-08-05 19:56:33 +05:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								The OpenMetadata should be up and running before you run the docker on the system.
							 
						 
					
						
							
								
									
										
										
										
											2021-08-01 14:27:44 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								```text
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								docker build -t ingestion .
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								docker run --network="host" -t ingestion
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								### Run ElasticSearch docker
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								Run the command to start ElasticSearch docker
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								```text
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.10.2
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								## Test - Integration
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								Run the command to start integration tests
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								```text
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								source env/bin/activate
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								cd tests/integration/
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								pytest -c /dev/null {folder-name} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								#pytest -c /dev/null mysql
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								## Design