OpenMetadata/docs/install/setup-ingestion.md
2021-08-24 23:53:34 +05:30

2.5 KiB

description
This guide will help you setup the Ingestion framework and connectors

Setup Ingestion

Ingestion is a data ingestion library, which is inspired by Apache Gobblin. It could be used in an orchestration framework(e.g. Apache Airflow) to build data for OpenMetadata.

{% hint style="info" %} Prerequisites

  • Python >= 3.8.x {% endhint %}

Install on your Dev

Install Dependencies

cd ingestion
python3 -m venv env
source env/bin/activate
./ingestion_dependency.sh

You only need to run above command once.

Known Issues

Fix MySQL lib

 sudo ln -s /usr/local/mysql/lib/libmysqlclient.21.dylib /usr/local/lib/libmysqlclient.21.dylib

Run Ingestion Connectors

Generate Redshift Data

source env/bin/activate
metadata ingest -c ./examples/workflows/redshift.json

Generate Redshift Usage Data

 source env/bin/activate
 metadata ingest -c ./examples/workflows/redshift_usage.json

Generate Sample Tables

 source env/bin/activate
 metadata ingest -c ./pipelines/sample_tables.json

Generate Sample Usage

 source env/bin/activate
 metadata ingest -c ./pipelines/sample_usage.json

Generate Sample Users

 source env/bin/activate
 metadata ingest -c ./pipelines/sample_users.json

Ingest MySQL data to Metadata APIs

 source env/bin/activate
 metadata ingest -c ./pipelines/mysql.json

Ingest Bigquery data to Metadata APIs

 source env/bin/activate
 metadata ingest -c ./examples/workflows/bigquery.json

Index Metadata into ElasticSearch

Run ElasticSearch docker

 docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.10.2

Run ingestion connector

 source env/bin/activate
 metadata ingest -c ./pipelines/metadata_to_es.json

Install using Docker

Run Ingestion docker

The OpenMetadata should be up and running before you run the docker on the system.

docker build -t ingestion .
docker run --network="host" -t ingestion

Run ElasticSearch docker

Run the command to start ElasticSearch docker

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.10.2

Test - Integration

Run the command to start integration tests

source env/bin/activate
cd tests/integration/
pytest -c /dev/null {folder-name} 
#pytest -c /dev/null mysql

Design