This guide will help you set up the Ingestion framework and connectors.
OpenMetadata Ingestion is a simple framework for building connectors and ingesting metadata from various systems through the OpenMetadata APIs. It can be used within an orchestration framework (e.g. Apache Airflow) to ingest metadata.
Prerequisites
- Python >= 3.8.x
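The version requirement above can be checked before installing anything. The following is a minimal sketch, not part of the OpenMetadata tooling: `python_version_ok` is a hypothetical helper name, and it only compares the major and minor numbers of a version string.

```shell
# Hypothetical helper (not part of OpenMetadata): returns 0 when the given
# "major.minor[.patch]" version string is at least 3.8, non-zero otherwise.
python_version_ok() {
  major=${1%%.*}      # text before the first dot
  rest=${1#*.}        # text after the first dot
  minor=${rest%%.*}   # text before the next dot
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 8 ]; }
}

# Ask the local interpreter for its version and compare it.
if python_version_ok "$(python3 -c 'import platform; print(platform.python_version())')"; then
  echo "python3 satisfies the >= 3.8 requirement"
else
  echo "python3 is missing or older than 3.8" >&2
fi
```

Keeping the comparison in a function that takes a plain string makes it easy to reuse for any interpreter path, not only the default `python3`.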
Install From PyPI
python3 -m pip install --upgrade pip wheel setuptools openmetadata-ingestion
python3 -m spacy download en_core_web_sm
Install Ingestion Connector Dependencies
See the Ingestion Connector documentation for the dependencies each connector requires.
Generate Redshift Data
metadata ingest -c ./pipelines/redshift.json
Generate Redshift Usage Data
metadata ingest -c ./pipelines/redshift_usage.json
Generate Sample Tables
metadata ingest -c ./pipelines/sample_tables.json
Generate Sample Users
metadata ingest -c ./pipelines/sample_users.json
Ingest MySQL data to Metadata APIs
metadata ingest -c ./pipelines/mysql.json
Ingest BigQuery data to Metadata APIs
export GOOGLE_APPLICATION_CREDENTIALS="$PWD/pipelines/creds/bigquery-cred.json"
metadata ingest -c ./pipelines/bigquery.json
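The ingestion commands above all share the form `metadata ingest -c <config>`, so several pipelines can be run in sequence with a small wrapper. This is only a sketch: `run_pipelines` and the `METADATA_BIN` variable are hypothetical names introduced here for illustration (the latter lets you substitute another executable for a dry run) and are not part of the `metadata` CLI.

```shell
# Hypothetical wrapper: run `metadata ingest -c <file>` for each config path
# given as an argument, stopping at the first missing file or failed run.
# METADATA_BIN is an illustrative override, not an option of the real CLI.
run_pipelines() {
  for config in "$@"; do
    if [ ! -f "$config" ]; then
      echo "missing config: $config" >&2
      return 1
    fi
    echo "ingesting $config"
    "${METADATA_BIN:-metadata}" ingest -c "$config" || return 1
  done
}

# Example call (left commented out so that defining the function has no side effects):
# run_pipelines ./pipelines/sample_tables.json ./pipelines/sample_users.json
```

Stopping at the first failure keeps a later pipeline from running against a partially populated catalog; drop the `|| return 1` if you would rather continue regardless.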
Index Metadata into Elasticsearch
Run the Elasticsearch Docker container
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.10.2
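The container usually needs some time before it accepts HTTP connections on port 9200, so it can help to wait for it before starting the indexing pipeline. The sketch below assumes `curl` is installed; `wait_for_es` is a hypothetical helper name, and the default attempt count and delay are arbitrary choices rather than recommended values.

```shell
# Hypothetical helper: poll an HTTP endpoint until it answers or attempts run out.
# Arguments: URL (default http://localhost:9200), attempts (default 30),
# delay between attempts in seconds (default 2).
wait_for_es() {
  url=${1:-http://localhost:9200}
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: treat HTTP errors as failures, -sS: quiet but still report errors
    if curl -fsS -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example call (commented out so that defining the function does not block):
# wait_for_es && echo "Elasticsearch is reachable"
```

The function returns a non-zero status when the endpoint never responds, so it can gate the next step with `&&`.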
Run the ingestion connector
metadata ingest -c ./pipelines/metadata_to_es.json