This guide will help you set up the Ingestion framework and connectors

OpenMetadata Ingestion is a simple framework for building connectors and ingesting metadata from various systems through the OpenMetadata APIs. It can also be used from an orchestration framework (e.g. Apache Airflow) to schedule metadata ingestion.

Prerequisites

  • Python >= 3.8.x
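
Because the framework is an ordinary Python library, the same pipelines can also be run programmatically, which is how an orchestration tool such as Apache Airflow would invoke them. The following is a minimal sketch, assuming the Workflow class exposed under metadata.ingestion.api.workflow in the openmetadata-ingestion package (verify the module path against your installed release):

import json

from metadata.ingestion.api.workflow import Workflow

# Load one of the pipeline configs shipped in this repo -- the same file
# you would pass to `metadata ingest -c ...` on the command line.
with open("./pipelines/redshift.json") as config_file:
    workflow_config = json.load(config_file)

workflow = Workflow.create(workflow_config)
workflow.execute()            # run the source -> sink pipeline
workflow.raise_from_status()  # fail loudly if any record errored
workflow.print_status()       # summary of records ingested / failed
workflow.stop()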

Install From PyPI

python3 -m pip install --upgrade pip wheel setuptools openmetadata-ingestion
python3 -m spacy download en_core_web_sm

Install Ingestion Connector Dependencies

Click here to go to Ingestion Connector's Documentation

Generate Redshift Data

metadata ingest -c ./pipelines/redshift.json
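
The -c flag points at a JSON workflow definition. As a rough illustration only (the field names below are assumptions and can vary between releases; the checked-in ./pipelines/redshift.json is the source of truth), the equivalent dict handed to Workflow.create looks roughly like this:

# Illustrative sketch of a Redshift workflow config; hostnames and
# credentials below are placeholders, not real values.
workflow_config = {
    "source": {
        "type": "redshift",
        "config": {
            "host_port": "my-cluster.example.redshift.amazonaws.com:5439",
            "username": "ingestion_user",
            "password": "<password>",
            "database": "warehouse",
            "service_name": "aws_redshift",
        },
    },
    "sink": {"type": "metadata-rest", "config": {}},
    "metadata_server": {
        "type": "metadata-server",
        "config": {
            "api_endpoint": "http://localhost:8585/api",
            "auth_provider_type": "no-auth",
        },
    },
}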

Generate Redshift Usage Data

metadata ingest -c ./pipelines/redshift_usage.json

Generate Sample Tables

metadata ingest -c ./pipelines/sample_tables.json

Generate Sample Users

metadata ingest -c ./pipelines/sample_users.json

Ingest MySQL data to Metadata APIs

metadata ingest -c ./pipelines/mysql.json

Ingest BigQuery data to Metadata APIs

export GOOGLE_APPLICATION_CREDENTIALS="$PWD/pipelines/creds/bigquery-cred.json"
metadata ingest -c ./pipelines/bigquery.json

Index Metadata into Elasticsearch

Run Elasticsearch in Docker

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.10.2
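
Optionally, before running the connector you can confirm the single-node cluster is answering on port 9200. A quick sanity check using only the Python standard library:

import json
import urllib.request

# Poll the standard cluster-health endpoint of the container started above.
health = json.load(urllib.request.urlopen("http://localhost:9200/_cluster/health"))
print(health["status"])  # "yellow" or "green" means the node is up and usable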

Run ingestion connector

metadata ingest -c ./pipelines/metadata_to_es.json