
This guide will help you set up the Ingestion framework and connectors

OpenMetadata Ingestion is a simple framework for building connectors and ingesting metadata from various systems through the OpenMetadata APIs. It can be used from an orchestration framework (e.g., Apache Airflow) to ingest metadata.

Prerequisites

  • Python 3.8 or later
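
Before installing, it is worth confirming the interpreter version and working inside a virtual environment. A minimal sketch; the env directory name is just an example:

# Confirm the interpreter is 3.8 or later
python3 --version

# Create and activate an isolated environment (directory name is arbitrary)
python3 -m venv env
source env/bin/activate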

Install From PyPI

python3 -m pip install --upgrade pip wheel setuptools openmetadata-ingestion
python3 -m spacy download en_core_web_sm
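
If the installation succeeded, the metadata CLI used throughout this guide should now be on your PATH. Listing its commands is a quick sanity check (the exact output depends on the installed version):

# Should print the available CLI commands, including ingest
metadata --help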

Install Ingestion Connector Dependencies

Refer to the Ingestion Connectors documentation to install the dependencies for the connectors you plan to run.
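
Most connectors are published as optional extras of the openmetadata-ingestion package. The extra names below are examples only; confirm the exact names in the connector documentation:

# Example extras -- check the connector docs for the exact names
python3 -m pip install "openmetadata-ingestion[mysql]"
python3 -m pip install "openmetadata-ingestion[elasticsearch]"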

Generate Redshift Data

metadata ingest -c ./pipelines/redshift.json

Generate Redshift Usage Data

metadata ingest -c ./pipelines/redshift_usage.json

Generate Sample Tables

metadata ingest -c ./pipelines/sample_tables.json

Generate Sample Users

metadata ingest -c ./pipelines/sample_users.json

Ingest MySQL data to Metadata APIs

metadata ingest -c ./pipelines/mysql.json
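
Each JSON file under ./pipelines describes a workflow: a source to read from, a sink to write to, and the metadata server to publish against. The sketch below only illustrates that general shape; the field names are assumptions and the my_mysql.json file name is hypothetical, so treat the sample configs shipped in ./pipelines as the authoritative reference:

# Hypothetical file; field names are illustrative -- compare with ./pipelines/mysql.json
cat > ./pipelines/my_mysql.json <<'EOF'
{
  "source": {
    "type": "mysql",
    "config": {
      "host_port": "localhost:3306",
      "username": "openmetadata_user",
      "password": "openmetadata_password",
      "database": "openmetadata_db",
      "service_name": "local_mysql"
    }
  },
  "sink": {
    "type": "metadata-rest",
    "config": {}
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  }
}
EOF

# Run the workflow against the hypothetical config
metadata ingest -c ./pipelines/my_mysql.json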

Ingest BigQuery data to Metadata APIs

export GOOGLE_APPLICATION_CREDENTIALS="$PWD/pipelines/creds/bigquery-cred.json"
metadata ingest -c ./pipelines/bigquery.json
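
A common failure mode with BigQuery is the credentials path not resolving. A quick check before running the ingest (the key file is whatever you downloaded for your GCP service account):

# Verify the service-account key file exists at the exported path
test -f "$GOOGLE_APPLICATION_CREDENTIALS" && echo "credentials file found"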

Index Metadata into Elasticsearch

Run Elasticsearch in Docker

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.10.2
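
Elasticsearch can take a few seconds to start. Before running the connector, confirm the node is reachable:

# Should return a small JSON document with the cluster name and version
curl http://localhost:9200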

Run ingestion connector

metadata ingest -c ./pipelines/metadata_to_es.json
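
Once the connector finishes, the standard Elasticsearch cat API is a convenient way to confirm that indexes were created (index names depend on your OpenMetadata version):

# Lists all indexes with their document counts
curl "http://localhost:9200/_cat/indices?v"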

Generated sources

We use datamodel-codegen to generate the pydantic classes in the generated module from the JSON Schemas that define the API and the Entities.

This tool bases the class name on the title of the JSON Schema (unlike the Java POJOs, which use the file name). This convention is important for us: a standardized approach to creating the titles lets us write generic code capable of handling multiple Type Variables.
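
As an illustration of the title-based naming, a schema whose title is Table generates a class named Table no matter what the file is called. The file names and flags below are examples, not the project's actual build setup:

# table_schema.json is a hypothetical schema file containing, e.g.:
#   { "title": "Table", "type": "object", "properties": { "name": { "type": "string" } } }
# datamodel-codegen takes the class name from "title", producing `class Table(BaseModel)`
datamodel-codegen --input table_schema.json --input-file-type jsonschema --output table_model.py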