
This guide will help you set up the Ingestion framework and connectors.
OpenMetadata Ingestion is a simple framework for building connectors and ingesting metadata from various systems through the OpenMetadata APIs. It can be used within an orchestration framework (e.g., Apache Airflow) to ingest metadata.
Prerequisites
- Python >= 3.8.x
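The commands below assume a recent Python 3; a quick sanity check, plus an optional virtual environment to isolate the ingestion dependencies (a sketch, not a requirement):

python3 --version            # should report 3.8 or newer
python3 -m venv env          # optional: keep ingestion dependencies isolated
source env/bin/activate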
Install From PyPI
python3 -m pip install --upgrade pip wheel setuptools openmetadata-ingestion
python3 -m spacy download en_core_web_sm
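Once installed, the package provides the metadata CLI used throughout this guide; a quick check that it landed on your PATH:

metadata --help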
Install Ingestion Connector Dependencies
See the Ingestion Connectors documentation for the dependencies each connector requires.
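Connector dependencies are typically installed as optional extras of the openmetadata-ingestion package; for example (the extra names here are assumptions; check the connector docs for the exact ones):

python3 -m pip install "openmetadata-ingestion[mysql]"
python3 -m pip install "openmetadata-ingestion[redshift]"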
Generate Redshift Data
metadata ingest -c ./pipelines/redshift.json
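Each run is driven by the JSON pipeline definition passed via -c. The exact schema varies by connector and release, so treat the following as a minimal sketch of the general shape (source, sink, metadata server), with illustrative hosts and credentials:

cat > ./pipelines/example.json <<'EOF'
{
  "source": {
    "type": "redshift",
    "config": {
      "host_port": "cluster.example.us-east-1.redshift.amazonaws.com:5439",
      "username": "awsuser",
      "password": "...",
      "database": "dev",
      "service_name": "aws_redshift"
    }
  },
  "sink": {
    "type": "metadata-rest",
    "config": {}
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  }
}
EOF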
Generate Redshift Usage Data
metadata ingest -c ./pipelines/redshift_usage.json
Generate Sample Tables
metadata ingest -c ./pipelines/sample_tables.json
Generate Sample Users
metadata ingest -c ./pipelines/sample_users.json
Ingest MySQL data to Metadata APIs
metadata ingest -c ./pipelines/mysql.json
Ingest BigQuery data to Metadata APIs
export GOOGLE_APPLICATION_CREDENTIALS="$PWD/pipelines/creds/bigquery-cred.json"
metadata ingest -c ./pipelines/bigquery.json
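The connector reads the service account key from that environment variable, so it is worth confirming the variable points at a readable file before the run:

test -r "$GOOGLE_APPLICATION_CREDENTIALS" && echo "credentials file found"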
Index Metadata into Elasticsearch
Run the Elasticsearch Docker container
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.10.2
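Before running the connector, you can confirm the node is up via Elasticsearch's cluster health endpoint on the REST port:

curl "http://localhost:9200/_cluster/health?pretty"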
Run ingestion connector
metadata ingest -c ./pipelines/metadata_to_es.json
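After the run completes, the indices created by the sink should show up in the cat API (the index names depend on the pipeline configuration):

curl "http://localhost:9200/_cat/indices?v"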
Generated sources
We use datamodel-codegen to generate the pydantic classes inside the generated module from the JSON Schemas that define the API and Entities. This tool bases the class name on the title of the JSON Schema (whereas the Java POJOs use the file name). This convention matters for us: standardizing how titles are written lets us build generic code capable of handling multiple Type Variables.
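As an illustration of the title-based naming, an invocation along the following lines regenerates the classes; the input and output paths here are assumptions, not necessarily the repository's actual layout:

datamodel-codegen \
  --input catalog-rest-service/src/main/resources/json/schema \
  --input-file-type jsonschema \
  --output ingestion/src/metadata/generated
# A schema whose "title" is "Table" produces `class Table(BaseModel)`,
# regardless of the schema's file name.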