---
This guide will help you set up the Ingestion framework and connectors
---
OpenMetadata Ingestion is a simple framework for building connectors and ingesting metadata from various systems through the OpenMetadata APIs. It can be used within an orchestration framework (e.g. Apache Airflow) to ingest metadata.

**Prerequisites**

- Python >= 3.8.x
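A quick way to confirm your interpreter satisfies this prerequisite before installing:

```python
import sys

# Fail fast if the interpreter is older than the required 3.8.
assert sys.version_info >= (3, 8), "openmetadata-ingestion requires Python >= 3.8"
print("Python", sys.version.split()[0], "OK")
```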
### Install From PyPI

```shell
python3 -m pip install --upgrade pip wheel setuptools openmetadata-ingestion
python3 -m spacy download en_core_web_sm
```
### Install Ingestion Connector Dependencies

See the [Ingestion Connector's Documentation](https://docs.open-metadata.org/install/metadata-ingestion).
#### Generate Redshift Data

```shell
metadata ingest -c ./pipelines/redshift.json
```
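Each `metadata ingest` invocation reads a JSON pipeline configuration passed with `-c`. As a rough, hypothetical sketch only (the field names below are illustrative assumptions, not the framework's authoritative schema; see the files shipped in `./pipelines/` for real examples), such a file pairs a source with a sink and the metadata server to publish to:

```json
{
  "source": {
    "type": "redshift",
    "config": {
      "host_port": "redshift-cluster.example.com:5439",
      "username": "awsuser",
      "database": "warehouse"
    }
  },
  "sink": {
    "type": "metadata-rest",
    "config": {}
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api"
    }
  }
}
```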
#### Generate Redshift Usage Data

```shell
metadata ingest -c ./pipelines/redshift_usage.json
```
#### Generate Sample Tables

```shell
metadata ingest -c ./pipelines/sample_tables.json
```
#### Generate Sample Users

```shell
metadata ingest -c ./pipelines/sample_users.json
```
#### Ingest MySQL data to Metadata APIs

```shell
metadata ingest -c ./pipelines/mysql.json
```
#### Ingest BigQuery data to Metadata APIs

```shell
export GOOGLE_APPLICATION_CREDENTIALS="$PWD/pipelines/creds/bigquery-cred.json"
metadata ingest -c ./pipelines/bigquery.json
```
### Index Metadata into Elasticsearch

#### Run Elasticsearch docker

```shell
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.10.2
```
#### Run ingestion connector

```shell
metadata ingest -c ./pipelines/metadata_to_es.json
```
## Generated sources

We use `datamodel-codegen` to generate `pydantic` classes in the `generated` module from the JSON Schemas that define the API and Entities.

This tool derives each class name from the `title` of its JSON Schema (unlike the Java POJOs, which are named after the file). This convention matters for us: standardizing how titles are written lets us create generic code capable of handling multiple Type Variables.
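As a rough illustration of why uniform, title-based class naming helps, the sketch below uses plain dataclasses as hypothetical stand-ins for the generated `pydantic` classes (the class and field names here are illustrative assumptions, not the real generated entities) and shows one generic helper working across entity types:

```python
from dataclasses import dataclass
from typing import Dict, Type, TypeVar

# Hypothetical stand-ins for classes that datamodel-codegen would emit
# from JSON Schemas titled "Table" and "Dashboard".
@dataclass
class Table:
    name: str

@dataclass
class Dashboard:
    name: str

T = TypeVar("T")

def parse_entity(entity_class: Type[T], payload: Dict) -> T:
    # Generic helper: because the title-based convention gives every
    # entity a predictable class, one function handles them all.
    return entity_class(**payload)

table = parse_entity(Table, {"name": "fact_orders"})
dashboard = parse_entity(Dashboard, {"name": "sales_overview"})
print(type(table).__name__, type(dashboard).__name__)
```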