mirror of
https://github.com/open-metadata/OpenMetadata.git
synced 2025-10-24 07:14:48 +00:00
200 lines
7.5 KiB
Markdown
# Installation and deployment instructions (using Postgres as example)

Below are the instructions for connecting a Postgres server. The installation steps are the same for other kinds of servers, but each server type requires its own configuration in the .yaml or DAG files. See https://docs.open-metadata.org/integrations/connectors for the configuration that matches your server.

# Goal: To run Postgres metadata ingestion and quality tests with OpenMetadata using the Airflow scheduler

Note: This procedure does not support Windows, because Windows does not implement `signal.SIGALRM`. **It is highly recommended to use WSL 2 if you are on Windows**.
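
To confirm whether a given Python environment is affected, you can check for the signal before installing anything. The following is a minimal sketch; the function name `supports_sigalrm` is illustrative and not part of OpenMetadata.

```python
import signal


def supports_sigalrm() -> bool:
    """Return True when this platform exposes signal.SIGALRM (Linux, macOS, WSL)."""
    return hasattr(signal, "SIGALRM")


if __name__ == "__main__":
    if supports_sigalrm():
        print("signal.SIGALRM is available; this procedure should work here.")
    else:
        print("signal.SIGALRM is missing (native Windows?); switch to WSL 2.")
```

Run it with the same interpreter you plan to use for the ingestion package, since the result depends on the platform that interpreter runs on.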

## Requirements:

See the "Requirements" section of https://docs.open-metadata.org/overview/run-openmetadata-with-prefect

## Installation:

1. Clone this GitHub repo:
   `git clone https://github.com/open-metadata/OpenMetadata.git`

2. `cd` to ~/.../openmetadata/docker/metadata

3. Start the OpenMetadata containers. This will allow you to run OpenMetadata in Docker:
   `docker compose up -d`
   - To check the status of the services, run `docker compose ps`
   - To access the UI: http://localhost:8585

4. Install the OpenMetadata ingestion package.
   - (optional but highly recommended): Before installing this package, create and activate a virtual environment. To do this, run:
     `python -m venv env` and `source env/bin/activate`

   - To install the OpenMetadata ingestion package:
     `pip install --upgrade "openmetadata-ingestion[docker]==0.10.3"` (specify the release version to ensure compatibility)

5. Install Airflow:
   - 5A: Install the Airflow Lineage Backend: `pip3 install "openmetadata-ingestion[airflow-container]==0.10.3"`
   - 5B: Install the Airflow Postgres connector module: `pip3 install "openmetadata-ingestion[postgres]==0.10.3"`
   - 5C: Install the Airflow APIs: `pip3 install "openmetadata-airflow-managed-apis==0.10.3"`
   - 5D: Install the necessary Airflow plugins:
     - 1) Download the latest openmetadata-airflow-apis-plugins release from https://github.com/open-metadata/OpenMetadata/releases
     - 2) Untar it under your {AIRFLOW_HOME} directory (usually c/Users/Yourname/airflow). This will create and set up a plugins directory under {AIRFLOW_HOME}.
     - 3) `cp -r {AIRFLOW_HOME}/plugins/dag_templates {AIRFLOW_HOME}`
     - 4) `mkdir -p {AIRFLOW_HOME}/dag_generated_configs`
     - 5) (Re)start the Airflow webserver and scheduler

6. Configure Airflow:
   - 6A: Configure airflow.cfg in your AIRFLOW_HOME directory. Check that every folder setting points to the right place. For instance, dags_folder = YOUR_AIRFLOW_HOME/dags
   - 6B: Configure openmetadata.yaml and update the airflowConfiguration section. See: https://docs.open-metadata.org/integrations/airflow/configure-airflow-in-the-openmetadata-server
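
The folder check in step 6A can be scripted. The sketch below reads airflow.cfg with the standard-library `configparser` and reports `[core]` folder settings that point to directories that do not exist. The helper name `find_missing_folders` and the choice of keys to check (`dags_folder`, `plugins_folder`) are assumptions made for illustration; adjust the keys to whatever your airflow.cfg actually contains.

```python
import configparser
import os
from pathlib import Path


def find_missing_folders(cfg_path, keys=("dags_folder", "plugins_folder")):
    """Return {key: path} for [core] folder settings that do not exist on disk."""
    # interpolation=None keeps literal '%' characters in values from raising errors
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(cfg_path):
        raise FileNotFoundError(f"could not read {cfg_path}")
    missing = {}
    for key in keys:
        value = parser.get("core", key, fallback=None)
        if value is not None and not Path(value).expanduser().is_dir():
            missing[key] = value
    return missing


if __name__ == "__main__":
    # Airflow falls back to ~/airflow when AIRFLOW_HOME is not set
    airflow_home = os.environ.get("AIRFLOW_HOME", os.path.expanduser("~/airflow"))
    cfg = os.path.join(airflow_home, "airflow.cfg")
    try:
        for key, value in find_missing_folders(cfg).items():
            print(f"{key} points to a missing directory: {value}")
    except FileNotFoundError as err:
        print(err)
```

An empty result means every checked setting resolves to an existing directory; it does not verify that the directory is the one you intended.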

## To run a metadata ingestion workflow with Airflow ingestion DAGs on Postgres data:

1. Prepare the Ingestion DAG:

   For a more complete tutorial on the ingestion DAG, see https://docs.open-metadata.org/integrations/connectors/postgres/run-postgres-connector-with-the-airflow-sdk

   In brief, below is my own DAG. Copy and paste the following into a Python file (postgres_demo.py):

```
import pathlib
import json
from datetime import timedelta
from airflow import DAG

# PythonOperator moved between Airflow versions; fall back to the old path if needed.
try:
    from airflow.operators.python import PythonOperator
except ModuleNotFoundError:
    from airflow.operators.python_operator import PythonOperator

from metadata.config.common import load_config_file
from metadata.ingestion.api.workflow import Workflow
from airflow.utils.dates import days_ago

default_args = {
    "owner": "user_name",
    "email": ["username@org.com"],
    "email_on_failure": False,
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "execution_timeout": timedelta(minutes=60)
}

# Change username, password, hostPort and database to your own values.
# Every flag under sourceConfig can be switched to "true" or "false";
# the values shown here are placeholders.
# Keep the string below valid JSON: json.loads() rejects inline comments.
config = """
{
  "source": {
    "type": "postgres",
    "serviceName": "postgres_demo",
    "serviceConnection": {
      "config": {
        "type": "Postgres",
        "username": "postgres",
        "password": "postgres",
        "hostPort": "192.168.1.55:5432",
        "database": "surveillance_hub"
      }
    },
    "sourceConfig": {
      "config": {
        "enableDataProfiler": "true",
        "markDeletedTables": "true",
        "includeTables": "true",
        "includeViews": "true",
        "generateSampleData": "true"
      }
    }
  },
  "sink": {
    "type": "metadata-rest",
    "config": {}
  },
  "workflowConfig": {
    "openMetadataServerConfig": {
      "hostPort": "http://localhost:8585/api",
      "authProvider": "no-auth"
    }
  }
}
"""


def metadata_ingestion_workflow():
    workflow_config = json.loads(config)
    workflow = Workflow.create(workflow_config)
    workflow.execute()
    workflow.raise_from_status()
    workflow.print_status()
    workflow.stop()


with DAG(
    "sample_data",
    default_args=default_args,
    description="An example DAG which runs a OpenMetadata ingestion workflow",
    start_date=days_ago(1),
    is_paused_upon_creation=False,
    schedule_interval='*/5 * * * *',  # every 5 minutes
    catchup=False,
) as dag:
    ingest_task = PythonOperator(
        task_id="ingest_using_recipe",
        python_callable=metadata_ingestion_workflow,
    )

# Allows a one-off run outside the scheduler: python postgres_demo.py
if __name__ == "__main__":
    metadata_ingestion_workflow()
```

2. Run the DAG:

   `python postgres_demo.py`

   - Alternatively, we could run without the Airflow SDK, using OpenMetadata's own methods. Run `metadata ingest -c /Your_Path_To_Json/.json`
     The JSON configuration is exactly the same as the JSON configuration in the DAG.
   - Or, we could also run it with `metadata ingest -c /Your_Path_To_Yaml/.yaml`
     The YAML configuration is the same, except that it is written without the curly brackets and the double quotes.
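
Because the `metadata ingest -c` route takes the same JSON as the DAG, one way to avoid keeping two hand-edited copies is to validate the string once and write it to a file. The sketch below does that with the standard library only. The names `REQUIRED_KEYS` and `dump_config` and the output file name `postgres_demo.json` are illustrative, and the list of required top-level sections is inferred from the example config in the DAG, not taken from the OpenMetadata schema.

```python
import json
import os
import tempfile
from pathlib import Path

# Top-level sections seen in the example config; an assumption, not the official schema.
REQUIRED_KEYS = ("source", "sink", "workflowConfig")


def dump_config(config_text, out_path):
    """Parse the JSON string, check its top-level sections, and write it to out_path."""
    parsed = json.loads(config_text)  # raises json.JSONDecodeError on invalid JSON
    missing = [key for key in REQUIRED_KEYS if key not in parsed]
    if missing:
        raise ValueError(f"config is missing sections: {missing}")
    out = Path(out_path)
    out.write_text(json.dumps(parsed, indent=2))
    return out


if __name__ == "__main__":
    sample = (
        '{"source": {"type": "postgres"}, '
        '"sink": {"type": "metadata-rest", "config": {}}, '
        '"workflowConfig": {}}'
    )
    # Written to a temporary directory here; point it wherever you keep configs.
    target = os.path.join(tempfile.mkdtemp(), "postgres_demo.json")
    print(dump_config(sample, target))
```

The printed path is what you would pass to `metadata ingest -c`. Parsing first means a stray annotation inside the JSON fails here rather than partway through a workflow run.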

## To run a profiler workflow on Postgres data

1. Prepare the DAG OR configure the yaml/json:
   - To configure the quality tests in json/yaml, see https://docs.open-metadata.org/data-quality/data-quality-overview/tests
   - To prepare the DAG, see https://github.com/open-metadata/OpenMetadata/tree/0.10.3-release/data-quality/data-quality-overview

Example yaml I was using:

```
source:
  type: postgres
  serviceName: your_service_name
  serviceConnection:
    config:
      type: Postgres
      username: your_username
      password: your_password
      hostPort:
      database: your_database
  sourceConfig:
    config:
      type: Profiler

processor:
  type: orm-profiler
  config:
    test_suite:
      name: demo_test
      tests:
        - table: your_table_name  # must be the FQN
          column_tests:
            - columnName: id
              testCase:
                columnTestType: columnValuesToBeBetween
                config:
                  minValue: 0
                  maxValue: 10
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: no-auth
```

Note that the table name must be the FQN (fully qualified name) and must match the table path shown in the OpenMetadata UI exactly.
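
To make the note concrete, here is a minimal sketch of assembling such a name. It assumes the FQN is the dot-separated path service.database.schema.table as displayed in the UI; that layout, the helper name `table_fqn`, and the `public` schema and `my_table` table are assumptions for illustration only. When in doubt, copy the exact path from the OpenMetadata UI.

```python
def table_fqn(service, database, schema, table):
    """Join the four parts with dots; reject empty parts and parts containing dots."""
    parts = [service, database, schema, table]
    for part in parts:
        if not part or "." in part:
            raise ValueError(f"unexpected name part: {part!r}")
    return ".".join(parts)


if __name__ == "__main__":
    # Service and database names are taken from the ingestion example in this guide.
    print(table_fqn("postgres_demo", "surveillance_hub", "public", "my_table"))
```

The sketch refuses parts that contain dots because such names would make the joined path ambiguous; handle those by copying the name from the UI instead.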

2. Run it with
   `metadata profile -c /path_to_yaml/.yaml`

Make sure to refresh the OpenMetadata UI and click on the Data Quality tab to see the results.
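
For intuition about what the `columnValuesToBeBetween` test in the example yaml asserts, the sketch below checks a list of values against `minValue` and `maxValue`. It is a plain-Python illustration, not OpenMetadata's implementation; treating both bounds as inclusive and skipping nulls are assumptions.

```python
def values_between(values, min_value, max_value):
    """Return True when every non-null value lies within [min_value, max_value]."""
    present = [v for v in values if v is not None]
    if not present:
        return True  # nothing can violate the bounds
    return min_value <= min(present) and max(present) <= max_value


if __name__ == "__main__":
    # Bounds mirror the example yaml: minValue 0, maxValue 10.
    print(values_between([1, 5, 10], 0, 10))   # all values inside the range
    print(values_between([1, 11, None], 0, 10))  # 11 exceeds maxValue
```

With the example bounds, an `id` column holding 11 would fail the test, which is the kind of result the Data Quality tab reports.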