2021-11-13 23:03:20 +05:30 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								---
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								description: Below is the sample code example you can refer to integrate with Airflow
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								---
							 
						 
					
						
							
								
									
										
										
										
											2021-08-01 14:27:44 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-11-13 23:03:20 +05:30 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								# Run Metadata Ingestion
 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-14 10:18:52 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-10-15 03:51:13 +05:30 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								## Airflow Example for Sample Data
 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-14 10:18:52 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-08-14 20:13:00 +00:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								```python
							 
						 
					
						
							
								
									
										
										
										
											2021-10-15 03:51:13 +05:30 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								import pathlib
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								import json
							 
						 
					
						
							
								
									
										
										
										
											2021-08-14 10:18:52 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								from datetime import timedelta
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								from airflow import DAG
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								try:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    from airflow.operators.python import PythonOperator
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								except ModuleNotFoundError:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    from airflow.operators.python_operator import PythonOperator
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								from metadata.config.common import load_config_file
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								from metadata.ingestion.api.workflow import Workflow
							 
						 
					
						
							
								
									
										
										
										
											2021-11-13 23:03:20 +05:30 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								from airflow.utils.dates import days_ago
							 
						 
					
						
							
								
									
										
										
										
											2021-08-14 10:18:52 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								default_args = {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    "owner": "user_name",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    "email": ["username@org .com"],
							 
						 
					
						
							
								
									
										
										
										
											2021-11-13 23:03:20 +05:30 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								    "email_on_failure": False,
							 
						 
					
						
							
								
									
										
										
										
											2021-08-14 10:18:52 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								    "retries": 3,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    "retry_delay": timedelta(minutes=5),
							 
						 
					
						
							
								
									
										
										
										
											2021-11-13 23:03:20 +05:30 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								    "execution_timeout": timedelta(minutes=60)
							 
						 
					
						
							
								
									
										
										
										
											2021-08-14 10:18:52 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								}
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-10-15 03:51:13 +05:30 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								config = """
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								{
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								  "source": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    "type": "sample-data",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    "config": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								      "sample_data_folder": "./examples/sample_data"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								  },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								  "sink": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    "type": "metadata-rest",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    "config": {}
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								  },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								  "metadata_server": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    "type": "metadata-server",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    "config": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								      "api_endpoint": "http://localhost:8585/api",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								      "auth_provider_type": "no-auth"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								  }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								}
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								"""
							 
						 
					
						
							
								
									
										
										
										
											2021-08-14 10:18:52 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								def metadata_ingestion_workflow():
							 
						 
					
						
							
								
									
										
										
										
											2021-10-15 03:51:13 +05:30 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								    workflow_config = json.loads(config)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    workflow = Workflow.create(workflow_config)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    workflow.execute()
							 
						 
					
						
							
								
									
										
										
										
											2021-08-14 10:18:52 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								    workflow.raise_from_status()
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    workflow.print_status()
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    workflow.stop()
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								with DAG(
							 
						 
					
						
							
								
									
										
										
										
											2021-10-15 03:51:13 +05:30 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								    "sample_data",
							 
						 
					
						
							
								
									
										
										
										
											2021-08-14 10:18:52 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								    default_args=default_args,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    description="An example DAG which runs a OpenMetadata ingestion workflow",
							 
						 
					
						
							
								
									
										
										
										
											2021-10-15 03:51:13 +05:30 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								    start_date=days_ago(1),
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    is_paused_upon_creation=False,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    schedule_interval='*/5 * *  * * ', 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-14 10:18:52 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								    catchup=False,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								) as dag:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    ingest_task = PythonOperator(
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								        task_id="ingest_using_recipe",
							 
						 
					
						
							
								
									
										
										
										
											2021-11-13 23:03:20 +05:30 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								        python_callable=metadata_ingestion_workflow,
							 
						 
					
						
							
								
									
										
										
										
											2021-08-14 10:18:52 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								    )
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								we are using a python method like below
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-08-14 20:13:00 +00:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								```python
							 
						 
					
						
							
								
									
										
										
										
											2021-08-14 10:18:52 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								def metadata_ingestion_workflow():
							 
						 
					
						
							
								
									
										
										
										
											2021-10-15 03:51:13 +05:30 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								    workflow_config = json.loads(config)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    workflow = Workflow.create(workflow_config)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    workflow.execute()
							 
						 
					
						
							
								
									
										
										
										
											2021-08-14 10:18:52 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								    workflow.raise_from_status()
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								    workflow.print_status()
							 
						 
					
						
							
								
									
										
										
										
											2021-10-15 03:51:13 +05:30 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								    workflow.stop
							 
						 
					
						
							
								
									
										
										
										
											2021-08-14 10:18:52 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-11-13 23:03:20 +05:30 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
							
								Create a Workflow instance and pass a sample-data configuration which will read metadata from Json files and ingest it into the OpenMetadata Server. You can customize this configuration or add different connectors please refer to our [examples ](https://github.com/open-metadata/OpenMetadata/tree/main/ingestion/examples/workflows ) and refer to [Connectors ](../connectors/ ).