---
title: Ingestion Framework Deployment
slug: /deployment/ingestion
---

# Ingestion Framework Deployment

The Ingestion Framework is the module that brings metadata into OpenMetadata. It is used
for every type of workflow supported in the platform: Metadata, Lineage, Usage, Profiler, Data Quality, and more.

## Manage & Schedule the Ingestion Framework

This guide presents the different alternatives for running and managing your ingestion workflows. There are two
main ways of running the ingestion:
1. Internally, by managing the workflows from OpenMetadata.
2. Externally, by using any other tool capable of running Python code.

### Option 1 - From OpenMetadata

To configure your setup so that the workflows run from OpenMetadata, follow this guide:

{% inlineCalloutContainer %}
  {% inlineCallout
    color="violet-70"
    icon="10k"
    bold="OpenMetadata UI"
    href="/deployment/ingestion/openmetadata" %}
    Deploy, configure and manage the ingestion workflows directly from the OpenMetadata UI.
  {% /inlineCallout %}
{% /inlineCalloutContainer %}

### Option 2 - Externally

If you would rather manage them from any other system, you will need a bit more background:
1. How does the Ingestion Framework work?
2. Ingestion Configuration

### 1. How does the Ingestion Framework work?

The Ingestion Framework contains all the logic for connecting to the sources, extracting their metadata
and sending it to the OpenMetadata server. We built it from scratch with the main goal of making it an independent
component that can be run from, literally, anywhere.

To install it, get the `openmetadata-ingestion` package from [PyPI](https://pypi.org/project/openmetadata-ingestion/).

We will show further examples later, but a piece of code is the best showcase for its simplicity. To run
a full ingestion process, you just need to execute a single function:

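```python
import yaml
# Import path of `Workflow` may differ between openmetadata-ingestion releases
from metadata.ingestion.api.workflow import Workflow


def run():
    # CONFIG is a string holding the workflow YAML (see the MySQL example below)
    workflow_config = yaml.safe_load(CONFIG)
    workflow = Workflow.create(workflow_config)
    workflow.execute()
    workflow.raise_from_status()  # raise if the run reported failures
    workflow.print_status()
    workflow.stop()
```
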
Where this function runs is completely up to you: adapt it to whatever makes the most sense within your
organization and engineering context.

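Because `Workflow.create` receives a plain dictionary (the output of `yaml.safe_load`), one way to keep the same code portable across schedulers is to assemble that dictionary at runtime and read environment-specific values from the environment. The sketch below is a minimal illustration under stated assumptions: the helper name `build_workflow_config` and the environment variables `OM_HOST_PORT` and `OM_JWT_TOKEN` are hypothetical, and the key layout simply mirrors the `sink` and `workflowConfig` sections of the MySQL example in this guide.

```python
import os
from typing import Any, Dict


def build_workflow_config(source: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: wrap a `source` section with sink and server settings.

    OM_HOST_PORT and OM_JWT_TOKEN are assumed variable names, not OpenMetadata defaults.
    """
    return {
        "source": source,
        "sink": {"type": "metadata-rest", "config": {}},
        "workflowConfig": {
            "openMetadataServerConfig": {
                # Fall back to a local server when the variable is unset
                "hostPort": os.environ.get("OM_HOST_PORT", "http://localhost:8585/api"),
                "authProvider": "openmetadata",
                # Indexing (not .get) fails fast with a KeyError if the token is missing
                "securityConfig": {"jwtToken": os.environ["OM_JWT_TOKEN"]},
            }
        },
    }
```

The resulting dictionary could then be passed to `Workflow.create` in place of `yaml.safe_load(CONFIG)`, so secrets never need to be hardcoded in the YAML that travels with your scheduler code.
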
### 2. Ingestion Configuration

In the example above, the `Workflow` class is created from a YAML configuration. Every workflow that you execute (ingestion,
profiler, lineage, and so on) has its own YAML representation.

Think of this configuration as the recipe you want to execute: where your source is, which pieces you
extract, how they are processed, and where they are sent.

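To make the recipe analogy concrete, the toy model below wires a source, a processing step, and a sink together with Python generators. It is a stdlib-only illustration of the flow just described, not the framework's implementation: `toy_source`, `toy_processor`, `ToySink`, and `run_toy_workflow` are hypothetical names that do not correspond to real Ingestion Framework classes.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, str]


def toy_source(tables: List[str]) -> Iterator[Record]:
    """Where the source is: emit one record per table name."""
    for name in tables:
        yield {"entity": "table", "name": name}


def toy_processor(records: Iterable[Record], prefix: str) -> Iterator[Record]:
    """Which pieces are kept and how they are processed: filter by prefix, lowercase names."""
    for record in records:
        if record["name"].startswith(prefix):
            yield {**record, "name": record["name"].lower()}


class ToySink:
    """Where the results are sent: here, just an in-memory list."""

    def __init__(self) -> None:
        self.received: List[Record] = []

    def write(self, record: Record) -> None:
        self.received.append(record)


def run_toy_workflow(tables: List[str], prefix: str) -> ToySink:
    sink = ToySink()
    # Records stream one at a time from source to sink
    for record in toy_processor(toy_source(tables), prefix):
        sink.write(record)
    return sink
```

For instance, `run_toy_workflow(["DIM_users", "fact_orders", "DIM_dates"], "DIM").received` holds two records, named `dim_users` and `dim_dates`. Chaining generators keeps each stage independent and avoids loading everything into memory at once, which is the property the recipe analogy is meant to convey.
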
An example YAML config for extracting MySQL metadata looks like this:

```yaml
source:
  type: mysql
  serviceName: mysql
  serviceConnection:
    config:
      type: Mysql
      username: openmetadata_user
      authType:
        password: openmetadata_password
      hostPort: localhost:3306
      databaseSchema: openmetadata_db
  sourceConfig:
    config:
      type: DatabaseMetadata
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: openmetadata
    securityConfig:
      jwtToken: ...
```

To get the YAML shape of any connector, check its [documentation page](/connectors).

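Before handing a configuration to the framework, it can be handy to catch structural mistakes early, for example in a CI job. The sketch below is an assumption-laden convenience check, not a substitute for the parsing the Ingestion Framework performs itself: it only verifies that the three top-level sections used in the MySQL example (`source`, `sink`, `workflowConfig`) are present and that `source` declares a `type`. The function name `find_config_problems` is hypothetical, and other workflow types may require additional sections.

```python
from typing import Any, Dict, List

# Top-level sections taken from the MySQL example in this guide (assumption:
# other workflow types may need additional sections).
REQUIRED_SECTIONS = ("source", "sink", "workflowConfig")


def find_config_problems(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the basic shape looks fine."""
    problems = [
        f"missing top-level section '{key}'"
        for key in REQUIRED_SECTIONS
        if key not in config
    ]
    source = config.get("source")
    if isinstance(source, dict) and "type" not in source:
        problems.append("source section has no 'type'")
    return problems
```

Running it on the dictionary returned by `yaml.safe_load` for each workflow file would give a lightweight pre-deployment check; the full validation still happens when the framework parses the configuration.
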
### Examples

{% note %}

This is not an exhaustive list, and it will keep growing over time. If an orchestrator is missing, it is not because it is unsupported,
but simply because we have not had the time to add it here yet. If you'd like to chip in and help us expand these guides and examples,
don't hesitate to reach out to us on [Slack](https://slack.open-metadata.org/) or directly open a PR in
[GitHub](https://github.com/open-metadata/OpenMetadata/tree/main/openmetadata-docs/content).

{% /note %}

{% inlineCalloutContainer %}
  {% inlineCallout
    color="violet-70"
    icon="10k"
    bold="Airflow"
    href="/deployment/ingestion/airflow" %}
    Run the ingestion process externally from Airflow
  {% /inlineCallout %}
  {% inlineCallout
    color="violet-70"
    icon="10k"
    bold="MWAA"
    href="/deployment/ingestion/mwaa" %}
    Run the ingestion process externally using AWS MWAA
  {% /inlineCallout %}
  {% inlineCallout
    color="violet-70"
    icon="10k"
    bold="GCP Composer"
    href="/deployment/ingestion/gcs-composer" %}
    Run the ingestion process externally from GCP Composer
  {% /inlineCallout %}
  {% inlineCallout
    color="violet-70"
    icon="10k"
    bold="GitHub Actions"
    href="/deployment/ingestion/github-actions" %}
    Run the ingestion process externally from GitHub Actions
  {% /inlineCallout %}
{% /inlineCalloutContainer %}