mirror of
				https://github.com/open-metadata/OpenMetadata.git
				synced 2025-10-26 00:04:52 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			176 lines
		
	
	
		
			5.5 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
			176 lines
		
	
	
		
			5.5 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
| ---
 | |
| title: Run the ingestion from GitHub Actions
 | |
| slug: /deployment/ingestion/github-actions
 | |
| ---
 | |
| 
 | |
| # Run the ingestion from GitHub Actions
 | |
| 
 | |
| {% note %}
 | |
| 
 | |
| You can find a fully working demo of this setup [here](https://github.com/open-metadata/openmetadata-demo/tree/main/ingestion-github-actions).
 | |
| 
 | |
| {% /note %}
 | |
| 
 | |
| The process to run the ingestion from GitHub Actions is the same as running it from anywhere else.
 | |
| 1. Get the YAML configuration,
 | |
| 2. Prepare the Python Script
 | |
| 3. Schedule the Ingestion
 | |
| 
 | |
| ## 1. YAML Configuration
 | |
| 
 | |
| For any connector and workflow, you can pick it up from its doc [page](/connectors).
 | |
| 
 | |
| ## 2. Prepare the Python Script
 | |
| 
 | |
| In the GitHub Action we will just be triggering a custom Python script. This script will:
 | |
| 
 | |
| - Load the secrets from environment variables (we don't want any security risks!),
 | |
| - Prepare the Workflow class from the Ingestion Framework that contains all the logic on how to run the metadata ingestion,
 | |
| - Execute the workflow and log the results.
 | |
| 
 | |
| - A simplified version of such script looks like follows:
 | |
| 
 | |
| ```python
 | |
| import os
 | |
| import yaml
 | |
| 
 | |
| from metadata.ingestion.api.workflow import Workflow
 | |
| 
 | |
| CONFIG = f"""
 | |
| source:
 | |
|   type: snowflake
 | |
|   serviceName: snowflake_from_github_actions
 | |
|   serviceConnection:
 | |
|     config:
 | |
|       type: Snowflake
 | |
|       username: {os.getenv('SNOWFLAKE_USERNAME')}
 | |
| ...
 | |
| """
 | |
| 
 | |
| 
 | |
| def run():
 | |
|     workflow_config = yaml.safe_load(CONFIG)
 | |
|     workflow = Workflow.create(workflow_config)
 | |
|     workflow.execute()
 | |
|     workflow.raise_from_status()
 | |
|     workflow.print_status()
 | |
|     workflow.stop()
 | |
| 
 | |
| 
 | |
| if __name__ == "__main__":
 | |
|     run()
 | |
| ```
 | |
| 
 | |
| Note how we are securing the credentials using environment variables. You will need to create these env vars in your
 | |
| GitHub repository. Follow the GitHub [docs](https://docs.github.com/en/actions/security-guides/encrypted-secrets) for
 | |
| more information on how to create and use Secrets.
 | |
| 
 | |
| In the end, we'll map these secrets to environment variables in the process, that we can pick up with `os.getenv`.
 | |
| 
 | |
| ## 3. Schedule the Ingestion
 | |
| 
 | |
| Now that we have all the ingredients, we just need to build a simple GitHub Actions with the following steps:
 | |
| 
 | |
| - Install Python
 | |
| - Prepare virtual environment with the openmetadata-ingestion package
 | |
| - Run the script!
 | |
| 
 | |
| - It is as simple as this. Internally the function run we created will be sending the results to the OpenMetadata server, so there's nothing else we need to do here.
 | |
| 
 | |
| A first version of the action could be:
 | |
| 
 | |
| ```yaml
 | |
| name: ingest-snowflake
 | |
| on:
 | |
|   # Any expression you'd like here
 | |
|   schedule:
 | |
|     - cron:  '0 */2 * * *'
 | |
|   # If you also want to execute it manually
 | |
|   workflow_dispatch:
 | |
| 
 | |
| permissions:
 | |
|   id-token: write
 | |
|   contents: read
 | |
| 
 | |
| jobs:
 | |
|   ingest:
 | |
|     runs-on: ubuntu-latest
 | |
| 
 | |
|     steps:
 | |
|     # Pick up the repository code, where the script lives
 | |
|     - name: Checkout
 | |
|       uses: actions/checkout@v3
 | |
| 
 | |
|     # Prepare Python in the GitHub Agent
 | |
|     - name: Set up Python 3.9
 | |
|       uses: actions/setup-python@v4
 | |
|       with:
 | |
|         python-version: 3.9
 | |
| 
 | |
|     # Install the dependencies. Make sure that the client version matches the server!
 | |
|     - name: Install Deps
 | |
|       run: |
 | |
|         python -m venv env
 | |
|         source env/bin/activate
 | |
|         pip install "openmetadata-ingestion[snowflake]==1.0.2.0"
 | |
| 
 | |
|     - name: Run Ingestion
 | |
|       run: |
 | |
|         source env/bin/activate
 | |
|         python ingestion-github-actions/snowflake_ingestion.py
 | |
|       # Add the env vars we need to load the snowflake credentials
 | |
|       env:
 | |
|          SNOWFLAKE_USERNAME: ${{ secrets.SNOWFLAKE_USERNAME }}
 | |
|          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
 | |
|          SNOWFLAKE_WAREHOUSE: ${{ secrets.SNOWFLAKE_WAREHOUSE }}
 | |
|          SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
 | |
|          SBX_JWT: ${{ secrets.SBX_JWT }}
 | |
| ```
 | |
| 
 | |
| ## [Optional] - Getting Alerts in Slack
 | |
| 
 | |
| A very interesting option that GitHub Actions provide is the ability to get alerts in Slack after our action fails.
 | |
| 
 | |
| This can become specially useful if we want to be notified when our metadata ingestion is not working as expected. 
 | |
| We can use the same setup as above with a couple of slight changes:
 | |
| 
 | |
| ```yaml
 | |
|     - name: Run Ingestion
 | |
|       id: ingestion
 | |
|       continue-on-error: true
 | |
|       run: |
 | |
|         source env/bin/activate
 | |
|         python ingestion-github-actions/snowflake_ingestion.py
 | |
|       # Add the env vars we need to load the snowflake credentials
 | |
|       env:
 | |
|          SNOWFLAKE_USERNAME: ${{ secrets.SNOWFLAKE_USERNAME }}
 | |
|          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
 | |
|          SNOWFLAKE_WAREHOUSE: ${{ secrets.SNOWFLAKE_WAREHOUSE }}
 | |
|          SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
 | |
|          SBX_JWT: ${{ secrets.SBX_JWT }}
 | |
| 
 | |
|     - name: Slack on Failure
 | |
|       if: steps.ingestion.outcome != 'success'
 | |
|       uses: slackapi/slack-github-action@v1.23.0
 | |
|       with:
 | |
|         payload: |
 | |
|           {
 | |
|             "text": "🔥 Metadata ingestion failed! 🔥"
 | |
|           }
 | |
|       env:
 | |
|         SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
 | |
|         SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
 | |
| 
 | |
|     - name: Force failure
 | |
|       if: steps.ingestion.outcome != 'success'
 | |
|       run: |
 | |
|         exit 1
 | |
| ```
 | |
| 
 | |
| We have:
 | |
| 
 | |
| - Marked the `Run Ingestion` step with a specific `id` and with `continue-on-error: true`. If anything happens, we don't want the action to stop.
 | |
| - We added a step with `slackapi/slack-github-action@v1.23.0`. By passing a Slack Webhook link via a secret, we can send any payload to a 
 | |
| - specific Slack channel. You can find more info on how to set up a Slack Webhook [here](https://api.slack.com/messaging/webhooks).
 | |
| - If our `ingestion` step fails, we still want to mark the action as failed, so we are forcing the failure we skipped before.
 | 
