---
title: Prefect Integration
slug: /features/integrations/prefect
---

# Prefect

This page provides instructions on how to install OpenMetadata and Prefect on your local machine.
## Requirements (OS X and Linux)

Please ensure your host system meets the requirements listed below. Then continue to the procedure for installing OpenMetadata.

### OS X and Linux

#### **Python (version 3.8.0 or greater)**

To check which version of Python you have, run the following command.

```shell
python3 --version
```

#### **Docker (version 20.10.0 or greater)**

[Docker](https://docs.docker.com/get-started/overview/) is an open-source platform for developing, shipping, and running applications. It enables you to separate your applications from your infrastructure using OS-level virtualization, delivering software in packages called containers.

To check which version of Docker you have, run the following command.

```shell
docker --version
```

If you need to install Docker, please visit [Get Docker](https://docs.docker.com/get-docker/).

**Note**: You must **allocate at least 6GB of memory to Docker** in order to run OpenMetadata. To change the memory allocation in Docker Desktop, go to:

Preferences -> Resources -> Advanced

**`compose` command for Docker (version v2.1.1 or greater)**

The Docker `compose` package enables you to define and run multi-container Docker applications. The `compose` command integrates compose functions into the Docker platform, making them available from the Docker command-line interface (CLI). The Python packages you will install in the procedure below use `compose` to deploy OpenMetadata.

#### **MacOS X**: Docker on MacOS X ships with compose already available in the Docker CLI.

#### **Linux**: To install compose on Linux systems, please visit the [Docker CLI command documentation](https://docs.docker.com/compose/cli-command/#install-on-linux) and follow the instructions.

To verify that the `docker compose` command is installed and accessible on your system, run the following command.

```shell
docker compose version
```

Upon running this command you should see output similar to the following.

```shell
Docker Compose version v2.1.1
```

**Note**: In previous releases of Docker, compose functions were delivered with the separate `docker-compose` tool. OpenMetadata uses Compose V2. Please see the paragraphs above for instructions on installing Compose V2.

**Install Docker Compose V2 on Linux**

Follow the [instructions here](https://docs.docker.com/compose/cli-command/#install-on-linux) to install Docker Compose V2.

1. Run the following command to download the current stable release of Docker Compose.

```shell
DOCKER_CONFIG=${DOCKER_CONFIG:-$HOME/.docker}
mkdir -p $DOCKER_CONFIG/cli-plugins
curl -SL https://github.com/docker/compose/releases/download/v2.2.3/docker-compose-linux-x86_64 -o $DOCKER_CONFIG/cli-plugins/docker-compose
```

This command installs Compose V2 for the active user under the `$HOME` directory. To install Docker Compose for all users on your system, replace `~/.docker/cli-plugins` with `/usr/local/lib/docker/cli-plugins`.

2. Apply executable permissions to the binary.

```shell
chmod +x $DOCKER_CONFIG/cli-plugins/docker-compose
```

3. Test your installation.

```shell
docker compose version
Docker Compose version v2.2.3
```
### Windows

#### **WSL2, Ubuntu 20.04, and Docker for Windows**

1. Install [WSL2](https://ubuntu.com/wsl)
2. Install [Ubuntu 20.04](https://www.microsoft.com/en-us/p/ubuntu-2004-lts/9n6svws3rx71)
3. Install [Docker for Windows](https://www.docker.com/products/docker-desktop)

#### **In the Ubuntu Terminal**

```shell
cd ~
sudo apt update
sudo apt upgrade
sudo apt install python3-pip python3-venv
```

Follow the [instructions](/quick-start/local-deployment).
## Installation Process

This section walks you through configuring OpenMetadata and [Prefect 2.0](https://www.prefect.io/guide/blog/introducing-prefect-2-0/). It is intended as a minimal viable setup to get you started using both platforms together. Once you want to move to a production-ready deployment, check the last two sections of this tutorial.

### 1. Clone the `prefect-openmetadata` repository

First, clone the latest version of the [prefect-openmetadata](https://github.com/PrefectHQ/prefect-openmetadata) Prefect Collection.

Then, navigate to the `openmetadata-docker` directory, which contains a `docker-compose.yml` file with the minimal requirements to get started with OpenMetadata.

### 2. Start OpenMetadata containers

You can start the containers with the OpenMetadata components using:

```shell
docker compose up -d
```

This will create a Docker **network** and **containers** for the following services:

- `openmetadata_mysql` - Metadata store that serves as a persistence layer holding your metadata.
- `openmetadata_elasticsearch` - Indexing service used to search the metadata catalog.
- `openmetadata_server` - The OpenMetadata UI and API server, allowing you to discover insights and interact with your metadata.

Wait a couple of minutes until the setup is finished.

To check the status of all services, run `docker compose ps` to inspect the Docker containers:

```shell
NAME                         COMMAND                  SERVICE               STATUS              PORTS
openmetadata_elasticsearch   "/tini -- /usr/local…"   elasticsearch         running             0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp
openmetadata_mysql           "/entrypoint.sh mysq…"   mysql                 running (healthy)   33060-33061/tcp
openmetadata_server          "./openmetadata-star…"   openmetadata-server   running             0.0.0.0:8585->8585/tcp
```

### 3. Confirm you can access the OpenMetadata UI

Visit the following URL to confirm that you can access the UI and start exploring OpenMetadata:

```shell
http://localhost:8585
```

You should see a page similar to the following as the landing page for the OpenMetadata UI.

{% image
   src="/images/v1.3/features/integrations/prefect-omd-ui-landing.png"
   alt="Landing page of OpenMetadata UI"
 /%}
### 4. Install `prefect-openmetadata`

Before running the commands below to install Python libraries, we recommend creating a **virtual environment** with a Python virtual environment manager such as pipenv, conda, or virtualenv.

You can install the Prefect OpenMetadata package using a single command:

```shell
pip install prefect-openmetadata
```

This already includes Prefect 2.0: both the client library and an embedded API server and UI, which can optionally be started using:

```shell
prefect orion start
```

If you navigate to the following URL, you'll be able to access a locally running Prefect Orion UI:

```shell
http://localhost:4200
```

Apart from Prefect, `prefect-openmetadata` comes prepackaged with the `openmetadata-ingestion[docker]` library for metadata ingestion. This library contains everything you need to turn your JSON ingestion specifications into workflows that will:

- scan your source systems,
- figure out which metadata needs to be ingested,
- load the requested metadata into your OpenMetadata backend.
### 5. Prepare your metadata ingestion spec

If you followed the first step of this tutorial, you cloned the `prefect-openmetadata` repository. This repository contains a directory **example-data**, which you can use to ingest sample data into your OpenMetadata backend using Prefect.

[This documentation page](https://prefecthq.github.io/prefect-openmetadata/run_ingestion_flow/) contains an example configuration you can use in your flow to ingest that sample data.
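
For orientation, the sample-data spec you will pass to the flow in the next step looks roughly like the sketch below. The exact field names vary between `openmetadata-ingestion` releases, so treat this as an illustration and prefer the configuration from the page linked above; the `sampleDataFolder` path and server settings are assumptions based on the local setup from the previous steps.

```python
# Illustrative ingestion spec, kept as a JSON string so it can be handed
# straight to the ingestion flow in the next step. Field names differ between
# openmetadata-ingestion releases - the documentation page linked above is
# the source of truth.
config = """
{
  "source": {
    "type": "sample-data",
    "serviceName": "sample_data",
    "serviceConnection": {
      "config": {
        "type": "SampleData",
        "sampleDataFolder": "./example-data"
      }
    },
    "sourceConfig": {"config": {}}
  },
  "sink": {"type": "metadata-rest", "config": {}},
  "workflowConfig": {
    "openMetadataServerConfig": {
      "hostPort": "http://localhost:8585/api",
      "authProvider": "no-auth"
    }
  }
}
"""
```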
### 6. Run ingestion workflow locally

Now you can paste the config from above as a string into your flow definition and run it. [This documentation page](https://prefecthq.github.io/prefect-openmetadata/run_ingestion_flow/) explains in detail how that works. In short, we only have to:

1. Import the flow function,
2. Pass the config as a string.

You can run the workflow as any Python function - no DAGs and no boilerplate.
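
As a minimal sketch - assuming the collection exposes its ingestion flow as `ingestion_workflow` (check the package documentation for the exact import path) - a script saved as `ingestion_flow.py` could look like this, and you run it with plain `python ingestion_flow.py`:

```python
# ingestion_flow.py - runs the OpenMetadata ingestion as a plain Python script.
# The import below is an assumption about prefect-openmetadata's public API;
# verify the flow function name against the package documentation.
from prefect_openmetadata.flows import ingestion_workflow

# Paste the ingestion spec from step 5 here as a JSON string.
config = """
{ "source": {...}, "sink": {...}, "workflowConfig": {...} }
"""

if __name__ == "__main__":
    ingestion_workflow(config)
```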
After running your flow, you should see **new users**, **datasets**, **dashboards**, and other **metadata** in your OpenMetadata UI. Your Prefect UI will also display the workflow run, along with logs detailing which source system has been scanned and which data has been ingested.

If you haven't started the Prefect Orion UI yet, you can do that from your CLI:

```shell
prefect orion start
```

If you navigate to [http://localhost:4200](http://localhost:4200), you'll be able to:

- access the locally running Prefect Orion UI,
- see all previously triggered ingestion workflow runs.

**Congratulations** on building your first metadata ingestion workflow with OpenMetadata and Prefect! In the next section, we'll look at how you can run this flow on a schedule.
### 7. Schedule and deploy your metadata ingestion flows with Prefect

Ingesting your data via manually executed scripts is great for initial exploration, but in order to build a reliable metadata platform, you need to run these workflows on a regular cadence. That's where you can leverage Prefect [schedules](https://orion-docs.prefect.io/concepts/schedules/) and [deployments](https://orion-docs.prefect.io/concepts/deployments/).

[This documentation page](https://prefecthq.github.io/prefect-openmetadata/schedule_ingestion_flow/) demonstrates how you can configure a `DeploymentSpec` to deploy your flow and ensure that your metadata gets refreshed on schedule.
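
As a sketch of what such a deployment could look like - assuming the flow script from step 6 is saved as `ingestion_flow.py`, and using the beta-era Prefect 2.0 `DeploymentSpec` API that the links above describe (the path, deployment name, and interval are illustrative; adapt them to your Prefect version):

```python
# deployment.py - schedules the ingestion flow to run every hour.
# Based on the Prefect 2.0 beta DeploymentSpec API referenced above;
# the flow location, deployment name, and interval are illustrative.
from datetime import timedelta

from prefect.deployments import DeploymentSpec
from prefect.flow_runners import SubprocessFlowRunner
from prefect.orion.schemas.schedules import IntervalSchedule

DeploymentSpec(
    flow_location="./ingestion_flow.py",  # where the flow code from step 6 lives
    name="openmetadata-ingestion",
    schedule=IntervalSchedule(interval=timedelta(hours=1)),
    flow_runner=SubprocessFlowRunner(),   # run each scheduled flow in a local subprocess
)
```

Registering the spec (for example with `prefect deployment create deployment.py`) creates a deployment whose scheduled runs are picked up by the work queue and agent configured in the next section.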
### 8. Deploy the execution layer to run your flows

So far, we've looked at how you can **create** and **schedule** your workflow; but where does this code actually run? This is where the concepts of [storage](https://orion-docs.prefect.io/concepts/storage/), [work queues, and agents](https://orion-docs.prefect.io/concepts/work-queues/) become important. But don't worry - all you need to know to get started is one CLI command for each of these concepts.

**1) Storage**

Storage tells Prefect where your workflow code lives. To configure storage, run:

```shell
prefect storage create
```

The CLI will guide you through selecting the storage of your choice. To get started, you can select Local Storage, choose a path on your file system, and then set it as your default storage.

**2) Work Queue**

Work queues collect scheduled runs, and agents pick them up from the queue. To create a default work queue, run:

```shell
prefect work-queue create default
```

**3) Agent**

Agents are lightweight processes that poll their work queues for scheduled runs and execute workflows on the infrastructure you specified in the `DeploymentSpec`'s `flow_runner`. To start an agent corresponding to the default work queue, run:

```shell
prefect agent start default
```

That's all you need! Once you have executed those three commands, your scheduled deployments (_such as the one we defined using `ingestion_flow.py` above_) will run on schedule, and Prefect will ensure that your metadata stays up to date.

You can observe the state of your metadata ingestion workflows from the [Prefect Orion UI](https://orion-docs.prefect.io/ui/overview/). The UI also includes detailed logs showing which metadata was updated, helping to keep your data platform healthy and observable.
### 9. Using Prefect 2.0 in the Cloud

If you want to move beyond this local installation, you can deploy Prefect 2.0 to run your OpenMetadata ingestion workflows by:

- self-hosting the orchestration layer - see the [list of resources on Prefect Discourse](https://discourse.prefect.io/t/how-to-self-host-prefect-2-0-orchestration-layer-list-of-resources-to-get-started/952), or
- signing up for [Prefect Cloud 2.0](https://discourse.prefect.io/t/how-to-get-started-with-prefect-cloud-2-0/539) - the linked page will walk you through the process.

For the various deployment options for OpenMetadata, check the [Deployment](/deployment) section.
### 10. Questions about using OpenMetadata with Prefect

If you have any questions about configuring Prefect, post your question on [Prefect Discourse](https://discourse.prefect.io/) or in the [Prefect Community Slack](https://www.prefect.io/slack/). And if you need support for OpenMetadata, get in touch on [OpenMetadata Slack](https://slack.open-metadata.org/).

#### Troubleshooting

**Could not find a version that satisfies the requirement**

```shell
ERROR: Could not find a version that satisfies the requirement openmetadata-ingestion[docker] (from versions: none)
ERROR: No matching distribution found for openmetadata-ingestion[docker]
```

If you see the above error when attempting to install `prefect-openmetadata`, it may be due to an older version of Python or pip. Please check the [Requirements](/quick-start/local-deployment) section above and confirm that you have supported versions installed.
## Next Steps

1. Visit the overview page and explore the OpenMetadata UI.
2. Visit the documentation to see what services you can integrate with OpenMetadata.
3. Visit the documentation and explore the OpenMetadata APIs.