---
title: Prefect Integration
slug: /features/integrations/prefect
---

# Prefect

This page provides instructions on how to install OpenMetadata and Prefect on your local machine.

## Requirements

Please ensure your host system meets the requirements listed below. Then continue to the procedure for installing OpenMetadata.

### OS X and Linux

#### **Python (version 3.8.0 or greater)**

To check which version of Python you have, run the following command:

```shell
python3 --version
```

#### **Docker (version 20.10.0 or greater)**

[Docker](https://docs.docker.com/get-started/overview/) is an open source platform for developing, shipping, and running applications. It enables you to separate your applications from your infrastructure, so you can deliver software quickly using OS-level virtualization. It delivers software in packages called containers.

To check which version of Docker you have, run the following command:

```shell
docker --version
```

If you need to install Docker, please visit [Get Docker](https://docs.docker.com/get-docker/).

**Note**: You must **allocate at least 6GB of memory to Docker** in order to run OpenMetadata. To change the memory allocation for Docker, go to **Preferences -> Resources -> Advanced** in Docker Desktop.

#### **`compose` command for Docker (version v2.1.1 or greater)**

The Docker `compose` package enables you to define and run multi-container Docker applications. The `compose` command integrates compose functions into the Docker platform, making them available from the Docker command-line interface (CLI). The procedure below uses `compose` to deploy OpenMetadata.

**MacOS X**: Docker on MacOS X ships with compose already available in the Docker CLI.

**Linux**: To install compose on Linux systems, please visit the [Docker CLI command documentation](https://docs.docker.com/compose/cli-command/#install-on-linux) and follow the instructions.

To verify that the `docker compose` command is installed and accessible on your system, run the following command:

```shell
docker compose version
```

Upon running this command, you should see output similar to the following:

```shell
Docker Compose version v2.1.1
```

**Note**: In previous releases of Docker, compose functions were delivered with the standalone `docker-compose` tool. OpenMetadata uses Compose V2. Please see the paragraphs above for instructions on installing Compose V2.
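The three version requirements above (Python, Docker, and Compose) can also be checked in one go. The following is a minimal Bash sketch, not an official OpenMetadata tool; it assumes a `sort` that supports `-V` (as in GNU coreutils), and the helper names `version_ge`, `first_version`, and `check` are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: check the minimum versions listed in the requirements above.
# Assumes `sort -V` is available (GNU coreutils). Helper names are illustrative.

# version_ge A B -> succeeds if version A >= version B
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | sed -n 1p)" = "$2" ]
}

# Print the first dotted version number (e.g. 20.10.17) found in a string
first_version() {
  printf '%s\n' "$1" | grep -Eo '[0-9]+(\.[0-9]+)+' | sed -n 1p || true
}

# check NAME MINIMUM "RAW VERSION OUTPUT"
check() {
  local found
  found=$(first_version "$3")
  if [ -n "$found" ] && version_ge "$found" "$2"; then
    echo "OK   $1 $found (>= $2)"
  else
    echo "FAIL $1 ${found:-not found} (need >= $2)"
    return 1
  fi
}

status=0
check "Python" 3.8.0 "$(python3 --version 2>&1)" || status=1
check "Docker" 20.10.0 "$(docker --version 2>/dev/null)" || status=1
check "Docker Compose" 2.1.1 "$(docker compose version 2>/dev/null)" || status=1
if [ "$status" -eq 0 ]; then
  echo "All requirements met."
else
  echo "Some requirements are missing or too old."
fi
```

Save it under any name (for example, `check-requirements.sh`) and run it with `bash`; each line reports `OK` or `FAIL` together with the version that was detected.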

#### **Install Docker Compose V2 on Linux**

Follow the [instructions here](https://docs.docker.com/compose/cli-command/#install-on-linux) to install Docker Compose V2. The steps below install release v2.2.3 (the binary shown is built for x86_64 Linux).

1. Run the following commands to download release v2.2.3 of Docker Compose:

```shell
DOCKER_CONFIG=${DOCKER_CONFIG:-$HOME/.docker}
mkdir -p "$DOCKER_CONFIG/cli-plugins"
curl -SL https://github.com/docker/compose/releases/download/v2.2.3/docker-compose-linux-x86_64 -o "$DOCKER_CONFIG/cli-plugins/docker-compose"
```

These commands install Compose V2 for the active user under the `$HOME` directory. To install Docker Compose for all users on your system, replace `~/.docker/cli-plugins` with `/usr/local/lib/docker/cli-plugins`.

2. Apply executable permissions to the binary:

```shell
chmod +x "$DOCKER_CONFIG/cli-plugins/docker-compose"
```

3. Test your installation:

```shell
docker compose version
# Expected output:
# Docker Compose version v2.2.3
```

### Windows

#### **WSL2, Ubuntu 20.04, and Docker for Windows**

1. Install [WSL2](https://ubuntu.com/wsl)
2. Install [Ubuntu 20.04](https://www.microsoft.com/en-us/p/ubuntu-2004-lts/9n6svws3rx71)
3. Install [Docker for Windows](https://www.docker.com/products/docker-desktop)

#### **In the Ubuntu Terminal**

```shell
cd ~
sudo apt update
sudo apt upgrade
sudo apt install python3-pip python3-venv
```

Then follow the [local deployment instructions](/quick-start/local-deployment).

## Installation Process

This page walks you through configuring OpenMetadata and [Prefect 2.0](https://www.prefect.io/guide/blog/introducing-prefect-2-0/). It is intended as a minimal viable setup to get you started using both platforms together. When you are ready to move to a production-ready deployment, see steps 7 to 9 of this tutorial.

### 1. Clone the `prefect-openmetadata` repository

First, clone the latest version of the [prefect-openmetadata](https://github.com/PrefectHQ/prefect-openmetadata) Prefect Collection.

Then, navigate to the `openmetadata-docker` directory, which contains a `docker-compose.yml` file with the minimal requirements to get started with OpenMetadata.

### 2. Start OpenMetadata containers

You can start the containers with OpenMetadata components using:

```shell
docker compose up -d
```

This will create a Docker **network** and **containers** with the following services:

- `openmetadata_mysql` - Metadata store that serves as a persistence layer holding your metadata.
- `openmetadata_elasticsearch` - Indexing service to search the metadata catalog.
- `openmetadata_server` - The OpenMetadata UI and API server allowing you to discover insights and interact with your metadata.

Wait a couple of minutes until the setup is finished.
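Instead of waiting a fixed amount of time, you can poll until the OpenMetadata UI responds. The Bash helper below is a minimal sketch; the name `wait_until` is hypothetical and not part of OpenMetadata or Docker. It retries any command until that command succeeds or a timeout expires. The timing is approximate because the command's own run time is not counted.

```shell
#!/usr/bin/env bash
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL seconds until it succeeds; gives up
# (returning 1) once roughly TIMEOUT seconds of sleeping have elapsed.
wait_until() {
  local timeout=$1 interval=$2 waited=0
  shift 2
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "Ready: $*"
}
```

For example, `wait_until 300 10 curl -fs -o /dev/null http://localhost:8585` waits up to about five minutes for the OpenMetadata server on port 8585 to answer. `curl -f` treats HTTP error responses as failures, so polling continues until the UI is actually served.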

To check the status of all services, run `docker compose ps` from the same directory. The output should look similar to the following:

```shell
NAME                         COMMAND                  SERVICE               STATUS              PORTS
openmetadata_elasticsearch   "/tini -- /usr/local…"   elasticsearch         running             0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp
openmetadata_mysql           "/entrypoint.sh mysq…"   mysql                 running (healthy)   33060-33061/tcp
openmetadata_server          "./openmetadata-star…"   openmetadata-server   running             0.0.0.0:8585->8585/tcp
```
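To perform the same check from a script, you can filter the `docker compose ps` output. This is a minimal sketch that assumes a header row followed by one row per container, as in the output above. The helper name `all_services_running` is hypothetical. Newer Compose releases may print `Up 2 minutes`-style statuses instead of `running`, so the sketch accepts either form.

```shell
#!/usr/bin/env bash
# Reads `docker compose ps` output on stdin. Succeeds only if there is at
# least one container row and every row (after the header) looks "running"
# (or "Up ...") and is not marked unhealthy. Prints the names of any others.
all_services_running() {
  awk '
    NR > 1 && NF > 0 {
      rows++
      if ($0 !~ / running| Up / || $0 ~ /unhealthy/) {
        bad++
        print "not ready: " $1
      }
    }
    END { exit !(rows > 0 && bad == 0) }
  '
}
```

A typical invocation is `docker compose ps | all_services_running && echo "All OpenMetadata services are running"`. Matching on text is fragile by design; it is meant as a quick convenience check, not a replacement for Docker health checks.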

### 3. Confirm you can access the OpenMetadata UI

Visit the following URL to confirm that you can access the UI and start exploring OpenMetadata:

```shell
http://localhost:8585
```

You should see a page similar to the following as the landing page for the OpenMetadata UI.

{% image
src="/images/v1.0.0/features/integrations/prefect-omd-ui-landing.png"
alt="Landing page of OpenMetadata UI"
/%}

### 4. Install `prefect-openmetadata`

Before running the commands below to install Python libraries, we recommend creating a **virtual environment** with a Python virtual environment manager such as pipenv, conda or virtualenv.
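Python's built-in `venv` module is one standard-library option for creating such an environment. It is not one of the managers named above, but it serves the same purpose. Below is a minimal sketch that assumes a Bash or Zsh shell. `.venv` is only an example directory name, and on Debian or Ubuntu `venv` requires the `python3-venv` package.

```shell
# Create a virtual environment in ./.venv using the standard-library venv module
python3 -m venv .venv

# Activate it for the current shell session (run `deactivate` to leave it)
. .venv/bin/activate

# Confirm that `python` now resolves to the interpreter inside .venv
python -c 'import sys; print(sys.prefix)'
```

While the environment is active, the `pip install` command below installs packages into `.venv` instead of your system Python.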

You can install the Prefect OpenMetadata package with a single command:

```shell
pip install prefect-openmetadata
```

This also installs Prefect 2.0, including both the client library and an embedded API server and UI, which you can optionally start with:

```shell
prefect orion start
```

Then navigate to the following URL to access the locally running Prefect Orion UI:

```shell
http://localhost:4200
```

Apart from Prefect, `prefect-openmetadata` comes prepackaged with the `openmetadata-ingestion[docker]` library for metadata ingestion. This library contains everything you need to turn your JSON ingestion specifications into workflows that:

- scan your source systems,
- figure out which metadata needs to be ingested,
- load the requested metadata into your OpenMetadata backend.

### 5. Prepare your metadata ingestion spec

If you followed the first step of this tutorial, you have already cloned the `prefect-openmetadata` repository. This repository contains an **example-data** directory, which you can use to ingest sample data into your OpenMetadata backend using Prefect.

[This documentation page](https://prefecthq.github.io/prefect-openmetadata/run_ingestion_flow/) contains an example configuration you can use in your flow to ingest that sample data.

### 6. Run ingestion workflow locally

Now you can paste the config from the previous step as a string into your flow definition and run it. [This documentation page](https://prefecthq.github.io/prefect-openmetadata/run_ingestion_flow/) explains in detail how that works.

In short, you only have to:

1. import the flow function,
2. pass the config as a string.

You can run the workflow like any other Python function, with no DAGs and no boilerplate.

After running your flow, you should see **new users**, **datasets**, **dashboards**, and other **metadata** in your OpenMetadata UI. Also, **your Prefect UI** will display the workflow run and show logs detailing which source system has been scanned and which data has been ingested.

If you haven't started the Prefect Orion UI yet, you can do that from your CLI:

```shell
prefect orion start
```

If you navigate to [http://localhost:4200](http://localhost:4200), you’ll be able to:

- access a locally running Prefect Orion UI,
- see all previously triggered ingestion workflow runs.

**Congratulations** on building your first metadata ingestion workflow with OpenMetadata and Prefect! In the next section, we'll look at how you can run this flow on a schedule.

### 7. Schedule and deploy your metadata ingestion flows with Prefect

Ingesting your data via manually executed scripts is great for initial exploration, but in order to build a reliable metadata platform, you need to run those workflows on a regular cadence. That’s where you can leverage Prefect [schedules](https://orion-docs.prefect.io/concepts/schedules/) and [deployments](https://orion-docs.prefect.io/concepts/deployments/).

[This documentation page](https://prefecthq.github.io/prefect-openmetadata/schedule_ingestion_flow/) demonstrates how you can configure a `DeploymentSpec` to deploy your flow and ensure that your metadata gets refreshed on schedule.

### 8. Deploy the execution layer to run your flows

So far, we’ve looked at how you can **create** and **schedule** your workflow, but where does this code actually run? This is where the concepts of [storage](https://orion-docs.prefect.io/concepts/storage/), [work queues, and agents](https://orion-docs.prefect.io/concepts/work-queues/) become important. But don’t worry: to get started, you only need to run one CLI command for each of those concepts.

**1) Storage**

Storage tells Prefect where your workflow code lives. To configure storage, run:

```shell
prefect storage create
```

The CLI will guide you through selecting the storage of your choice. To get started, you can select Local Storage and choose a path in your file system. You can then directly select it as your default storage.

**2) Work Queue**

Work queues collect scheduled runs, and agents pick those runs up from the queue. To create a default work queue, run:

```shell
prefect work-queue create default
```

**3) Agent**

Agents are lightweight processes that poll their work queues for scheduled runs and execute workflows on the infrastructure you specified in the `flow_runner` of your `DeploymentSpec`. To start an agent for the default work queue, run:

```shell
prefect agent start default
```

That’s all you need! Once you have executed those three commands, your scheduled deployments (_such as the one defined in `ingestion_flow.py`_) will run on schedule, and Prefect will ensure that your metadata stays up-to-date.

You can observe the state of your metadata ingestion workflows from the [Prefect Orion UI](https://orion-docs.prefect.io/ui/overview/). The UI also includes detailed logs showing which metadata got updated, helping you keep your data platform healthy and observable.

### 9. Using Prefect 2.0 in the Cloud

If you want to move beyond this local installation, you can deploy Prefect 2.0 to run your OpenMetadata ingestion workflows by:

- self-hosting the orchestration layer (see the [list of resources on Prefect Discourse](https://discourse.prefect.io/t/how-to-self-host-prefect-2-0-orchestration-layer-list-of-resources-to-get-started/952)), or
- signing up for Prefect Cloud 2.0 ([this guide](https://discourse.prefect.io/t/how-to-get-started-with-prefect-cloud-2-0/539) walks you through the process).

For various deployment options of OpenMetadata, check the [Deployment](/deployment) section.

### 10. Questions about using OpenMetadata with Prefect

If you have any questions about configuring Prefect, post your question on [Prefect Discourse](https://discourse.prefect.io/) or in the [Prefect Community Slack](https://www.prefect.io/slack/).

And if you need support for OpenMetadata, get in touch on [OpenMetadata Slack](https://slack.open-metadata.org/).

#### Troubleshooting

**Could not find a version that satisfies the requirement**

```shell
ERROR: Could not find a version that satisfies the requirement openmetadata-ingestion[docker] (from versions: none)
ERROR: No matching distribution found for openmetadata-ingestion[docker]
```

If you see the above when attempting to install `prefect-openmetadata`, this can be due to using an older version of Python and pip. Please check the [Requirements](/quick-start/local-deployment) and confirm that you have supported versions installed.
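To see which versions are actually in use, you can run the diagnostic commands below. This is a minimal sketch. `python3 -m pip` runs the pip that belongs to that specific interpreter, which avoids confusing it with a `pip` from another Python installation. The upgrade step is left as a comment because it requires network access.

```shell
# Which Python does `python3` resolve to, and which version is it?
command -v python3
python3 --version

# pip's version line ends with the Python version it belongs to, e.g. "(python 3.10)"
python3 -m pip --version

# If Python is 3.8 or newer but pip is old, upgrading pip may help:
#   python3 -m pip install --upgrade pip
```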

## Next Steps

1. Visit the overview page and explore the OpenMetadata UI.
2. Visit the documentation to see what services you can integrate with OpenMetadata.
3. Visit the documentation and explore the OpenMetadata APIs.