mirror of https://github.com/open-metadata/OpenMetadata.git synced 2025-07-19 15:31:59 +00:00

Added s3 and gcs examples for dbt (#10639 )

2023-03-17 12:19:34 +05:30

4.8 KiB


title	slug
Ingest dbt UI	/connectors/ingestion/workflows/dbt/ingest-dbt-ui

dbt Workflow UI

Learn how to configure the dbt workflow from the UI to ingest dbt data from your data sources.

UI Configuration

Once the metadata ingestion runs correctly and we are able to explore the service Entities, we can add the dbt information.

This will populate the dbt tab on the Table Entity Page.

We can create a workflow that will obtain the dbt information from the dbt files and feed it to OpenMetadata. The dbt Ingestion will be in charge of obtaining this data.

1. Add a dbt Ingestion

From the Service Page, go to the Ingestions tab to add a new ingestion and click on Add dbt Ingestion.

2. Configure the dbt Ingestion

Here you can enter the dbt Ingestion details:

Add dbt Source

The dbt sources for the manifest.json, catalog.json and run_results.json files can be configured in the UI. The dbt files must be stored in one of the sources listed below.

Only the manifest.json file is compulsory for dbt ingestion.
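The rule above can be sketched as a small pre-flight check. This is an illustrative helper only, not OpenMetadata code; the function name `check_dbt_artifacts` and the two constants are hypothetical.

```python
# Hypothetical helper: classify which dbt artifacts are missing.
# Only manifest.json is treated as required, as stated above.
REQUIRED_ARTIFACTS = ("manifest.json",)
OPTIONAL_ARTIFACTS = ("catalog.json", "run_results.json")


def check_dbt_artifacts(available):
    """Return (missing_required, missing_optional) for a list of file names."""
    names = set(available)
    missing_required = [n for n in REQUIRED_ARTIFACTS if n not in names]
    missing_optional = [n for n in OPTIONAL_ARTIFACTS if n not in names]
    return missing_required, missing_optional


missing_required, missing_optional = check_dbt_artifacts(
    ["manifest.json", "catalog.json"]
)
print(missing_required)   # []
print(missing_optional)   # ['run_results.json']
```

An empty first list means the ingestion has the one file it cannot do without; anything in the second list is optional.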

AWS S3 Buckets

OpenMetadata connects to AWS S3 using the credentials provided and scans the S3 buckets for the manifest.json, catalog.json and run_results.json files.

You can provide the name of the S3 bucket and the prefix path of the folder in which the dbt files are stored. If these parameters are not provided, all buckets are scanned for the files.
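The effect of the prefix on the scan can be pictured with a short sketch. It assumes the object keys have already been listed from a bucket (here they are a plain Python list), and `find_dbt_files` is a hypothetical name; this is not how OpenMetadata necessarily implements the scan.

```python
import posixpath

# The three dbt artifacts the scan looks for.
DBT_FILES = {"manifest.json", "catalog.json", "run_results.json"}


def find_dbt_files(keys, prefix=""):
    """Map each dbt file name to the first object key found under `prefix`.

    `keys` stands in for the result of listing a bucket. An empty prefix
    means every key is considered.
    """
    found = {}
    for key in keys:
        if prefix and not key.startswith(prefix):
            continue  # outside the configured folder
        name = posixpath.basename(key)
        if name in DBT_FILES and name not in found:
            found[name] = key
    return found


keys = [
    "main-dir/dbt-files/manifest.json",
    "other/catalog.json",
    "main-dir/dbt-files/run_results.json",
    "main-dir/readme.md",
]
print(find_dbt_files(keys, prefix="main-dir/dbt-files"))
```

With the prefix set, `other/catalog.json` is ignored; without it, every key is a candidate, which is why leaving the fields empty results in a broader scan.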

For example, if the dbt files are stored in the folder s3://bucket-name/main-dir/dbt-files/, enter bucket-name in the dbt Bucket Name field and main-dir/dbt-files in the dbt Object Prefix field.
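If you have the folder URL at hand, the two form values can be derived from it. The sketch below uses only the Python standard library; `split_s3_url` is a hypothetical helper, and stripping the leading and trailing slashes from the prefix is an assumption about what the form expects.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split an s3:// folder URL into (bucket_name, object_prefix)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {url!r}")
    # netloc is the bucket; the path (minus surrounding slashes) is the prefix.
    return parsed.netloc, parsed.path.strip("/")


bucket, prefix = split_s3_url("s3://bucket-name/main-dir/dbt-files/")
print(bucket)  # bucket-name
print(prefix)  # main-dir/dbt-files
```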

Google Cloud Storage Buckets

OpenMetadata connects to Google Cloud Storage using the credentials provided and scans the GCS buckets for the manifest.json, catalog.json and run_results.json files.

You can provide the name of the GCS bucket and the prefix path of the folder in which the dbt files are stored. If these parameters are not provided, all buckets are scanned for the files.

For example, if the dbt files are stored in the folder bucket-name/main-dir/dbt_files, enter bucket-name in the dbt Bucket Name field and main-dir/dbt_files in the dbt Object Prefix field.

GCS credentials can be stored in two ways:

Entering the credentials directly into the form

Entering the path of the file in which the GCS bucket credentials are stored.

For more information on Google Cloud Storage authentication click here.

Local Storage

You can directly provide the paths of the manifest.json, catalog.json and run_results.json files stored on the local system or in the container in which the OpenMetadata server is running.
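Before pointing the workflow at a local path, it can help to confirm that the file is a readable dbt manifest. The sketch below relies only on the top-level `nodes` mapping and the per-node `resource_type` field of manifest.json; the function names and the example path are hypothetical.

```python
import json
from collections import Counter


def load_manifest(path):
    """Read a manifest.json file from the local file system."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_manifest(manifest):
    """Count the manifest's nodes by resource type (model, test, seed, ...)."""
    nodes = manifest.get("nodes", {})
    return Counter(node.get("resource_type", "unknown") for node in nodes.values())


# Usage with a hypothetical path:
#   print(summarize_manifest(load_manifest("/path/to/manifest.json")))
example = {
    "nodes": {
        "model.demo.orders": {"resource_type": "model"},
        "model.demo.customers": {"resource_type": "model"},
        "test.demo.not_null_orders_id": {"resource_type": "test"},
    }
}
print(summarize_manifest(example)["model"])  # 2
```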

File Server

You can directly provide the file server paths of the manifest.json, catalog.json and run_results.json files stored on a file server.

dbt Cloud

If you have not set up a dbt Cloud account yet, click on the link here to get started. OpenMetadata uses the dbt Cloud APIs to fetch the run artifacts (manifest.json, catalog.json and run_results.json) from the most recent dbt run. The API calls must be authenticated with an Authentication Token. Follow the link here to generate an authentication token for your dbt Cloud account.
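To make the token-based authentication concrete, here is a rough sketch of what a request for a run artifact might look like. It assumes the dbt Cloud v2 artifact endpoint (`/api/v2/accounts/{account_id}/runs/{run_id}/artifacts/{path}`) and the `Authorization: Token ...` header; the host, the IDs and the helper name are placeholders, and this is not necessarily the exact call OpenMetadata makes. The request is only built here, not sent.

```python
from urllib.request import Request


def build_artifact_request(account_id, run_id, artifact, token,
                           base_url="https://cloud.getdbt.com"):
    """Build (but do not send) a GET request for one dbt Cloud run artifact."""
    # Assumed endpoint shape; check the dbt Cloud API docs for your account.
    url = (
        f"{base_url}/api/v2/accounts/{account_id}"
        f"/runs/{run_id}/artifacts/{artifact}"
    )
    headers = {
        "Authorization": f"Token {token}",  # the Authentication Token
        "Accept": "application/json",
    }
    return Request(url, headers=headers)


# Placeholder IDs and token, for illustration only.
request = build_artifact_request(123, 456, "manifest.json", "my-token")
print(request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would perform the call; selecting the most recent run would need an additional request to list the runs first.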

3. Schedule and Deploy

After clicking Next, you will be redirected to the Scheduling form. This is the same form as for the Metadata Ingestion. Select your desired schedule and click on Deploy. The new pipeline will then be listed under the Service Ingestions.
