# Introduction to Metadata Ingestion

:::tip Find Integration Source
Please see our **[Integrations page](https://docs.datahub.com/integrations)** to browse our ingestion sources and filter on their features.
:::

## Integration Methods

DataHub offers three methods for ingesting metadata:

- [UI Ingestion](../docs/ui-ingestion.md): Configure and run a metadata ingestion pipeline through the UI.
- [CLI Ingestion guide](cli-ingestion.md): Configure the ingestion pipeline in YAML and run it through the CLI.
- SDK-based ingestion: Use the [Python emitter](./as-a-library.md) or the [Java emitter](../metadata-integration/java/as-a-library.md) to control ingestion pipelines programmatically.

## Types of Integration

Integrations fall into two types, depending on how metadata reaches DataHub:

### Push-based Integration

Push-based integrations allow you to emit metadata directly from your data systems when metadata changes.
Examples of push-based integrations include [Airflow](../docs/lineage/airflow.md), [Spark](../metadata-integration/java/acryl-spark-lineage/README.md), [Great Expectations](./integration_docs/great-expectations.md) and [Protobuf Schemas](../metadata-integration/java/datahub-protobuf/README.md). Because the "active" agents in your data ecosystem report changes as they happen, push-based integrations give you low-latency metadata.
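
The push pattern can be illustrated with a minimal, standard-library-only sketch. It does not use the DataHub SDK: `MetadataEvent`, `Table`, and the `emit` callback are hypothetical names invented for this illustration, and the in-memory list stands in for whatever would actually receive the metadata.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MetadataEvent:
    """Hypothetical stand-in for a metadata change message."""

    entity: str
    aspect: str
    payload: Dict[str, str]


@dataclass
class Table:
    """Hypothetical 'active agent' that owns its schema and announces changes."""

    name: str
    emit: Callable[[MetadataEvent], None]
    columns: Dict[str, str] = field(default_factory=dict)

    def add_column(self, column: str, dtype: str) -> None:
        self.columns[column] = dtype
        # Push: the change is emitted at the moment it happens,
        # so the receiver never has to poll for it.
        self.emit(MetadataEvent(self.name, "schema", dict(self.columns)))


# An in-memory list plays the role of the metadata receiver (assumption).
received: List[MetadataEvent] = []
orders = Table(name="orders", emit=received.append)
orders.add_column("id", "bigint")
orders.add_column("total", "decimal")
```

Each schema change produces one event immediately, which is where the low latency comes from. In a real deployment the callback would hand the event to an emitter such as the ones linked under SDK-based ingestion.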

### Pull-based Integration

Pull-based integrations allow you to "crawl" or "ingest" metadata from the data systems by connecting to them and extracting metadata in a batch or incremental-batch manner.
Examples of pull-based integrations include BigQuery, Snowflake, Looker, Tableau and many others.
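
As a rough illustration of the pull pattern, the sketch below crawls an in-memory SQLite database with Python's standard `sqlite3` module. It is not how any DataHub source is implemented; `crawl_schemas` is a hypothetical helper, and SQLite is used only because it needs no external service.

```python
import sqlite3
from typing import Dict, List


def crawl_schemas(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Return {table_name: ["column type", ...]} for every table in the database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schemas: Dict[str, List[str]] = {}
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schemas[table] = [f"{col[1]} {col[2]}" for col in columns]
    return schemas


# Pull: the crawler connects to the system and extracts everything in one batch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
snapshot = crawl_schemas(conn)
```

Unlike the push pattern, nothing is reported until the crawler runs, so the freshness of the metadata depends on how often the batch is scheduled.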

## Core Concepts

The following are the core concepts related to ingestion:

- [Sources](source_overview.md): Data systems from which metadata is extracted (e.g. BigQuery, MySQL).
- [Sinks](sink_overview.md): Destinations for metadata (e.g. File, DataHub).
- [Recipe](recipe_overview.md): The main configuration for ingestion, in the form of a .yaml file.
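
To show how these three concepts fit together, here is a hedged sketch of a recipe's general shape, written as a Python dictionary rather than YAML. The host, database name, and server URL are placeholder assumptions, and the exact options each source and sink accepts should be checked against their own documentation.

```python
import json

# A recipe pairs one source with one sink. All values below are placeholders.
recipe = {
    "source": {
        "type": "mysql",  # which data system to extract metadata from
        "config": {
            "host_port": "localhost:3306",  # assumed local MySQL instance
            "database": "example_db",  # hypothetical database name
        },
    },
    "sink": {
        "type": "datahub-rest",  # where the extracted metadata is sent
        "config": {
            "server": "http://localhost:8080",  # assumed local DataHub endpoint
        },
    },
}

# Print the structure so it can be compared with a .yaml recipe file.
print(json.dumps(recipe, indent=2))
```

The same structure, saved as a .yaml file, is what the CLI ingestion guide describes running.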

For more advanced guides, please refer to the following:

- [Developing on Metadata Ingestion](./developing.md)
- [Adding a Metadata Ingestion Source](./adding-source.md)
- [Using Transformers](./docs/transformer/intro.md)