import FeatureAvailability from '@site/src/components/FeatureAvailability';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Metadata Ingestion
  
						 
					
						
							
								
									
										
										
										
<FeatureAvailability/>
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
DataHub helps you discover and understand your organization's data by automatically collecting information about your data sources. This process, called **metadata ingestion**, pulls in:
							 
						 
					
						
							
								
									
										
										
										
- **Table and column names** from your databases
- **Asset lineage** showing how information flows between systems
- **Usage statistics** revealing which datasets are most popular
- **Data quality information** including freshness and completeness
- **Business context** like ownership and documentation
						 
					
						
							
								
									
										
										
										
DataHub makes it simple to connect to popular platforms such as Snowflake, BigQuery, and dbt, to schedule automatic updates, and to manage credentials securely.
							 
						 
					
						
							
								
									
										
										
										
## Prerequisites and Permissions

To manage metadata ingestion in DataHub, you need appropriate permissions.
							 
						 
					
						
							
								
									
										
										
										
### Option 1: Admin-Level Access

Users can be granted the following privileges for full administrative access to all ingestion sources:

- **`Manage Metadata Ingestion`** - Provides complete access to create, edit, run, and delete all ingestion sources
- **`Manage Secrets`** - Allows creation and management of encrypted credentials used in ingestion configurations
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
These privileges can be granted in two ways:

1. **Admin Role Assignment** - Users assigned to the **Admin Role** receive these privileges by default
2. **Custom Policy with Platform Privileges** - Create a [Custom Policy](authorization/policies.md) that grants the `Manage Metadata Ingestion` and `Manage Secrets` platform privileges to specific users or groups

<p align="center">
  <img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/ingestion-privileges.png"/>
</p>
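To make the custom-policy route more concrete, here is a minimal sketch of what such a policy might look like as data. The field names, the privilege identifiers `MANAGE_INGESTION` and `MANAGE_SECRETS`, and the example URNs are assumptions made for illustration, not a confirmed DataHub schema. Check the policies guide for the exact shape your version expects.

```python
import json

# Assumed identifiers for the two privileges named above; verify them
# against your DataHub version before relying on them.
MANAGE_INGESTION = "MANAGE_INGESTION"
MANAGE_SECRETS = "MANAGE_SECRETS"


def build_ingestion_admin_policy(users, groups):
    """Return a dict describing a platform policy (hypothetical shape)."""
    if not users and not groups:
        raise ValueError("a policy needs at least one user or group")
    return {
        "type": "PLATFORM",
        "name": "Ingestion Admins",
        "state": "ACTIVE",
        "privileges": [MANAGE_INGESTION, MANAGE_SECRETS],
        "actors": {"users": list(users), "groups": list(groups)},
    }


# Example actors; both URNs are made up for this sketch.
policy = build_ingestion_admin_policy(
    users=["urn:li:corpuser:jdoe"],
    groups=["urn:li:corpGroup:data-platform"],
)
print(json.dumps(policy, indent=2))
```

Granting both privileges together mirrors the Admin Role defaults while limiting them to the listed users and groups.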
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
### Option 2: Resource-Specific Policies

For more granular control, administrators can create [Custom Policies](authorization/policies.md) that apply specifically to **Ingestion Sources**, allowing different users to have different levels of access:

- **View** - View ingestion source configurations and run history
- **Edit** - Modify ingestion source configurations
- **Delete** - Remove ingestion sources
- **Execute** - Run ingestion sources on-demand
						 
					
						
							
								
									
										
										
										
**Prerequisites:**

- **DataHub Core**: Enable the `VIEW_INGESTION_SOURCE_PRIVILEGES_ENABLED` feature flag
- **DataHub Cloud**: Work with your customer success team to get the feature enabled
						 
					
						
							
								
									
										
										
										
:::caution
**Important**: Once this feature flag is enabled, any policy that applies to "All" resource types, including the default read-only policies, will also cover Ingestion Sources. The Ingestion tab then becomes visible, and potentially actionable depending on the applied privileges, to every user those policies cover. Enable the flag with care if you have view-only policies that should not expose the Data Sources page.
:::
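The effect described in the caution can be illustrated with a small toy model. This is a sketch only, not DataHub's policy engine. The `Policy` class, the resource-type label, and the privilege labels are all hypothetical names standing in for the View, Edit, Delete, and Execute privileges above.

```python
from dataclasses import dataclass

INGESTION_SOURCE = "ingestion_source"  # hypothetical resource-type label


@dataclass(frozen=True)
class Policy:
    name: str
    resource_types: frozenset  # an empty set stands for "All" resource types
    privileges: frozenset


def privileges_on_ingestion_sources(policies, flag_enabled):
    """Collect the privileges that resource policies grant on ingestion sources."""
    if not flag_enabled:
        # In this toy model, resource policies never reach ingestion sources
        # while the feature flag is off.
        return set()
    granted = set()
    for policy in policies:
        applies_to_all = not policy.resource_types
        if applies_to_all or INGESTION_SOURCE in policy.resource_types:
            granted |= policy.privileges
    return granted


read_only = Policy("All Users - View", frozenset(), frozenset({"VIEW"}))
operators = Policy(
    "Ingestion Operators",
    frozenset({INGESTION_SOURCE}),
    frozenset({"VIEW", "EXECUTE"}),
)
dataset_editors = Policy("Dataset Editors", frozenset({"dataset"}), frozenset({"EDIT"}))

# The "All" read-only policy starts exposing ingestion sources once the flag is on.
print(privileges_on_ingestion_sources([read_only, dataset_editors], flag_enabled=False))
print(privileges_on_ingestion_sources([read_only, dataset_editors], flag_enabled=True))
```

In this model the read-only policy grants nothing on ingestion sources until the flag is turned on, which is why existing "All" policies are worth auditing first.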
							 
						 
					
						
							
								
									
										
										
										
### Accessing the Ingestion Interface

Once you have the appropriate privileges, navigate to the **Ingestion** tab in DataHub.

<p align="center">
  <img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/ingestion-tab.png"/>
</p>
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
On this page, you'll see a list of active **Ingestion Sources**. An Ingestion Source represents a configured connection to an external data system from which DataHub extracts metadata.

If you're just getting started, you won't have any sources configured. The following sections will guide you through creating your first ingestion source.
							 
						 
					
						
							
								
									
										
										
										
## Creating an Ingestion Source

### Step 1: Select a Data Source

Begin by clicking **+ Create new source** to start the ingestion source creation process.

<p align="center">
  <img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/create-new-ingestion-source-button.png"/>
</p>
						 
					
						
							
								
									
										
										
										
Next, select the type of data source you want to connect. DataHub provides pre-built templates for popular platforms including:

- **Data Warehouses**: Snowflake, BigQuery, Redshift, Databricks
- **Databases**: MySQL, PostgreSQL, SQL Server, Oracle
- **Business Intelligence**: Looker, Tableau, PowerBI
- **Streaming**: Kafka, Pulsar
- **And many more...**

<p align="center">
  <img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/select-platform-template.png"/>
</p>
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
Select the template that matches your data source. If your specific platform isn't listed, you can choose **Custom** to configure a source manually, though this requires more technical knowledge.
							 
						 
					
						
							
								
									
										
										
										
### Step 2: Configure Connection Details

After you select a data source template, DataHub presents a form for configuring the connection. The exact fields vary by platform, but typically include:

**Connection Information:**

- Host/server address and port
- Database or project names
- Authentication credentials
						 
					
						
							
								
									
										
										
										
**Data Selection:**

- Which databases, schemas, or tables to include
- Filtering options to exclude certain data
- Sampling and profiling settings
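As a rough illustration of how the connection and data-selection fields fit together, the sketch below writes a PostgreSQL configuration as a Python dict and applies its include/exclude patterns to a list of schema names. The host, database, user, and schema names are made up. The field names (`host_port`, `schema_pattern`, `profiling`) follow the shape commonly used in DataHub recipes, but treat them as assumptions and confirm them in the documentation for your source. The `is_selected` helper is hypothetical and only approximates the filtering.

```python
import re

# Hypothetical configuration for a PostgreSQL source, written as a Python dict.
recipe = {
    "source": {
        "type": "postgres",
        "config": {
            # Connection information (example values)
            "host_port": "db.example.com:5432",
            "database": "analytics",
            "username": "datahub_reader",
            # Data selection: regex allow/deny lists plus a profiling toggle
            "schema_pattern": {
                "allow": ["^public$", "^sales_.*"],
                "deny": [".*_tmp$"],
            },
            "profiling": {"enabled": False},
        },
    }
}


def is_selected(name, pattern):
    """Return True if `name` matches an allow regex and no deny regex."""
    allow = pattern.get("allow", [".*"])
    deny = pattern.get("deny", [])
    if any(re.match(rx, name) for rx in deny):
        return False
    return any(re.match(rx, name) for rx in allow)


schemas = ["public", "sales_eu", "sales_tmp", "hr"]
pattern = recipe["source"]["config"]["schema_pattern"]
selected = [s for s in schemas if is_selected(s, pattern)]
print(selected)  # ['public', 'sales_eu']
```

Deny patterns are checked first here, so a schema such as `sales_tmp` is excluded even though it also matches an allow pattern.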
						 
					
						
							
								
									
										
										
										
#### Managing Sensitive Information with Secrets

For production environments, sensitive information like passwords and API keys should be stored securely using DataHub's **Secrets** functionality.
							 
						 
					
						
							
								
									
										
										
										
To create a secret:

1. Navigate to the **Secrets** tab in the Ingestion interface
2. Click **Create new secret**

<p align="center">
  <img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/create-secret.png"/>
</p>

3. Provide a descriptive name (e.g., `BIGQUERY_PRIVATE_KEY`)
4. Enter the sensitive value
5. Optionally add a description
6. Click **Create**
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								Once created, secrets can be referenced in your ingestion configuration forms using the dropdown menus provided for credential fields.
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								>  **Security Note**: Users with the `Manage Secrets` privilege can retrieve plaintext secret values through DataHub's GraphQL API. Ensure secrets are only accessible to trusted administrators.
  
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								### Step 3: Test Your Connection
  
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
Before proceeding, verify that DataHub can connect to your data source. Most ingestion source forms include a **Test Connection** button that validates:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  Network connectivity to your data source 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  Authentication credentials 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  Required permissions for metadata extraction 
						 
					
						
							
								
									
										
										
										
<p align="center">
  <img width="75%" alt="Test BigQuery connection" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/bigquery/bigquery-test-connection.png"/>
</p>
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								If the connection test fails, review your configuration and ensure that:
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								-  Network access is available between DataHub and your data source 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  Credentials are correct and have sufficient permissions 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  Any firewall rules allow the connection 
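
The network and firewall items in the checklist above can be checked independently of DataHub with a plain TCP connection attempt from the machine that runs ingestion. The following is a minimal sketch, not part of DataHub: `can_reach` is a hypothetical helper name, and the host and port you pass are whatever your data source listens on.

```python
import socket


def can_reach(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, DNS resolution failures, and timeouts.
        return False
```

A `False` result only shows that the TCP handshake failed from the machine where the check ran. It does not say anything about credentials or permissions, which the **Test Connection** button validates separately.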
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								### Step 4: Schedule Execution (Optional)
  
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								You can configure automatic execution of your ingestion source on a regular schedule. This ensures your metadata stays up-to-date without manual intervention.
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
<p align="center">
  <img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/schedule-ingestion.png"/>
</p>
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								If you prefer to run ingestion manually or on an ad-hoc basis, you can skip the scheduling step entirely.
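
For sources that are scheduled, it can help to sanity-check a schedule expression before saving it. The sketch below assumes the schedule is ultimately expressed as a standard five-field cron expression (minute, hour, day of month, month, day of week); that assumption, and the helper name `is_valid_cron`, are not taken from this guide. It handles only numeric fields, `*`, ranges, lists, and steps, not named months or weekdays.

```python
# Allowed numeric range for each of the five standard cron fields.
FIELD_RANGES = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day of month", 1, 31),
    ("month", 1, 12),
    ("day of week", 0, 7),  # 0 and 7 both commonly mean Sunday
]


def _valid_part(part: str, low: int, high: int) -> bool:
    """Check one comma-separated part: '*', 'n', or 'a-b', optionally with '/step'."""
    base, slash, step = part.partition("/")
    if slash and not (step.isdecimal() and int(step) > 0):
        return False
    if base == "*":
        return True
    start, dash, end = base.partition("-")
    if not start.isdecimal():
        return False
    if dash and not end.isdecimal():
        return False
    first = int(start)
    last = int(end) if dash else first
    return low <= first <= last <= high


def is_valid_cron(expr: str) -> bool:
    """Return True if expr has five fields and every field is within range."""
    fields = expr.split()
    if len(fields) != len(FIELD_RANGES):
        return False
    return all(
        all(_valid_part(part, low, high) for part in field.split(","))
        for field, (_, low, high) in zip(fields, FIELD_RANGES)
    )
```

For example, `is_valid_cron("0 0 * * *")` accepts a daily-at-midnight schedule, while `is_valid_cron("60 * * * *")` rejects an out-of-range minute.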
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								### Step 5: Finish Up and Run
  
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								Finally, provide a descriptive name for your ingestion source that will help you and your team identify it later.
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								You can also assign **Users**  and/or **Groups**  as owners of this ingestion source. By default, you (the creator) will be assigned as an owner, but you can add additional owners or change this at any time after creation.
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
<p align="center">
  <img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/name-ingestion-source.png"/>
</p>
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								Click **Save and Run**  to create the ingestion source and execute it immediately, or **Save**  to create it without running.
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								#### Advanced Configuration Options
  
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								For users who need additional control, DataHub provides advanced configuration options:
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
<p align="center">
  <img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/custom-ingestion-cli-version.png"/>
</p>
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
- **CLI Version:** Specify a particular version of the DataHub CLI for ingestion execution
- **Environment Variables:** Set custom environment variables for the ingestion process
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								## Running and Monitoring Ingestion
  
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								### Executing an Ingestion Source
  
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
Once you've created your ingestion source, run it by clicking the **Play** button. Shortly after, the **Last Status** column for the source should change to `Running`, which indicates that DataHub has queued the ingestion job.
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
<p align="center">
  <img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/ingestion/running.png"/>
</p>
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								When ingestion completes successfully, the status will show as `Success`  in green.
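
If you ever need to wait on a run from a script rather than watching the UI, the transition from `Running` to a final status lends itself to a simple polling loop. The sketch below is illustrative only: `fetch_status` is a hypothetical callable that you would implement yourself (it is not a DataHub API), and the set of final statuses is assumed from the status names used in this guide.

```python
import time
from typing import Callable

# Status names taken from this guide; treat the exact set as an assumption.
TERMINAL_STATUSES = {"Success", "Failed", "Aborted"}


def wait_for_run(
    fetch_status: Callable[[], str],
    poll_interval_s: float = 5.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status until it returns a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still in status {status!r} after {timeout_s}s")
        sleep(poll_interval_s)
```

The `sleep` parameter exists so the loop can be exercised in tests without real delays. The explicit timeout keeps a stuck run from blocking the caller indefinitely.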
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
<p align="center">
  <img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/ingestion/success-run.png"/>
</p>
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								### Viewing Run History
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								The **Run History**  tab shows you a complete history of all your ingestion runs. Here you can:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  **See all runs**: View every ingestion execution across all your sources 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  **Check recent activity**: Runs are listed with the most recent at the top 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  **Filter by source**: Use the dropdown to see runs from a specific ingestion source 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  **Access from Sources tab**: Click on any source's **Last Run**  status or select **View Run History**  from the source menu 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								This makes it easy to track your ingestion performance and troubleshoot any issues over time.
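
The ordering and filtering behavior described above (most recent first, optionally narrowed to one source) can be modeled in a few lines. This is a sketch of the idea rather than of DataHub's implementation: the `Run` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Run:
    source: str           # name of the ingestion source (hypothetical field)
    status: str           # e.g. "Success" or "Failed"
    started_at: datetime  # when the run began


def run_history(runs: List[Run], source: Optional[str] = None) -> List[Run]:
    """Return runs newest first, optionally limited to one ingestion source."""
    selected = [r for r in runs if source is None or r.source == source]
    return sorted(selected, key=lambda r: r.started_at, reverse=True)
```

Calling `run_history(runs)` mirrors the default view of all runs, and `run_history(runs, source="bigquery")` mirrors choosing a source in the filter dropdown.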
							 
						 
					
						
							
								
									
										
										
										
<p align="center">
  <img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/ingestion/run-history-tab.png"/>
</p>
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								### Viewing Ingestion Results
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								After successful ingestion, you can view detailed information about what was extracted:
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								1.  Click the **Success**  status button on a completed ingestion run 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								2.  Select **View All**  to see the list of ingested entities 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								3.  Click on individual entities to validate the extracted metadata 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
<p align="center">
  <img width="75%" alt="ingestion_details_view_all" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/ingestion/ingestion-run-summary.png"/>
</p>
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								### Cancelling Running Ingestion
  
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
If an ingestion run is taking too long or appears to be stuck, you can cancel it by clicking the **Stop** button on the running job.
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
<p align="center">
  <img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/ingestion/cancelled-run.png"/>
</p>
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
Cancelling a run is useful when it hits issues such as:
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								-  Network timeouts 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  Ingestion source bugs 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  Resource constraints 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								## Troubleshooting Failed Ingestion
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								### Common Failure Reasons
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								When ingestion fails, the most common causes include:
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
<p align="center">
  <img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/ingestion/failed-source.png"/>
</p>
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
1. **Configuration Errors**: Incorrect connection details, missing required fields, or invalid parameter values
2. **Authentication Issues**: Wrong credentials, expired tokens, or insufficient permissions
3. **Network Connectivity**: DNS resolution failures, firewall blocks, or unreachable data sources
4. **Secret Resolution Problems**: Referenced secrets that don't exist or have incorrect names
5. **Resource Constraints**: Memory limits, timeouts, or processing capacity issues
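
To tell a network connectivity problem apart from the other causes above, it can help to test the data source address from the machine that runs ingestion. The following is a minimal sketch using only the Python standard library; the function name `check_reachable` and the host and port in the usage line are illustrative assumptions, not part of DataHub.

```python
import socket


def check_reachable(host: str, port: int, timeout: float = 3.0) -> str:
    """Return 'ok', 'dns_failure', or 'unreachable' for a host/port pair."""
    try:
        # Resolve the name first so DNS problems are reported separately.
        addresses = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    for family, socktype, proto, _canonname, sockaddr in addresses:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            # Refused, filtered by a firewall, or timed out; try the next address.
            continue
    return "unreachable"


if __name__ == "__main__":
    # Hypothetical source address; replace with your own host and port.
    print(check_reachable("localhost", 3306))
```

A `dns_failure` result points at name resolution, while `unreachable` suggests a firewall block, a wrong port, or a source that is down. An `ok` result only shows that a TCP connection can be opened, so credentials and permissions still need to be checked separately.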
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								### Viewing Detailed Logs
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
To diagnose an ingestion failure, click the status value of a run (for example, Failed or Aborted) in the run history to view and download the complete logs for that run.
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
<p align="center">
  <img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/ingestion/ingestion-run-log.png"/>
</p>
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								The logs provide detailed information about:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  Connection attempts and errors 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  Authentication failures 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  Data extraction progress 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  Error messages and stack traces 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								### Authentication for Secured DataHub Instances
  
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
If your DataHub instance has [Metadata Service Authentication](authentication/introducing-metadata-service-authentication.md) enabled, you'll need to provide a Personal Access Token in your configuration.
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
<p align="center">
  <img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/ingestion-with-token.png"/>
</p>
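
As a rough illustration of where the token goes, the sketch below builds a recipe as a Python dictionary and prints it as JSON. It assumes the `datahub-rest` sink with `server` and `token` fields; the server address, the `DATAHUB_TOKEN` environment variable, the MySQL `host_port`, and the `build_recipe` helper are placeholders chosen for this example. Check the sink configuration shown in your own instance before relying on it.

```python
import json
import os

# Assumed values: the server address and the environment variable name are
# placeholders for illustration, not defaults taken from this guide.
DATAHUB_SERVER = "http://localhost:8080"
token = os.environ.get("DATAHUB_TOKEN", "<your-personal-access-token>")


def build_recipe(server: str, token: str) -> dict:
    """Build a recipe whose sink authenticates with a Personal Access Token."""
    return {
        "source": {
            "type": "mysql",
            "config": {
                "host_port": "localhost:3306",  # hypothetical source address
                "username": "${MYSQL_USERNAME}",
                "password": "${MYSQL_PASSWORD}",
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": server, "token": token},
        },
    }


recipe = build_recipe(DATAHUB_SERVER, token)
# JSON is also valid YAML, so this output can be pasted into a YAML recipe.
print(json.dumps(recipe, indent=2))
```

Because the printed recipe contains the token in plain text, treat the output as a secret.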
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								## Advanced Configuration with YAML
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								While the UI-based forms handle most common ingestion scenarios, advanced users may need direct access to YAML configuration for:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  Custom ingestion sources not available in the UI 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  Complex transformation pipelines 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  Advanced filtering and processing logic 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  Integration with external systems 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
For these advanced use cases, DataHub supports direct YAML recipe configuration. For detailed information about YAML-based configuration, including syntax and examples, see the [Recipe Overview Guide](metadata-ingestion/recipe_overview.md).
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
<Tabs>
  <TabItem value="cli" label="CLI">

You can deploy recipes using the CLI, as described in the [CLI documentation for uploading ingestion recipes](./cli.md#ingest-deploy).

```bash
datahub ingest deploy --name "My Test Ingestion Source" --schedule "5 * * * *" --time-zone "UTC" -c recipe.yaml
```

  </TabItem>
  <TabItem value="graphql" label="GraphQL">

Create ingestion sources through [DataHub's GraphQL API](./api/graphql/overview.md) with the **createIngestionSource** mutation.

```graphql
mutation {
  createIngestionSource(
    input: {
      name: "My Test Ingestion Source"
      type: "mysql"
      description: "My ingestion source description"
      schedule: { interval: "*/5 * * * *", timezone: "UTC" }
      config: {
        recipe: "{\"source\":{\"type\":\"mysql\",\"config\":{\"include_tables\":true,\"database\":null,\"password\":\"${MYSQL_PASSWORD}\",\"profiling\":{\"enabled\":false},\"host_port\":null,\"include_views\":true,\"username\":\"${MYSQL_USERNAME}\"}},\"pipeline_name\":\"urn:li:dataHubIngestionSource:f38bd060-4ea8-459c-8f24-a773286a2927\"}"
        version: "0.8.18"
        executorId: "mytestexecutor"
      }
    }
  )
}
```

**Note**: When using GraphQL, the recipe must be passed as a string with its double quotes escaped.

  </TabItem>
</Tabs>
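
Escaping the recipe by hand for the GraphQL mutation is error-prone. One way to produce the escaped string is to serialize the recipe to JSON twice, as in the sketch below. The helper name `to_graphql_string` and the sample recipe values are assumptions for illustration. The approach relies on JSON string escapes also being valid inside a GraphQL string literal.

```python
import json

# Hypothetical recipe; the host_port value is a placeholder.
recipe = {
    "source": {
        "type": "mysql",
        "config": {
            "host_port": "localhost:3306",
            "username": "${MYSQL_USERNAME}",
            "password": "${MYSQL_PASSWORD}",
        },
    }
}


def to_graphql_string(recipe: dict) -> str:
    """Serialize a recipe to compact JSON, then quote and escape that text."""
    recipe_json = json.dumps(recipe, separators=(",", ":"))
    # A second json.dumps wraps the text in quotes and escapes the inner quotes.
    return json.dumps(recipe_json)


print(to_graphql_string(recipe))
```

The printed value already includes the surrounding double quotes, so it can be placed directly after `recipe:` in the mutation.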
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								## Frequently Asked Questions
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								### Why does ingestion fail with 'Failed to Connect' errors in Docker environments?
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								If you're running DataHub using `datahub docker quickstart`  and experiencing connection failures, this may be due to network configuration issues. The ingestion executor might be unable to reach DataHub's backend services.
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								Try updating your ingestion configuration to use the Docker internal DNS name:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
<p align="center">
  <img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/quickstart-ingestion-config.png"/>
</p>
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								### What does a dash mark (-) status mean and how do I fix it?
  
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								If your ingestion source shows a dash mark (-) status and never changes to 'Running', this could mean:
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
1. **The source has never been triggered to run** - Try clicking the "Play" button to execute the source
2. **The DataHub actions executor is not running or is unhealthy** (DataHub Core users only)
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								If clicking "Play" doesn't resolve the issue, DataHub Core users should diagnose their actions container:
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								1.  Check container status with `docker ps`  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								2.  View executor logs with `docker logs <container-id>`  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								3.  Restart the actions container if necessary 
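
The first step can be scripted if you check container health often. The sketch below runs `docker ps -a` with a name and status format and filters for containers whose name contains `actions`. That name hint is an assumption about how your deployment names the actions container, and the helper names are illustrative.

```python
import shutil
import subprocess


def find_actions_containers(ps_output: str, name_hint: str = "actions") -> list[tuple[str, str]]:
    """Return (name, status) pairs for containers whose name contains name_hint."""
    matches = []
    for line in ps_output.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        name, status = line.split("\t", 1)
        if name_hint in name:
            matches.append((name, status))
    return matches


def docker_ps() -> str:
    """Run `docker ps -a` and return one 'name<TAB>status' line per container."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True,
        text=True,
        check=False,  # an unreachable daemon yields empty output, not an exception
    )
    return result.stdout


if __name__ == "__main__":
    if shutil.which("docker") is None:
        print("docker CLI not found on PATH")
    else:
        for name, status in find_actions_containers(docker_ps()):
            print(f"{name}: {status}")
```

A status that starts with `Exited` or `Restarting` suggests that the logs are worth inspecting with `docker logs` before restarting the container.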
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								### When should I use CLI/YAML instead of UI ingestion?
  
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								Consider using CLI-based ingestion when:
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
- Your data sources aren't reachable from DataHub's network (use [remote executors](managed-datahub/operator-guide/setting-up-remote-ingestion-executor.md) for DataHub Cloud)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  You need custom ingestion logic not available in UI templates 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  Your ingestion requires local file system access 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  You want to distribute ingestion across multiple environments 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								-  You need complex transformations or custom metadata processing 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								## Additional Resources
  
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
- **Demo Video**: [Watch a complete UI ingestion walkthrough](https://www.youtube.com/watch?v=EyMyLcaw_74)
- **Quick Start Guides**: Step-by-step setup instructions for popular data sources
- **Recipe Documentation**: [Comprehensive YAML configuration reference](metadata-ingestion/recipe_overview.md)
- **Integration Catalog**: [Browse all supported data sources and their features](https://docs.datahub.com/integrations)