mirror of
				https://github.com/datahub-project/datahub.git
				synced 2025-10-31 10:49:00 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			67 lines
		
	
	
		
			3.9 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
---
title: Setup
---
# BigQuery Ingestion Guide: Setup & Prerequisites

To configure ingestion from BigQuery, you'll need a [Service Account](https://cloud.google.com/iam/docs/creating-managing-service-accounts) configured with the proper permission sets and an associated [Service Account Key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys).

This setup guide will walk you through the steps you'll need to take via your Google Cloud Console.

## BigQuery Prerequisites

If you do not have an existing Service Account and Service Account Key, please work with your BigQuery Admin to ensure you have the appropriate permissions and/or roles to continue with this setup guide.

When creating and managing new Service Accounts and Service Account Keys, we have found the following permissions and roles to be required:

* Create a Service Account: `iam.serviceAccounts.create` permission
* Assign roles to a Service Account: `serviceusage.services.enable` permission
* Set the IAM policy on the project: `resourcemanager.projects.setIamPolicy` permission
* Generate a Key for the Service Account: Service Account Key Admin (`roles/iam.serviceAccountKeyAdmin`) IAM role

:::note
Please refer to the Google Cloud [IAM Permissions](https://cloud.google.com/iam/docs/permissions-reference) and [IAM Roles](https://cloud.google.com/iam/docs/understanding-roles) references for details.
:::

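
The prerequisites above can be kept as a small checklist in code. The following is a minimal sketch, not part of DataHub or any Google library: the function name `missing_prerequisites` and the idea of passing in the permissions and roles you already hold are assumptions made for illustration. It only compares sets; it does not query Google Cloud.

```python
# Permissions and role named in the prerequisites list above.
REQUIRED_PERMISSIONS = {
    "iam.serviceAccounts.create",
    "serviceusage.services.enable",
    "resourcemanager.projects.setIamPolicy",
}
REQUIRED_ROLES = {"roles/iam.serviceAccountKeyAdmin"}


def missing_prerequisites(granted_permissions, granted_roles):
    """Return (missing_permissions, missing_roles) as sorted lists.

    Hypothetical helper: the caller supplies what they already hold,
    for example as reported by their BigQuery Admin.
    """
    missing_permissions = sorted(REQUIRED_PERMISSIONS - set(granted_permissions))
    missing_roles = sorted(REQUIRED_ROLES - set(granted_roles))
    return missing_permissions, missing_roles
```

Anything the function returns is what you would ask your BigQuery Admin to grant before continuing.
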
## BigQuery Setup

1. To set up a new Service Account, follow [this guide](https://cloud.google.com/iam/docs/creating-managing-service-accounts).

2. When you are creating a Service Account, assign the following predefined Roles:
   * [BigQuery Job User](https://cloud.google.com/bigquery/docs/access-control#bigquery.jobUser)
   * [BigQuery Metadata Viewer](https://cloud.google.com/bigquery/docs/access-control#bigquery.metadataViewer)
   * [BigQuery Resource Viewer](https://cloud.google.com/bigquery/docs/access-control#bigquery.resourceViewer) -> This role is for Table-Level Lineage and Usage extraction
   * [Logs View Accessor](https://cloud.google.com/logging/docs/access-control#logging.viewAccessor) -> This role is for Table-Level Lineage and Usage extraction
   * [BigQuery Data Viewer](https://cloud.google.com/bigquery/docs/access-control#bigquery.dataViewer) -> This role is for Profiling
   * [BigQuery Read Session User](https://cloud.google.com/bigquery/docs/access-control#bigquery.readSessionUser) -> This role is for Profiling

:::note
You can always add roles to or remove roles from Service Accounts later on. Please refer to the Google Cloud [Manage access to projects, folders, and organizations](https://cloud.google.com/iam/docs/granting-changing-revoking-access) guide for more details.
:::

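
If you prefer the command line to the Cloud Console, the role list above can be turned into `gcloud projects add-iam-policy-binding` commands. The sketch below only builds the command strings; it does not run them. The role IDs are read from the link anchors in the list. Treating the two unannotated roles as always needed and the other four as feature-specific is an assumption based on the annotations, and the helper names `roles_for` and `grant_commands` are hypothetical.

```python
# Role IDs taken from the anchors of the links in the list above.
BASE_ROLES = ["roles/bigquery.jobUser", "roles/bigquery.metadataViewer"]
LINEAGE_USAGE_ROLES = ["roles/bigquery.resourceViewer", "roles/logging.viewAccessor"]
PROFILING_ROLES = ["roles/bigquery.dataViewer", "roles/bigquery.readSessionUser"]


def roles_for(lineage_and_usage=True, profiling=True):
    """Return the role IDs to assign for the selected features."""
    roles = list(BASE_ROLES)
    if lineage_and_usage:
        roles += LINEAGE_USAGE_ROLES
    if profiling:
        roles += PROFILING_ROLES
    return roles


def grant_commands(project_id, service_account_email, roles):
    """Build one gcloud command string per role (nothing is executed)."""
    return [
        f"gcloud projects add-iam-policy-binding {project_id} "
        f"--member=serviceAccount:{service_account_email} --role={role}"
        for role in roles
    ]


if __name__ == "__main__":
    # Placeholder project and account, matching the sample key file in this guide.
    email = "test@suppproject-id-1234567.iam.gserviceaccount.com"
    for command in grant_commands("project-id-1234567", email, roles_for()):
        print(command)
```

Review the printed commands before running them; whoever runs them needs the `resourcemanager.projects.setIamPolicy` permission listed in the prerequisites.
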
3. To filter projects based on the `project_labels` configuration, first visit [cloudresourcemanager.googleapis.com](https://console.developers.google.com/apis/api/cloudresourcemanager.googleapis.com/overview) and enable the `Cloud Resource Manager API`.

4. Create and download a [Service Account Key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys). We will use this to set up authentication within DataHub.

The key file looks like this:

```json
{
   "type": "service_account",
   "project_id": "project-id-1234567",
   "private_key_id": "d0121d0000882411234e11166c6aaa23ed5d74e0",
   "private_key": "-----BEGIN PRIVATE KEY-----\nMIIyourkey\n-----END PRIVATE KEY-----",
   "client_email": "test@suppproject-id-1234567.iam.gserviceaccount.com",
   "client_id": "113545814931671546333",
   "auth_uri": "https://accounts.google.com/o/oauth2/auth",
   "token_uri": "https://oauth2.googleapis.com/token",
   "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
   "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/test%40suppproject-id-1234567.iam.gserviceaccount.com"
}
```

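
Before pasting the key into DataHub, it can help to confirm that the downloaded file is complete. The following is a minimal sketch using only the Python standard library. The field list is copied from the sample above, and the function name `check_key_file` is hypothetical. It checks the file's shape only and does not contact Google or verify that the key is valid.

```python
import json

# Field names taken from the sample key file shown above.
EXPECTED_FIELDS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "client_id",
    "auth_uri",
    "token_uri",
    "auth_provider_x509_cert_url",
    "client_x509_cert_url",
)


def check_key_file(text):
    """Return a list of problems found in a key file's JSON text (empty if none)."""
    try:
        key = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(key, dict):
        return ["top-level value is not a JSON object"]
    # Report every expected field that is absent or empty.
    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS if not key.get(name)]
    # A service account key file declares its type explicitly.
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems
```

For example, `check_key_file(open("key.json").read())` (with `key.json` standing in for wherever you saved the download) returns an empty list when every expected field is present.
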
## Next Steps

Once you've confirmed all of the above in BigQuery, it's time to [move on](configuration.md) to configure the actual ingestion source within the DataHub UI.
