diff --git a/metadata-ingestion/docs/sources/bigquery/bigquery_pre.md b/metadata-ingestion/docs/sources/bigquery/bigquery_pre.md
index b0146636c7..7855ee6778 100644
--- a/metadata-ingestion/docs/sources/bigquery/bigquery_pre.md
+++ b/metadata-ingestion/docs/sources/bigquery/bigquery_pre.md
@@ -47,9 +47,20 @@ If you have multiple projects in your BigQuery setup, the role should be granted
 | `bigquery.jobs.listAll` | List all jobs (queries) submitted by any user. Needs for Lineage extraction. | Lineage Extraction/Usage Extraction | [roles/bigquery.resourceViewer](https://cloud.google.com/bigquery/docs/access-control#bigquery.resourceViewer) |
 | `logging.logEntries.list` | Fetch log entries for lineage/usage data. Not required if `use_exported_bigquery_audit_metadata` is enabled. | Lineage Extraction/Usage Extraction | [roles/logging.privateLogViewer](https://cloud.google.com/logging/docs/access-control#logging.privateLogViewer) |
 | `logging.privateLogEntries.list` | Fetch log entries for lineage/usage data. Not required if `use_exported_bigquery_audit_metadata` is enabled. | Lineage Extraction/Usage Extraction | [roles/logging.privateLogViewer](https://cloud.google.com/logging/docs/access-control#logging.privateLogViewer) |
-| `bigquery.tables.getData` | Access table data to extract storage size, last updated at, data profiles etc. | Profiling | |
+| `bigquery.tables.getData` | Access table data to extract storage size, last updated at, partition information, data profiles etc. **Required when profiling is enabled or when `use_tables_list_query_v2` is enabled.** This permission is needed to query BigQuery's `__TABLES__` pseudo-table. | Profiling/Enhanced Table Metadata | |
 | `datacatalog.policyTags.get` | _Optional_ Get policy tags for columns with associated policy tags. This permission is required only if `extract_policy_tags_from_catalog` is enabled. | Policy Tag Extraction | [roles/datacatalog.viewer](https://cloud.google.com/data-catalog/docs/access-control#permissions-and-roles) |
 
+:::warning Important: bigquery.tables.getData Permission
+
+The `bigquery.tables.getData` permission is **required** in the following scenarios:
+
+- When **profiling is enabled** (`profiling.enabled: true`)
+- When **`use_tables_list_query_v2` is enabled** (for enhanced table metadata extraction)
+
+Without this permission, you'll encounter errors when the connector tries to access BigQuery's `__TABLES__` pseudo-table for detailed table information including partition data, row counts, and storage metrics.
+
+:::
+
 #### Create a service account in the Extractor Project
 
 1. Setup a ServiceAccount as per [BigQuery docs](https://cloud.google.com/iam/docs/creating-managing-service-accounts#iam-service-accounts-create-console)
@@ -176,6 +187,12 @@ source:
 
 ### Profiling Details
 
+:::note Profiling Permission Requirement
+
+When profiling is enabled, the `bigquery.tables.getData` permission is **required**. This is needed to access detailed table metadata including partition information. See the permissions section above for details.
+
+:::
+
 For performance reasons, we only profile the latest partition for partitioned tables and the latest shard for sharded tables.
 You can set partition explicitly with `partition.partition_datetime` property if you want, though note that partition config will be applied to all partitioned tables.
 
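The rule documented in the hunks above (that `bigquery.tables.getData` is needed when profiling or `use_tables_list_query_v2` is enabled) can be sketched as a small standalone check. This is only an illustration: `RecipeFlags`, `needs_tables_get_data`, and `permission_hint` are hypothetical names invented here, not part of the DataHub codebase, and the real connector reads these settings from its own config object.

```python
from dataclasses import dataclass


@dataclass
class RecipeFlags:
    """Hypothetical stand-in for the two recipe settings named in the docs."""

    profiling_enabled: bool = False
    use_tables_list_query_v2: bool = False


def needs_tables_get_data(flags: RecipeFlags) -> bool:
    # Per the documented rule: required when either setting is turned on.
    return flags.profiling_enabled or flags.use_tables_list_query_v2


def permission_hint(flags: RecipeFlags) -> str:
    # Base permissions are always listed; getData is appended only when needed.
    permissions = [
        "bigquery.tables.list",
        "bigquery.routines.get",
        "bigquery.routines.list",
    ]
    if needs_tables_get_data(flags):
        permissions.append("bigquery.tables.getData")
    return "Does your service account have " + ", ".join(permissions) + " permissions?"
```

Calling `permission_hint(RecipeFlags(profiling_enabled=True))` would list all four permissions, while the default flags list only the first three.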
diff --git a/metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_schema_gen.py b/metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_schema_gen.py
index 2730ba7fc4..da05aaf5e4 100644
--- a/metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_schema_gen.py
+++ b/metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_schema_gen.py
@@ -449,10 +449,12 @@ class BigQuerySchemaGenerator:
                 ):
                     yield wu
             except Exception as e:
-                if self.config.is_profiling_enabled():
-                    action_mesage = "Does your service account have bigquery.tables.list, bigquery.routines.get, bigquery.routines.list permission, bigquery.tables.getData permission?"
+                # If configuration indicates we need table data access (for profiling or use_tables_list_query_v2),
+                # include bigquery.tables.getData in the error message since that's likely the missing permission
+                if self.config.have_table_data_read_permission:
+                    action_mesage = "Does your service account have bigquery.tables.list, bigquery.routines.get, bigquery.routines.list, bigquery.tables.getData permissions?"
                 else:
-                    action_mesage = "Does your service account have bigquery.tables.list, bigquery.routines.get, bigquery.routines.list permission?"
+                    action_mesage = "Does your service account have bigquery.tables.list, bigquery.routines.get, bigquery.routines.list permissions?"
 
                 self.report.failure(
                     title="Unable to get tables for dataset",