mirror of
https://github.com/datahub-project/datahub.git
synced 2025-11-02 03:39:03 +00:00
docs(ingestion/bigquery): update docs to cover bigquery.tables.getData for use_tables_list_query_v2 parameter (#14728)
parent bca63189fe
commit 0b5616208f
@@ -47,9 +47,20 @@ If you have multiple projects in your BigQuery setup, the role should be granted
 | `bigquery.jobs.listAll` | List all jobs (queries) submitted by any user. Needs for Lineage extraction. | Lineage Extraction/Usage Extraction | [roles/bigquery.resourceViewer](https://cloud.google.com/bigquery/docs/access-control#bigquery.resourceViewer) |
 | `logging.logEntries.list` | Fetch log entries for lineage/usage data. Not required if `use_exported_bigquery_audit_metadata` is enabled. | Lineage Extraction/Usage Extraction | [roles/logging.privateLogViewer](https://cloud.google.com/logging/docs/access-control#logging.privateLogViewer) |
 | `logging.privateLogEntries.list` | Fetch log entries for lineage/usage data. Not required if `use_exported_bigquery_audit_metadata` is enabled. | Lineage Extraction/Usage Extraction | [roles/logging.privateLogViewer](https://cloud.google.com/logging/docs/access-control#logging.privateLogViewer) |
-| `bigquery.tables.getData` | Access table data to extract storage size, last updated at, data profiles etc. | Profiling | |
+| `bigquery.tables.getData` | Access table data to extract storage size, last updated at, partition information, data profiles etc. **Required when profiling is enabled or when `use_tables_list_query_v2` is enabled.** This permission is needed to query BigQuery's `__TABLES__` pseudo-table. | Profiling/Enhanced Table Metadata | |
 | `datacatalog.policyTags.get` | _Optional_ Get policy tags for columns with associated policy tags. This permission is required only if `extract_policy_tags_from_catalog` is enabled. | Policy Tag Extraction | [roles/datacatalog.viewer](https://cloud.google.com/data-catalog/docs/access-control#permissions-and-roles) |
 
+:::warning Important: bigquery.tables.getData Permission
+
+The `bigquery.tables.getData` permission is **required** in the following scenarios:
+
+- When **profiling is enabled** (`profiling.enabled: true`)
+- When **`use_tables_list_query_v2` is enabled** (for enhanced table metadata extraction)
+
+Without this permission, you'll encounter errors when the connector tries to access BigQuery's `__TABLES__` pseudo-table for detailed table information including partition data, row counts, and storage metrics.
+
+:::
+
 #### Create a service account in the Extractor Project
 
 1. Setup a ServiceAccount as per [BigQuery docs](https://cloud.google.com/iam/docs/creating-managing-service-accounts#iam-service-accounts-create-console)
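Note on the hunk above: the new warning ties `bigquery.tables.getData` to reads of the `__TABLES__` pseudo-table. As a rough illustration of what such a read looks like, here is a minimal Python sketch that only builds the SQL text. The helper name `tables_metadata_query` and the chosen columns are assumptions for illustration; this is not necessarily the query the connector issues.

```python
def tables_metadata_query(project_id: str, dataset_name: str) -> str:
    """Hypothetical helper: build a query against a dataset's __TABLES__ pseudo-table.

    Reading __TABLES__ is the access the warning ties to bigquery.tables.getData.
    The columns below (row count, storage size, last modified time) mirror the
    "row counts and storage metrics" mentioned in the docs; the connector's
    actual column list may differ.
    """
    return (
        "SELECT table_id, row_count, size_bytes, last_modified_time "
        f"FROM `{project_id}.{dataset_name}.__TABLES__`"
    )


# Example with made-up identifiers.
query = tables_metadata_query("my-project", "my_dataset")
print(query)
```

Running the resulting SQL with a service account that lacks `bigquery.tables.getData` is the situation the warning describes; executing the query needs a BigQuery client and is left out so the sketch stays self-contained.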
@@ -176,6 +187,12 @@ source:
 
 ### Profiling Details
 
+:::note Profiling Permission Requirement
+
+When profiling is enabled, the `bigquery.tables.getData` permission is **required**. This is needed to access detailed table metadata including partition information. See the permissions section above for details.
+
+:::
+
 For performance reasons, we only profile the latest partition for partitioned tables and the latest shard for sharded tables.
 You can set partition explicitly with `partition.partition_datetime` property if you want, though note that partition config will be applied to all partitioned tables.
 
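Note on the hunk above: the context line says only the latest shard of a sharded table is profiled. A minimal Python sketch of that idea follows, assuming shards are named `<prefix>_YYYYMMDD`. The helper `latest_shards` and the regular expression are hypothetical, and the connector's real shard detection may differ.

```python
import re
from typing import Dict, Iterable

# Assumed naming convention for date-sharded tables: <prefix>_YYYYMMDD.
_SHARD_RE = re.compile(r"^(?P<prefix>.+)_(?P<date>\d{8})$")


def latest_shards(table_names: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: map each shard prefix to its most recent shard name."""
    latest: Dict[str, str] = {}
    for name in table_names:
        match = _SHARD_RE.match(name)
        if match is None:
            continue  # not a sharded table under the naming assumption above
        prefix = match.group("prefix")
        # Names sharing a prefix differ only in the YYYYMMDD suffix, which
        # sorts chronologically as a string.
        if prefix not in latest or name > latest[prefix]:
            latest[prefix] = name
    return latest


shards = latest_shards(["events_20240101", "events_20240103", "events_20240102", "users"])
print(shards)
```

Only the table picked per prefix would be profiled under this sketch; unsharded tables such as `users` are left for the normal path.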
@@ -449,10 +449,12 @@ class BigQuerySchemaGenerator:
             ):
                 yield wu
         except Exception as e:
-            if self.config.is_profiling_enabled():
-                action_mesage = "Does your service account have bigquery.tables.list, bigquery.routines.get, bigquery.routines.list permission, bigquery.tables.getData permission?"
+            # If configuration indicates we need table data access (for profiling or use_tables_list_query_v2),
+            # include bigquery.tables.getData in the error message since that's likely the missing permission
+            if self.config.have_table_data_read_permission:
+                action_mesage = "Does your service account have bigquery.tables.list, bigquery.routines.get, bigquery.routines.list, bigquery.tables.getData permissions?"
             else:
-                action_mesage = "Does your service account have bigquery.tables.list, bigquery.routines.get, bigquery.routines.list permission?"
+                action_mesage = "Does your service account have bigquery.tables.list, bigquery.routines.get, bigquery.routines.list permissions?"
 
             self.report.failure(
                 title="Unable to get tables for dataset",
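Note on the hunk above: the branch now keys off `have_table_data_read_permission` instead of `is_profiling_enabled()`. The following standalone Python sketch reproduces the two messages. `SketchConfig` and `build_action_message` are hypothetical stand-ins, and the property body (profiling or `use_tables_list_query_v2`) is an assumption taken from the code comment in the diff, not from the real config class.

```python
from dataclasses import dataclass


@dataclass
class SketchConfig:
    """Hypothetical stand-in for the connector config; not the real class."""

    profiling_enabled: bool = False
    use_tables_list_query_v2: bool = False

    @property
    def have_table_data_read_permission(self) -> bool:
        # Assumption based on the diff's comment: table data access is needed
        # for profiling or for use_tables_list_query_v2.
        return self.profiling_enabled or self.use_tables_list_query_v2


def build_action_message(config: SketchConfig) -> str:
    """Build the hint shown when listing tables for a dataset fails."""
    permissions = ["bigquery.tables.list", "bigquery.routines.get", "bigquery.routines.list"]
    if config.have_table_data_read_permission:
        permissions.append("bigquery.tables.getData")
    return f"Does your service account have {', '.join(permissions)} permissions?"


print(build_action_message(SketchConfig(use_tables_list_query_v2=True)))
print(build_action_message(SketchConfig()))
```

Building the message from one list keeps the two variants from drifting apart, which is how the removed lines ended up with the mismatched "permission, ... permission?" wording.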