Mirror of https://github.com/datahub-project/datahub.git (synced 2025-11-07 06:13:40 +00:00)

docs(ingestion/bigquery): update docs to cover bigquery.tables.getData for use_tables_list_query_v2 parameter (#14728)

Parent: bca63189fe
Commit: 0b5616208f
@@ -47,9 +47,20 @@ If you have multiple projects in your BigQuery setup, the role should be granted
 | `bigquery.jobs.listAll` | List all jobs (queries) submitted by any user. Needs for Lineage extraction. | Lineage Extraction/Usage Extraction | [roles/bigquery.resourceViewer](https://cloud.google.com/bigquery/docs/access-control#bigquery.resourceViewer) |
 | `logging.logEntries.list` | Fetch log entries for lineage/usage data. Not required if `use_exported_bigquery_audit_metadata` is enabled. | Lineage Extraction/Usage Extraction | [roles/logging.privateLogViewer](https://cloud.google.com/logging/docs/access-control#logging.privateLogViewer) |
 | `logging.privateLogEntries.list` | Fetch log entries for lineage/usage data. Not required if `use_exported_bigquery_audit_metadata` is enabled. | Lineage Extraction/Usage Extraction | [roles/logging.privateLogViewer](https://cloud.google.com/logging/docs/access-control#logging.privateLogViewer) |
-| `bigquery.tables.getData` | Access table data to extract storage size, last updated at, data profiles etc. | Profiling | |
+| `bigquery.tables.getData` | Access table data to extract storage size, last updated at, partition information, data profiles etc. **Required when profiling is enabled or when `use_tables_list_query_v2` is enabled.** This permission is needed to query BigQuery's `__TABLES__` pseudo-table. | Profiling/Enhanced Table Metadata | |
 | `datacatalog.policyTags.get` | _Optional_ Get policy tags for columns with associated policy tags. This permission is required only if `extract_policy_tags_from_catalog` is enabled. | Policy Tag Extraction | [roles/datacatalog.viewer](https://cloud.google.com/data-catalog/docs/access-control#permissions-and-roles) |

+:::warning Important: bigquery.tables.getData Permission
+
+The `bigquery.tables.getData` permission is **required** in the following scenarios:
+
+- When **profiling is enabled** (`profiling.enabled: true`)
+- When **`use_tables_list_query_v2` is enabled** (for enhanced table metadata extraction)
+
+Without this permission, you'll encounter errors when the connector tries to access BigQuery's `__TABLES__` pseudo-table for detailed table information including partition data, row counts, and storage metrics.
+
+:::
+
 #### Create a service account in the Extractor Project

 1. Setup a ServiceAccount as per [BigQuery docs](https://cloud.google.com/iam/docs/creating-managing-service-accounts#iam-service-accounts-create-console)
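The permission rules in the updated table and warning can be summarized as a small decision function. This is a hypothetical sketch, not DataHub code: `RecipeFlags`, `lineage_or_usage_enabled`, and `extra_permissions` are invented for illustration, `lineage_or_usage_enabled` is a single assumed switch standing in for whatever lineage/usage options a real recipe sets, and only the rows visible in this hunk are covered.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class RecipeFlags:
    """Hypothetical stand-in for the BigQuery recipe options discussed above."""

    # Assumption: one switch standing in for "lineage or usage extraction is on".
    lineage_or_usage_enabled: bool = True
    use_exported_bigquery_audit_metadata: bool = False
    profiling_enabled: bool = False  # mirrors `profiling.enabled`
    use_tables_list_query_v2: bool = False
    extract_policy_tags_from_catalog: bool = False


def extra_permissions(flags: RecipeFlags) -> Set[str]:
    """Return the permissions (from the rows in this hunk) that the flags require."""
    perms: Set[str] = set()
    if flags.lineage_or_usage_enabled:
        perms.add("bigquery.jobs.listAll")
        # The logging permissions are not required with exported audit metadata.
        if not flags.use_exported_bigquery_audit_metadata:
            perms.update({"logging.logEntries.list", "logging.privateLogEntries.list"})
    # The change documented by this commit: getData is needed for profiling OR
    # for use_tables_list_query_v2, since both query the __TABLES__ pseudo-table.
    if flags.profiling_enabled or flags.use_tables_list_query_v2:
        perms.add("bigquery.tables.getData")
    if flags.extract_policy_tags_from_catalog:
        perms.add("datacatalog.policyTags.get")
    return perms


# Usage: profiling is off, but the v2 tables query alone still requires getData.
print(sorted(extra_permissions(RecipeFlags(use_tables_list_query_v2=True))))
```

Keeping the two triggers in one `or` condition makes the point of the docs change explicit: turning profiling off is not enough to drop `bigquery.tables.getData` if `use_tables_list_query_v2` is still enabled.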
@@ -176,6 +187,12 @@ source:

 ### Profiling Details

+:::note Profiling Permission Requirement
+
+When profiling is enabled, the `bigquery.tables.getData` permission is **required**. This is needed to access detailed table metadata including partition information. See the permissions section above for details.
+
+:::
+
 For performance reasons, we only profile the latest partition for partitioned tables and the latest shard for sharded tables.
 You can set partition explicitly with `partition.partition_datetime` property if you want, though note that partition config will be applied to all partitioned tables.

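The "latest shard" rule in the profiling context lines can be illustrated with a small selection routine. This is a hypothetical sketch assuming shards follow BigQuery's common `<base>_YYYYMMDD` date-suffix naming; `SHARD_PATTERN` and `latest_shards` are invented names, and DataHub's own shard detection and partition lookup are not part of this diff and may work differently.

```python
import re
from typing import Dict, Iterable

# Assumption: a date-sharded table is named "<base>_<YYYYMMDD>".
SHARD_PATTERN = re.compile(r"^(?P<base>.+)_(?P<suffix>\d{8})$")


def latest_shards(table_names: Iterable[str]) -> Dict[str, str]:
    """Map each sharded-table base name to the full name of its newest shard."""
    latest: Dict[str, str] = {}
    for name in table_names:
        match = SHARD_PATTERN.match(name)
        if match is None:
            continue  # not date-sharded; would be treated as a regular table
        base = match.group("base")
        # Names sharing a base differ only in the fixed-width YYYYMMDD suffix,
        # so plain string comparison orders them chronologically.
        if base not in latest or name > latest[base]:
            latest[base] = name
    return latest


tables = ["events_20240101", "events_20240315", "events_20231231", "users"]
print(latest_shards(tables))  # {'events': 'events_20240315'}
```

Only one shard per base name is selected, which matches the intent stated in the docs: profiling every historical shard would be expensive, so only the most recent one is profiled.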
@@ -449,10 +449,12 @@ class BigQuerySchemaGenerator:
         ):
             yield wu
     except Exception as e:
-        if self.config.is_profiling_enabled():
-            action_mesage = "Does your service account have bigquery.tables.list, bigquery.routines.get, bigquery.routines.list permission, bigquery.tables.getData permission?"
+        # If configuration indicates we need table data access (for profiling or use_tables_list_query_v2),
+        # include bigquery.tables.getData in the error message since that's likely the missing permission
+        if self.config.have_table_data_read_permission:
+            action_mesage = "Does your service account have bigquery.tables.list, bigquery.routines.get, bigquery.routines.list, bigquery.tables.getData permissions?"
         else:
-            action_mesage = "Does your service account have bigquery.tables.list, bigquery.routines.get, bigquery.routines.list permission?"
+            action_mesage = "Does your service account have bigquery.tables.list, bigquery.routines.get, bigquery.routines.list permissions?"

         self.report.failure(
             title="Unable to get tables for dataset",
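The new branch in `BigQuerySchemaGenerator` can be exercised in isolation with a stand-in config. The diff only shows that `have_table_data_read_permission` is read from `self.config`; defining it as `profiling.enabled or use_tables_list_query_v2` below is an assumption taken from the added comment, and `ProfilingSketch`, `SourceConfigSketch`, and `action_message` are hypothetical names, not DataHub's classes.

```python
from dataclasses import dataclass, field

BASE_PERMISSIONS = "bigquery.tables.list, bigquery.routines.get, bigquery.routines.list"


@dataclass
class ProfilingSketch:
    enabled: bool = False


@dataclass
class SourceConfigSketch:
    """Hypothetical stand-in for the BigQuery source config."""

    profiling: ProfilingSketch = field(default_factory=ProfilingSketch)
    use_tables_list_query_v2: bool = False

    @property
    def have_table_data_read_permission(self) -> bool:
        # Assumption drawn from the diff's comment: table data access is needed
        # for profiling or for use_tables_list_query_v2.
        return self.profiling.enabled or self.use_tables_list_query_v2


def action_message(config: SourceConfigSketch) -> str:
    """Build the hint reported when listing tables for a dataset fails."""
    if config.have_table_data_read_permission:
        return f"Does your service account have {BASE_PERMISSIONS}, bigquery.tables.getData permissions?"
    return f"Does your service account have {BASE_PERMISSIONS} permissions?"


# Before this commit only profiling added the getData hint; now the v2 query does too.
print(action_message(SourceConfigSketch(use_tables_list_query_v2=True)))
```

Centralizing the "needs table data" decision in one property keeps the error hint in sync with the documented permission rules, instead of checking profiling alone as the removed `is_profiling_enabled()` branch did.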