To get all metadata from BigQuery you need to use two plugins: `bigquery` and `bigquery-usage`. Both are described on this page, and they require two separate recipes. We understand this is not ideal, and we plan to make it easier in the future.
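For example, the pair of recipes might look like the following minimal sketch (the project id, sink address, and file names are placeholders, not values from your deployment):

```yml
# bigquery_recipe.yml -- tables, views, schemas, and lineage
source:
  type: bigquery
  config:
    project_id: my-project          # placeholder
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080   # placeholder
```

```yml
# bigquery_usage_recipe.yml -- usage statistics only
source:
  type: bigquery-usage
  config:
    projects:                       # assumption: list of project ids to scan
      - my-project
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
```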
1. Setup a ServiceAccount as per [BigQuery docs](https://cloud.google.com/iam/docs/creating-managing-service-accounts#iam-service-accounts-create-console)
| Field | Required | Default | Description |
| --- | --- | --- | --- |
| `credential.project_id` | Required if `GOOGLE_APPLICATION_CREDENTIALS` environment variable is not set | | See this [example recipe](https://github.com/datahub-project/datahub/blob/9bdc9af7b90c6a97194eceb898543360b4eb105c/metadata-ingestion/examples/recipes/bigquery_to_datahub.yml#L8) for details, and the inline-credential sketch after this table |
| `credential.private_key_id` | Required if `GOOGLE_APPLICATION_CREDENTIALS` environment variable is not set | | See this [example recipe](https://github.com/datahub-project/datahub/blob/9bdc9af7b90c6a97194eceb898543360b4eb105c/metadata-ingestion/examples/recipes/bigquery_to_datahub.yml#L8) for details |
| `credential.private_key` | Required if `GOOGLE_APPLICATION_CREDENTIALS` environment variable is not set | | See this [example recipe](https://github.com/datahub-project/datahub/blob/9bdc9af7b90c6a97194eceb898543360b4eb105c/metadata-ingestion/examples/recipes/bigquery_to_datahub.yml#L8) for details |
| `credential.client_email` | Required if `GOOGLE_APPLICATION_CREDENTIALS` environment variable is not set | | See this [example recipe](https://github.com/datahub-project/datahub/blob/9bdc9af7b90c6a97194eceb898543360b4eb105c/metadata-ingestion/examples/recipes/bigquery_to_datahub.yml#L8) for details |
| `credential.client_id` | Required if `GOOGLE_APPLICATION_CREDENTIALS` environment variable is not set | | See this [example recipe](https://github.com/datahub-project/datahub/blob/9bdc9af7b90c6a97194eceb898543360b4eb105c/metadata-ingestion/examples/recipes/bigquery_to_datahub.yml#L8) for details |
| `schema_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
| `view_pattern.allow` | | | List of regex patterns for views to include in ingestion. |
| `view_pattern.deny` | | | List of regex patterns for views to exclude from ingestion. |
| `view_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
| `include_tables` | | `True` | Whether tables should be ingested. |
| `include_views` | | `True` | Whether views should be ingested. |
| `include_table_lineage` | | `True` | Whether table level lineage should be ingested and processed. |
| `max_query_duration` | | `15` | A time buffer in minutes used to adjust `start_time` and `end_time` while querying BigQuery audit logs. |
| `start_time` | | Start of last full day in UTC (or hour, depending on `bucket_duration`) | Earliest time of lineage data to consider. |
| `end_time` | | End of last full day in UTC (or hour, depending on `bucket_duration`) | Latest time of lineage data to consider. |
| `extra_client_options` | | | Additional options to pass to `google.cloud.logging_v2.client.Client`. |
| `use_exported_bigquery_audit_metadata` | | `False` | When configured, use `BigQueryAuditMetadata` in `bigquery_audit_metadata_datasets` to compute lineage information. |
| `use_date_sharded_audit_log_tables` | | `False` | Whether to read date sharded tables or time partitioned tables when extracting lineage from exported audit logs. |
| `bigquery_audit_metadata_datasets` | | None | A list of datasets that contain a table named `cloudaudit_googleapis_com_data_access`, which contains the BigQuery audit logs (specifically, those containing `BigQueryAuditMetadata`). It is recommended to also specify the project of the dataset, for example `projectA.datasetB`. |
| `domain.domain_key.allow` | | | List of regex patterns for tables/BigQuery datasets to assign to the `domain_key` domain (`domain_key` can be any string, such as `sales`). Multiple domain keys can be specified. |
| `domain.domain_key.deny` | | | List of regex patterns for tables/BigQuery datasets to exclude from the `domain_key` domain. Multiple domain keys can be specified. |
| `lineage_client_project_id` | | None | The project to use when creating the BigQuery client. If left empty, the required `project_id` will be used. This is helpful when the default `project_id` is not used for querying. |
| `use_v2_audit_metadata` | | `False` | Whether to use BigQuery audit logs to compute lineage. |
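If the `GOOGLE_APPLICATION_CREDENTIALS` environment variable is not set, the credentials can be supplied inline in the recipe. A minimal sketch combining inline credentials with pattern filters (every value below is a placeholder):

```yml
source:
  type: bigquery
  config:
    project_id: my-project
    credential:
      project_id: my-project
      private_key_id: "d0121d0000882411234e11166c6aaa23ed5d74e0"
      private_key: "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
      client_email: "ingestion@my-project.iam.gserviceaccount.com"
      client_id: "123456678890"
    schema_pattern:
      deny:
        - "^temp_.*"          # hypothetical: skip scratch datasets
    include_views: true
    include_table_lineage: true
```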
When `use_exported_bigquery_audit_metadata` is set to `true`, lineage information is computed using exported BigQuery audit logs. For instructions on setting up exported audit logs, refer to the [BigQuery audit logs docs](https://cloud.google.com/bigquery/docs/reference/auditlogs#defining_a_bigquery_log_sink_using_gcloud). Note that only protoPayloads with "type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata" are supported by the current ingestion version. The `bigquery_audit_metadata_datasets` parameter is used only if `use_exported_bigquery_audit_metadata` is set to `true`.
Note: the `bigquery_audit_metadata_datasets` parameter receives a list of datasets in the `$PROJECT.$DATASET` format, so queries from multiple projects can be used to compute lineage information.
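A sketch of enabling lineage from exported audit logs (project and dataset names are placeholders):

```yml
source:
  type: bigquery
  config:
    project_id: my-project                  # placeholder
    include_table_lineage: true
    use_exported_bigquery_audit_metadata: true
    use_v2_audit_metadata: true             # v2 audit log format, per the table above
    # Each entry is $PROJECT.$DATASET and must contain a
    # cloudaudit_googleapis_com_data_access table.
    bigquery_audit_metadata_datasets:
      - projectA.audit_logs_dataset         # placeholder
      - projectB.audit_logs_dataset         # placeholder
```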
Note: since the `bigquery` source also supports dataset-level lineage, the auth client requires additional permissions to access the Google audit logs. Refer to the permissions section of the `bigquery-usage` section below, which also accesses the audit logs.
Profiling supports normal, partitioned, and sharded tables, but for performance reasons we only profile the latest partition of partitioned tables and the latest shard of sharded tables.
If a limit/offset parameter is set, or when profiling a partitioned or sharded table, Great Expectations (the profiling framework we use) needs to create temporary views. By default these views are created in the schema of the profiled table, but you can direct all of them into a predefined schema by setting the `profiling.bigquery_temp_table_schema` property, as sketched below.
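A minimal sketch of this profiling configuration, assuming `profiling.enabled` is the standard profiling toggle and `datahub_profiling_temp` is a hypothetical pre-created dataset:

```yml
source:
  type: bigquery
  config:
    project_id: my-project    # placeholder
    profiling:
      enabled: true           # assumption: standard toggle to turn profiling on
      # Route Great Expectations' temporary views into one predefined schema
      # instead of the profiled table's own schema.
      bigquery_temp_table_schema: my-project.datahub_profiling_temp   # hypothetical dataset
```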
1. This source only extracts usage statistics. To get the tables, views, and schemas in your BigQuery project, use the `bigquery` source described above.
2. Depending on the compliance policies set up for the BigQuery instance, the `logging.read` permission is sometimes not sufficient. In that case, use either the admin or the private log viewer permission.
| Field | Required | Default | Description |
| --- | --- | --- | --- |
| `credential.project_id` | Required if `GOOGLE_APPLICATION_CREDENTIALS` environment variable is not set | | See this [example recipe](https://github.com/datahub-project/datahub/blob/9bdc9af7b90c6a97194eceb898543360b4eb105c/metadata-ingestion/examples/recipes/bigquery_to_datahub.yml#L8) for details |
| `credential.private_key_id` | Required if `GOOGLE_APPLICATION_CREDENTIALS` environment variable is not set | | See this [example recipe](https://github.com/datahub-project/datahub/blob/9bdc9af7b90c6a97194eceb898543360b4eb105c/metadata-ingestion/examples/recipes/bigquery_to_datahub.yml#L8) for details |
| `credential.private_key` | Required if `GOOGLE_APPLICATION_CREDENTIALS` environment variable is not set | | See this [example recipe](https://github.com/datahub-project/datahub/blob/9bdc9af7b90c6a97194eceb898543360b4eb105c/metadata-ingestion/examples/recipes/bigquery_to_datahub.yml#L8) for details |
| `credential.client_email` | Required if `GOOGLE_APPLICATION_CREDENTIALS` environment variable is not set | | See this [example recipe](https://github.com/datahub-project/datahub/blob/9bdc9af7b90c6a97194eceb898543360b4eb105c/metadata-ingestion/examples/recipes/bigquery_to_datahub.yml#L8) for details |
| `credential.client_id` | Required if `GOOGLE_APPLICATION_CREDENTIALS` environment variable is not set | | See this [example recipe](https://github.com/datahub-project/datahub/blob/9bdc9af7b90c6a97194eceb898543360b4eb105c/metadata-ingestion/examples/recipes/bigquery_to_datahub.yml#L8) for details |
| `extra_client_options` | | | Additional options to pass to `google.cloud.logging_v2.client.Client`. |
| `query_log_delay` | | | To account for the possibility that the query event arrives after the read event in the audit logs, we wait for at least `query_log_delay` additional events to be processed before attempting to resolve BigQuery job information from the logs. If `query_log_delay` is `None`, it gets treated as an unlimited delay, which prioritizes correctness at the expense of memory usage. |
| `max_query_duration` | | `15` | A time buffer in minutes used to pad `start_time` and `end_time`. This handles the case where a read happens within the configured time range but the query completion event is delayed past `end_time`. |
| `dataset_pattern.allow` | | | List of regex patterns for datasets to include in ingestion. |
| `dataset_pattern.deny` | | | List of regex patterns for datasets to exclude from ingestion. |
| `table_pattern.allow` | | | List of regex patterns for tables to include in ingestion. |
| `table_pattern.deny` | | | List of regex patterns for tables to exclude from ingestion. |
| `user_email_pattern.allow` | | `*` | List of regex patterns for user emails to include in usage. |
| `user_email_pattern.deny` | | | List of regex patterns for user emails to exclude from usage. |
| `user_email_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
| `use_exported_bigquery_audit_metadata` | | `False` | When configured, use `BigQueryAuditMetadata` in `bigquery_audit_metadata_datasets` to compute usage information. |
| `use_date_sharded_audit_log_tables` | | `False` | Whether to read date sharded tables or time partitioned tables when extracting usage from exported audit logs. |
| `bigquery_audit_metadata_datasets` | | None | A list of datasets that contain a table named `cloudaudit_googleapis_com_data_access`, which contains the BigQuery audit logs (specifically, those containing `BigQueryAuditMetadata`). It is recommended to also specify the project of the dataset, for example `projectA.datasetB`. |
| `use_v2_audit_metadata` | Required if `use_exported_bigquery_audit_metadata` is set to `True`. | `False` | Whether to ingest logs using the v2 format. |
| `format_sql_queries` | | `False` | Whether to format SQL queries. |
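Putting the options above together, a minimal `bigquery-usage` recipe might look like the following sketch (all names are placeholders, and `projects` is assumed to accept a list of project ids):

```yml
source:
  type: bigquery-usage
  config:
    projects:                            # assumption: list of projects to scan for usage
      - my-project
    user_email_pattern:
      deny:
        - ".*@noreply\\.example\\.com"   # hypothetical: drop bot/service traffic
    format_sql_queries: true
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080       # placeholder
```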
The source was most recently confirmed compatible with the [December 16, 2021](https://cloud.google.com/bigquery/docs/release-notes#December_16_2021) BigQuery release.