From e717d6a93721b5d3f40259eb3376bcb0534dac17 Mon Sep 17 00:00:00 2001
From: varunbharill
Date: Mon, 18 Oct 2021 08:48:04 -0700
Subject: [PATCH] feat(bigquery): Ingest lineage metadata from GCP logs. (#3389)

---
 metadata-ingestion/source_docs/bigquery.md    |  50 +++--
 .../datahub/ingestion/source/sql/bigquery.py  | 212 +++++++++++++++++-
 .../ingestion/source/usage/bigquery_usage.py  |   1 -
 3 files changed, 240 insertions(+), 23 deletions(-)

diff --git a/metadata-ingestion/source_docs/bigquery.md b/metadata-ingestion/source_docs/bigquery.md
index ff360af776..91569e2c30 100644
--- a/metadata-ingestion/source_docs/bigquery.md
+++ b/metadata-ingestion/source_docs/bigquery.md
@@ -13,6 +13,7 @@ This plugin extracts the following:
 - Metadata for databases, schemas, and tables
 - Column types associated with each table
 - Table, row, and column statistics via optional [SQL profiling](./sql_profiles.md)
+- Table level lineage.
 
 :::tip
 
@@ -43,22 +44,36 @@ Note that a `.` is used to denote nested fields in the YAML recipe.
 
 As a SQL-based service, the Athena integration is also supported by our SQL profiler. See [here](./sql_profiles.md) for more details on configuration.
 
-| Field | Required | Default | Description |
-| --------------------------- | -------- | ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `project_id` | | Autodetected | Project ID to ingest from. If not specified, will infer from environment. |
-| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
-| `options.