Mirror of https://github.com/datahub-project/datahub.git (synced 2025-11-03 04:10:43 +00:00)

docs(snowflake): Adding documentation about required Snowflake Privileges (#3770)

Parent: 2770eb6813 · Commit: 110efa68b9
@@ -88,28 +88,32 @@ Note: Since bigquery source also supports dataset level lineage, the auth client
 Coming soon!

-## BigQuery Usage Stats
+# BigQuery Usage Stats

 For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).

-### Setup
+## Setup

 To install this plugin, run `pip install 'acryl-datahub[bigquery-usage]'`.

-### Capabilities
+### Prerequisites

-This plugin extracts the following:
-
-- Statistics on queries issued and tables and columns accessed (excludes views)
-- Aggregation of these statistics into buckets, by day or hour granularity
-
-Note: the client must have one of the following OAuth scopes, and should be authorized on all projects you'd like to ingest usage stats from.
+The Google Identity must have one of the following OAuth scopes granted to it:

 - https://www.googleapis.com/auth/logging.read
 - https://www.googleapis.com/auth/logging.admin
 - https://www.googleapis.com/auth/cloud-platform.read-only
 - https://www.googleapis.com/auth/cloud-platform

+And should be authorized on all projects you'd like to ingest usage stats from.
+
+## Capabilities
+
+This plugin extracts the following:
+
+- Statistics on queries issued and tables and columns accessed (excludes views)
+- Aggregation of these statistics into buckets, by day or hour granularity
+
 :::note

 1. This source only does usage statistics. To get the tables, views, and schemas in your BigQuery project, use the `bigquery` source described above.
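As an illustrative aside (this is not DataHub's actual implementation), "aggregation of these statistics into buckets, by day or hour granularity" boils down to truncating each query event's timestamp to its bucket start and counting. A minimal sketch with hypothetical event timestamps:

```python
from collections import Counter
from datetime import datetime

def bucket_start(ts: datetime, granularity: str) -> datetime:
    # Truncate a timestamp to the start of its day or hour bucket.
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")

# Hypothetical query-event timestamps, as they might come out of audit logs.
events = [
    datetime(2021, 12, 16, 9, 15),
    datetime(2021, 12, 16, 9, 45),
    datetime(2021, 12, 16, 10, 5),
    datetime(2021, 12, 17, 8, 0),
]

per_hour = Counter(bucket_start(e, "hour") for e in events)
per_day = Counter(bucket_start(e, "day") for e in events)
```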
@@ -117,7 +121,7 @@ Note: the client must have one of the following OAuth scopes, and should be auth

 :::

-### Quickstart recipe
+## Quickstart recipe

 Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
@@ -139,7 +143,7 @@ sink:
   # sink configs
 ```

-### Config details
+## Config details

 Note that a `.` is used to denote nested fields in the YAML recipe.
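The dotted notation in the config tables maps directly onto nesting in the YAML recipe: `table_pattern.allow` means an `allow` key under a `table_pattern` mapping. A small sketch (not DataHub code) of that correspondence:

```python
def expand_dotted(flat: dict) -> dict:
    # Turn {"table_pattern.allow": [...]} into {"table_pattern": {"allow": [...]}}.
    nested: dict = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested

config = expand_dotted({
    "table_pattern.allow": ["schema1\\..*"],
    "table_pattern.deny": [".*_staging"],
})
```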
@@ -159,9 +163,10 @@ By default, we extract usage stats for the last day, with the recommendation tha
 | `table_pattern.allow` | | | List of regex patterns for tables to include in ingestion. |
 | `table_pattern.deny` | | | List of regex patterns for tables to exclude in ingestion. |

-### Compatibility
+## Compatibility

-Coming soon!
+The source was most recently confirmed compatible with the [December 16, 2021](https://cloud.google.com/bigquery/docs/release-notes#December_16_2021)
+release of BigQuery.

 ## Questions
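`table_pattern.allow` and `table_pattern.deny` take lists of regex patterns. A simplified sketch of allow/deny filtering, where the exact matching semantics (anchored matching, deny-wins, empty allow means allow-all) are an assumption for illustration rather than DataHub's precise implementation:

```python
import re

def is_allowed(name: str, allow: list[str], deny: list[str]) -> bool:
    # Deny patterns win; an empty allow list means "allow everything".
    if any(re.match(p, name) for p in deny):
        return False
    return not allow or any(re.match(p, name) for p in allow)

tables = ["analytics.events", "analytics.tmp_scratch", "sales.orders"]
kept = [t for t in tables
        if is_allowed(t, allow=[r"analytics\..*"], deny=[r".*\.tmp_.*"])]
```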
@@ -1,4 +1,3 @@
-
 # Redshift

 For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
@@ -7,13 +6,12 @@ For context on getting started with ingestion, check out our [metadata ingestion

 To install this plugin, run `pip install 'acryl-datahub[redshift]'`.

-::: Required permissions :::
+### Prerequisites

 This source needs to access system tables that require `superuser` permission; otherwise, it won't be able to see all schemas/tables.

-To add a superuser or grant superuser permission, please check [this page](https://docs.aws.amazon.com/redshift/latest/dg/r_superusers.html)
+To add a superuser or grant superuser permission, please refer to the [Superusers page](https://docs.aws.amazon.com/redshift/latest/dg/r_superusers.html).

-If you don't want to grant superuser permission, please ensure the user has SELECT privilege on [`SVV_TABLE_INFO`](https://docs.aws.amazon.com/redshift/latest/dg/r_SVV_TABLE_INFO.html) table.
+If you are unable to add superuser permissions, please ensure the user has SELECT privilege on the [`SVV_TABLE_INFO`](https://docs.aws.amazon.com/redshift/latest/dg/r_SVV_TABLE_INFO.html) table.

 ## Capabilities
@@ -110,10 +108,6 @@ Note that a `.` is used to denote nested fields in the YAML recipe.
 | `include_copy_lineage` | | `True` | Whether lineage should be collected from copy commands |
 | `default_schema` | | `"public"` | The default schema to use if the sql parser fails to parse the schema with `sql_based` lineage collector |

-## Compatibility
-
-Coming soon!
-
 ## Lineage

 There are multiple lineage collector implementations as Redshift does not support table lineage out of the box.
@@ -156,7 +150,8 @@ Cons:
 # Note
 - The Redshift stl tables used for getting data lineage only retain approximately two to five days of log history. This means you cannot extract lineage from queries issued outside that window.

-# Redshift-Usage
+# Redshift Usage

 This plugin extracts usage statistics for datasets in Amazon Redshift. For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).

 Note: Usage information is computed by querying the following system tables -
@@ -165,10 +160,10 @@ Note: Usage information is computed by querying the following system tables -
 3. stl_query
 4. svl_user_info

-##Setup
+## Setup
 To install this plugin, run `pip install 'acryl-datahub[redshift-usage]'`.

-##Capabilities
+## Capabilities
 This plugin has the below functionalities -
 1. For a specific dataset this plugin ingests the following statistics -
    1. top n queries.
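The per-dataset statistics listed above (e.g. top n queries) amount to frequency counts over the query log. A toy sketch with hypothetical `(query_text, dataset)` rows, just to show the shape of the computation; the row format is an assumption, not the actual `stl_query` schema:

```python
from collections import Counter

# Hypothetical (query_text, dataset) rows, loosely modeled on a query log.
query_log = [
    ("SELECT * FROM orders", "orders"),
    ("SELECT id FROM orders", "orders"),
    ("SELECT * FROM orders", "orders"),
    ("SELECT * FROM users", "users"),
]

def top_n_queries(rows, dataset, n=1):
    # Count how often each query text hit the given dataset.
    counts = Counter(q for q, d in rows if d == dataset)
    return counts.most_common(n)

top = top_n_queries(query_log, "orders", n=1)
```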
@@ -176,7 +171,7 @@ This plugin has the below functionalities -
    3. usage of each column in the dataset.
 2. Aggregation of these statistics into buckets, by day or hour granularity.

-## Sample usage recipe
+## Quickstart recipe

 Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
@@ -199,7 +194,7 @@ sink:
   # sink configs
 ```

-### Config details
+## Config details
 Note that a `.` is used to denote nested fields in the YAML recipe.

 By default, we extract usage stats for the last day, with the recommendation that this source is executed every day.
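"Usage stats for the last day" implies computing a window covering the previous full day. A small sketch of how such a window might be derived in UTC; the midnight-to-midnight windowing is an assumption for illustration, not DataHub's exact logic:

```python
from datetime import datetime, timedelta, timezone

def previous_day_window(now: datetime):
    # Midnight-to-midnight UTC window for the day before `now`.
    today_start = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return today_start - timedelta(days=1), today_start

start, end = previous_day_window(datetime(2021, 12, 17, 8, 30, tzinfo=timezone.utc))
```

Running the source daily with this window yields contiguous, non-overlapping buckets.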
@@ -6,6 +6,43 @@ For context on getting started with ingestion, check out our [metadata ingestion

 To install this plugin, run `pip install 'acryl-datahub[snowflake]'`.

+### Prerequisites
+
+In order to execute this source, your Snowflake user will need to have specific privileges granted to it for reading metadata
+from your warehouse. You can create a DataHub-specific role, assign it the required privileges, and assign it to a new DataHub user
+by executing the following Snowflake commands from a user with the `ACCOUNTADMIN` role:
+
+```sql
+create or replace role datahub_role;
+
+// Grant privileges to use and select from your target warehouses / dbs / schemas / tables
+grant operate, usage on warehouse <your-warehouse> to role datahub_role;
+grant usage on database <your-database> to role datahub_role;
+grant usage on all schemas in database <your-database> to role datahub_role;
+grant select on all tables in database <your-database> to role datahub_role;
+grant select on all external tables in database <your-database> to role datahub_role;
+grant select on all views in database <your-database> to role datahub_role;
+
+// Grant privileges for all future schemas and tables created in the database
+grant usage on future schemas in database "<your-database>" to role datahub_role;
+grant select on future tables in database "<your-database>" to role datahub_role;
+
+// Create a new DataHub user and assign the DataHub role to it
+create user datahub_user display_name = 'DataHub' password='' default_role = datahub_role default_warehouse = '<your-warehouse>';
+
+// Grant the datahub_role to the new DataHub user.
+grant role datahub_role to user datahub_user;
+```
+
+This represents the bare minimum privileges required to extract databases, schemas, views, and tables from Snowflake.
+
+If you plan to enable extraction of table lineage, via the `include_table_lineage` config flag, you'll also need to grant privileges
+to access the Snowflake Account Usage views. You can execute the following using the `ACCOUNTADMIN` role to do so:
+
+```sql
+grant imported privileges on database snowflake to role datahub_role;
+```
+
 ## Capabilities

 This plugin extracts the following:
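If you have several databases to cover, the per-database grants above can be templated rather than hand-edited. A hypothetical helper (not part of DataHub) that renders the same statements for each database:

```python
# Per-database grant statements, mirroring the prerequisites above.
GRANT_TEMPLATE = [
    'grant usage on database "{db}" to role {role};',
    'grant usage on all schemas in database "{db}" to role {role};',
    'grant select on all tables in database "{db}" to role {role};',
    'grant select on all views in database "{db}" to role {role};',
]

def grants_for(databases, role="datahub_role"):
    # Render the grant statements for every target database.
    return [line.format(db=db, role=role)
            for db in databases
            for line in GRANT_TEMPLATE]

stmts = grants_for(["SALES", "MARKETING"])
```

The rendered statements can then be pasted into a Snowflake worksheet and run under `ACCOUNTADMIN`.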
@@ -87,15 +124,26 @@ Note that a `.` is used to denote nested fields in the YAML recipe.

 Table lineage requires Snowflake's [Access History](https://docs.snowflake.com/en/user-guide/access-history.html) feature.

-## Snowflake Usage Stats
+# Snowflake Usage Stats

 For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).

-### Setup
+## Setup

 To install this plugin, run `pip install 'acryl-datahub[snowflake-usage]'`.

-### Capabilities
+### Prerequisites
+
+In order to execute the snowflake-usage source, your Snowflake user will need to have specific privileges granted to it. Specifically,
+you'll need to grant access to the [Account Usage](https://docs.snowflake.com/en/sql-reference/account-usage.html) system tables, from which the DataHub source extracts information. Assuming
+you've followed the steps outlined above to create a DataHub-specific User & Role, you'll simply need to execute the following commands
+in Snowflake from a user with the `ACCOUNTADMIN` role:
+
+```sql
+grant imported privileges on database snowflake to role datahub_role;
+```
+
+## Capabilities

 This plugin extracts the following:
@@ -112,7 +160,7 @@ This source only does usage statistics. To get the tables, views, and schemas in

 :::

-### Quickstart recipe
+## Quickstart recipe

 Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
@@ -138,7 +186,7 @@ sink:
   # sink configs
 ```

-### Config details
+## Config details

 Snowflake integration also supports prevention of redundant reruns for the same data. See [here](./stateful_ingestion.md) for more details on configuration.
@@ -161,7 +209,8 @@ Note that a `.` is used to denote nested fields in the YAML recipe.
 | `schema_pattern` | | | Allow/deny patterns for schema in snowflake dataset names. |
 | `view_pattern` | | | Allow/deny patterns for views in snowflake dataset names. |
 | `table_pattern` | | | Allow/deny patterns for tables in snowflake dataset names. |
-### Compatibility
+
+# Compatibility

 Coming soon!