docs(snowflake): Adding documentation about required Snowflake Privileges (#3770)

John Joyce 2021-12-19 12:01:53 -08:00 committed by GitHub
parent 2770eb6813
commit 110efa68b9
3 changed files with 83 additions and 34 deletions


@@ -88,28 +88,32 @@ Note: Since bigquery source also supports dataset level lineage, the auth client
Coming soon!
# BigQuery Usage Stats
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
## Setup
To install this plugin, run `pip install 'acryl-datahub[bigquery-usage]'`.
### Prerequisites
The Google Identity must have one of the following OAuth scopes granted to it:
- https://www.googleapis.com/auth/logging.read
- https://www.googleapis.com/auth/logging.admin
- https://www.googleapis.com/auth/cloud-platform.read-only
- https://www.googleapis.com/auth/cloud-platform
The identity should also be authorized on all projects you'd like to ingest usage stats from.
## Capabilities
This plugin extracts the following:
- Statistics on queries issued and tables and columns accessed (excludes views)
- Aggregation of these statistics into buckets, by day or hour granularity
:::note
1. This source only does usage statistics. To get the tables, views, and schemas in your BigQuery project, use the `bigquery` source described above.
@@ -117,7 +121,7 @@ Note: the client must have one of the following OAuth scopes, and should be auth
:::
## Quickstart recipe
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
@@ -139,7 +143,7 @@ sink:
# sink configs
```
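A minimal sketch of such a recipe, with placeholder values throughout; the `projects` option (assumed here to take a list of project IDs) selects which projects to pull usage logs from:

```yml
source:
  type: bigquery-usage
  config:
    # Placeholder project IDs to pull usage logs from.
    projects:
      - "my-project-1"
      - "my-project-2"

sink:
  # Placeholder sink; any supported sink works here.
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```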
## Config details
Note that a `.` is used to denote nested fields in the YAML recipe.
@@ -159,9 +163,10 @@ By default, we extract usage stats for the last day, with the recommendation tha
| `table_pattern.allow` | | | List of regex patterns for tables to include in ingestion. |
| `table_pattern.deny` | | | List of regex patterns for tables to exclude in ingestion. |
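For example, a `table_pattern.deny` entry is written as a nested block in the recipe; the pattern below is a hypothetical one that excludes tables prefixed with `tmp_`:

```yml
source:
  type: bigquery-usage
  config:
    table_pattern:
      deny:
        # Hypothetical pattern excluding temporary tables.
        - ".*\\.tmp_.*"
```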
## Compatibility
The source was most recently confirmed compatible with the [December 16, 2021](https://cloud.google.com/bigquery/docs/release-notes#December_16_2021)
release of BigQuery.
## Questions


@@ -1,4 +1,3 @@
# Redshift
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
@@ -7,13 +6,12 @@ For context on getting started with ingestion, check out our [metadata ingestion
To install this plugin, run `pip install 'acryl-datahub[redshift]'`.
### Prerequisites
This source needs to access system tables that require `superuser` permission; otherwise, it won't be able to see all schemas/tables.
To add a superuser or grant superuser permission, please refer to the [Superusers page](https://docs.aws.amazon.com/redshift/latest/dg/r_superusers.html).
If you are unable to grant superuser permission, ensure the user has the SELECT privilege on the [`SVV_TABLE_INFO`](https://docs.aws.amazon.com/redshift/latest/dg/r_SVV_TABLE_INFO.html) table.
## Capabilities
@@ -110,10 +108,6 @@ Note that a `.` is used to denote nested fields in the YAML recipe.
| `include_copy_lineage` | | `True` | Whether lineage should be collected from copy commands |
| `default_schema` | | `"public"` | The default schema to use if the sql parser fails to parse the schema with `sql_based` lineage collector |
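As a sketch, these lineage-related options sit under `config` in the recipe, alongside the connection options (omitted here for brevity):

```yml
source:
  type: redshift
  config:
    # ... connection options ...
    include_copy_lineage: true  # collect lineage from COPY commands
    default_schema: "public"    # fallback schema for the sql_based collector
```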
## Lineage
There are multiple lineage collector implementations as Redshift does not support table lineage out of the box.
@@ -156,7 +150,8 @@ Cons:
# Note
- The Redshift `stl_*` system tables used for data lineage only retain approximately two to five days of log history. This means you cannot extract lineage from queries issued outside that window.
# Redshift Usage
This plugin extracts usage statistics for datasets in Amazon Redshift. For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
Note: Usage information is computed by querying the following system tables -
@@ -165,10 +160,10 @@ Note: Usage information is computed by querying the following system tables -
3. stl_query
4. svl_user_info
## Setup
To install this plugin, run `pip install 'acryl-datahub[redshift-usage]'`.
## Capabilities
This plugin provides the following functionality:
1. For a specific dataset, it ingests the following statistics:
1. top n queries.
@@ -176,7 +171,7 @@ This plugin has the below functionalities -
3. usage of each column in the dataset.
2. Aggregation of these statistics into buckets, by day or hour granularity.
## Quickstart recipe
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
@@ -199,7 +194,7 @@ sink:
# sink configs
```
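For reference, a minimal end-to-end recipe might look like the sketch below, assuming standard connection options (`host_port`, `database`, `username`, `password`); all values are placeholders:

```yml
source:
  type: redshift-usage
  config:
    # Placeholder connection values.
    host_port: "my-cluster.abc123.us-west-2.redshift.amazonaws.com:5439"
    database: "dev"
    username: "datahub"
    password: "<your-password>"

sink:
  # Placeholder sink; any supported sink works here.
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```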
## Config details
Note that a `.` is used to denote nested fields in the YAML recipe.
By default, we extract usage stats for the last day, with the recommendation that this source is executed every day.
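To backfill a different window, the bucketing granularity and time range can be overridden; `bucket_duration`, `start_time`, and `end_time` are the option names assumed here, with placeholder values:

```yml
source:
  type: redshift-usage
  config:
    # ... connection options ...
    bucket_duration: "DAY"              # or "HOUR"
    start_time: "2021-12-01T00:00:00Z"  # placeholder window start
    end_time: "2021-12-02T00:00:00Z"    # placeholder window end
```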


@@ -6,6 +6,43 @@ For context on getting started with ingestion, check out our [metadata ingestion
To install this plugin, run `pip install 'acryl-datahub[snowflake]'`.
### Prerequisites
In order to execute this source, your Snowflake user will need to have specific privileges granted to it for reading metadata
from your warehouse. You can create a DataHub-specific role, assign it the required privileges, and assign it to a new DataHub user
by executing the following Snowflake commands from a user with the `ACCOUNTADMIN` role:
```sql
create or replace role datahub_role;
// Grant privileges to use and select from your target warehouses / dbs / schemas / tables
grant operate, usage on warehouse <your-warehouse> to role datahub_role;
grant usage on database <your-database> to role datahub_role;
grant usage on all schemas in database <your-database> to role datahub_role;
grant select on all tables in database <your-database> to role datahub_role;
grant select on all external tables in database <your-database> to role datahub_role;
grant select on all views in database <your-database> to role datahub_role;
// Grant privileges for all future schemas and tables created in the database
grant usage on future schemas in database <your-database> to role datahub_role;
grant select on future tables in database <your-database> to role datahub_role;
// Create a new DataHub user and assign the DataHub role to it
create user datahub_user display_name = 'DataHub' password = '<your-password>' default_role = datahub_role default_warehouse = '<your-warehouse>';
// Grant the datahub_role to the new DataHub user.
grant role datahub_role to user datahub_user;
```
This represents the bare minimum set of privileges required to extract databases, schemas, views, and tables from Snowflake.
If you plan to enable extraction of table lineage, via the `include_table_lineage` config flag, you'll also need to grant privileges
to access the Snowflake Account Usage views. You can execute the following using the `ACCOUNTADMIN` role to do so:
```sql
grant imported privileges on database snowflake to role datahub_role;
```
## Capabilities
This plugin extracts the following:
@@ -87,15 +124,26 @@ Note that a `.` is used to denote nested fields in the YAML recipe.
Table lineage requires Snowflake's [Access History](https://docs.snowflake.com/en/user-guide/access-history.html) feature.
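To turn it on, set the `include_table_lineage` flag mentioned above in your recipe; a minimal sketch, with connection options omitted:

```yml
source:
  type: snowflake
  config:
    # ... connection options ...
    include_table_lineage: true
```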
# Snowflake Usage Stats
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
## Setup
To install this plugin, run `pip install 'acryl-datahub[snowflake-usage]'`.
### Prerequisites
In order to execute the snowflake-usage source, your Snowflake user will need to have specific privileges granted to it. Specifically,
you'll need to grant access to the [Account Usage](https://docs.snowflake.com/en/sql-reference/account-usage.html) system tables, from which the DataHub source extracts information. Assuming
you've followed the steps outlined above to create a DataHub-specific user and role, you'll simply need to execute the following command
in Snowflake from a user with the `ACCOUNTADMIN` role:
```sql
grant imported privileges on database snowflake to role datahub_role;
```
## Capabilities
This plugin extracts the following:
@@ -112,7 +160,7 @@ This source only does usage statistics. To get the tables, views, and schemas in
:::
## Quickstart recipe
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
@@ -138,7 +186,7 @@ sink:
# sink configs
```
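For reference, a minimal recipe might look like the sketch below, reusing the `datahub_user` and `datahub_role` created earlier; the connection option names are assumptions and the remaining values are placeholders:

```yml
source:
  type: snowflake-usage
  config:
    # Placeholder connection values, reusing the user and role created above.
    host_port: "<your-account-id>"
    warehouse: "<your-warehouse>"
    username: "datahub_user"
    password: "<your-password>"
    role: "datahub_role"

sink:
  # Placeholder sink; any supported sink works here.
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```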
## Config details
The Snowflake integration also supports preventing redundant reruns over the same data. See [here](./stateful_ingestion.md) for more details on configuration.
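A sketch of what that looks like in the recipe, assuming the `stateful_ingestion.enabled` flag described in that guide:

```yml
source:
  type: snowflake-usage
  config:
    # ... connection options ...
    stateful_ingestion:
      enabled: true  # skip windows that were already ingested
```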
@@ -161,7 +209,8 @@ Note that a `.` is used to denote nested fields in the YAML recipe.
| `schema_pattern` | | | Allow/deny patterns for schema in snowflake dataset names. |
| `view_pattern` | | | Allow/deny patterns for views in snowflake dataset names. |
| `table_pattern` | | | Allow/deny patterns for tables in snowflake dataset names. |
## Compatibility
Coming soon!