# Snowflake

To get all metadata from Snowflake you need to use two plugins: `snowflake` and `snowflake-usage`. Both are described on this page, and they require two separate recipes. We understand this is not ideal and we plan to make this easier in the future.

For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).

## `snowflake`

### Setup

To install this plugin, run `pip install 'acryl-datahub[snowflake]'`.

### Prerequisites

In order to execute this source, your Snowflake user will need specific privileges granted to it for reading metadata from your warehouse. You can use the `provision_role` block in the recipe to grant the required roles. If your system admins prefer to run the commands themselves, they can follow this guide to create a DataHub-specific role, assign it the required privileges, and assign it to a new DataHub user by executing the following Snowflake commands from a user with the `ACCOUNTADMIN` role or `MANAGE GRANTS` privilege.
```sql
create or replace role datahub_role;

// Grant access to a warehouse to run queries to view metadata
grant operate, usage on warehouse "" to role datahub_role;

// Grant access to view database and schema in which your tables/views exist
grant usage on DATABASE "" to role datahub_role;
grant usage on all schemas in database "" to role datahub_role;
grant usage on future schemas in database "" to role datahub_role;

// If you are NOT using the Snowflake Profiling feature: grant references privileges to your tables and views
grant references on all tables in database "" to role datahub_role;
grant references on future tables in database "" to role datahub_role;
grant references on all external tables in database "" to role datahub_role;
grant references on future external tables in database "" to role datahub_role;
grant references on all views in database "" to role datahub_role;
grant references on future views in database "" to role datahub_role;

// If you ARE using the Snowflake Profiling feature: grant select privileges to your tables and views
grant select on all tables in database "" to role datahub_role;
grant select on future tables in database "" to role datahub_role;
grant select on all external tables in database "" to role datahub_role;
grant select on future external tables in database "" to role datahub_role;
grant select on all views in database "" to role datahub_role;
grant select on future views in database "" to role datahub_role;

// Create a new DataHub user and assign the DataHub role to it
create user datahub_user display_name = 'DataHub' password='' default_role = datahub_role default_warehouse = '';

// Grant the datahub_role to the new DataHub user.
grant role datahub_role to user datahub_user;
```

The details of each granted privilege can be viewed in the [Snowflake docs](https://docs.snowflake.com/en/user-guide/security-access-control-privileges.html).
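If the same grants need to be applied across several databases, the statements above can be templated rather than copied by hand. Below is a minimal, illustrative Python sketch; the database names and the subset of grant templates shown are hypothetical examples, not part of the connector.

```python
# Render a subset of the per-database grant statements shown above for a
# list of databases. Illustrative only; extend GRANT_TEMPLATES as needed.
GRANT_TEMPLATES = [
    'grant usage on DATABASE "{db}" to role {role};',
    'grant usage on all schemas in database "{db}" to role {role};',
    'grant usage on future schemas in database "{db}" to role {role};',
    'grant references on all tables in database "{db}" to role {role};',
    'grant references on future tables in database "{db}" to role {role};',
]


def render_grants(databases, role="datahub_role"):
    """Return one grant statement per (database, template) pair."""
    return [t.format(db=db, role=role) for db in databases for t in GRANT_TEMPLATES]


# Example: render grants for two hypothetical databases.
for stmt in render_grants(["ACCOUNTING_DB", "MARKETING_DB"]):
    print(stmt)
```

The rendered statements can then be pasted into a Snowflake worksheet and executed by a user with the `ACCOUNTADMIN` role or `MANAGE GRANTS` privilege, as described above.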
A summary of each privilege and why it is required for this connector:

- `operate` is required on the warehouse to execute queries.
- `usage` is required for us to run queries using the warehouse.
- `usage` on the `database` and `schema` is required because tables and views inside them are otherwise not accessible. If an admin grants the required privileges on a `table` but misses the grants on the `schema` or the `database` in which the table/view exists, we will not be able to get metadata for the table/view.
- If metadata is required only on some schemas, you can grant the usage privilege on a particular schema instead:

  ```sql
  grant usage on schema ""."" to role datahub_role;
  ```

- To get lineage and usage data, we need access to the default `snowflake` database.

This represents the bare minimum privileges required to extract databases, schemas, views, and tables from Snowflake. If you plan to enable extraction of table lineage via the `include_table_lineage` config flag, you'll need to grant additional privileges. See the [snowflake usage prerequisites](#prerequisites-1), as the same privilege is required for this purpose too.

### Capabilities

This plugin extracts the following:

- Metadata for databases, schemas, views and tables
- Column types associated with each table
- Table, row, and column statistics via optional [SQL profiling](./sql_profiles.md)
- Table lineage
  - On Snowflake Standard edition we can get:
    - table -> view lineage
    - s3 -> table lineage
  - On Snowflake Enterprise edition, in addition to the above, we can also get (please see [caveats](#caveats-1)):
    - table -> table lineage
    - view -> table lineage

:::tip
You can also get fine-grained usage statistics for Snowflake using the `snowflake-usage` source described [below](#snowflake-usage-plugin).
:::

| Capability        | Status | Details                                  |
|-------------------|--------|------------------------------------------|
| Platform Instance | ✔️     | [link](../../docs/platform-instances.md) |
| Data Containers   | ✔️     |                                          |
| Data Domains      | ✔️     | [link](../../docs/domains.md)            |

### Caveats

The [caveats](#caveats-1) mentioned for `snowflake-usage` apply to `snowflake` too.

### Quickstart recipe

Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.

For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).

```yml
source:
  type: snowflake
  config:

    provision_role: # Optional
      enabled: false
      dry_run: true
      run_ingestion: false
      admin_username: "${SNOWFLAKE_ADMIN_USER}"
      admin_password: "${SNOWFLAKE_ADMIN_PASS}"

    # Coordinates
    host_port: account_name
    warehouse: "COMPUTE_WH"

    # Credentials
    username: "${SNOWFLAKE_USER}"
    password: "${SNOWFLAKE_PASS}"
    role: "datahub_role"

    database_pattern:
      allow:
        - "^ACCOUNTING_DB$"
        - "^MARKETING_DB$"
    schema_pattern:
      deny:
        - "information_schema.*"
    table_pattern:
      allow:
        # If you want to ingest only a few tables with names matching "revenue" and "sales"
        - ".*revenue"
        - ".*sales"

    profiling:
      enabled: true
      profile_pattern:
        allow:
          - 'ACCOUNTING_DB.*.*'
          - 'MARKETING_DB.*.*'
        deny:
          - '.*information_schema.*'

sink:
  # sink configs
```

### Config details

Like all SQL-based sources, the Snowflake integration supports:

- Stale Metadata Deletion: See [here](./stateful_ingestion.md) for more details on configuration.
- SQL Profiling: See [here](./sql_profiles.md) for more details on configuration.

Note that a `.` is used to denote nested fields in the YAML recipe.
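The `database_pattern`, `schema_pattern`, `table_pattern`, and `profile_pattern` fields in the recipe above take lists of regular expressions. The sketch below illustrates one plausible reading of allow/deny semantics (match any allow regex, default everything, and no deny regex); it is an illustration for reasoning about your patterns, not DataHub's exact implementation.

```python
import re


def is_allowed(name, allow=None, deny=None):
    """Sketch of allow/deny filtering: a name passes if it matches at
    least one allow regex (default: allow everything) and no deny regex.
    Illustrative only; not DataHub's actual matching code."""
    allow = allow or [".*"]
    deny = deny or []
    if any(re.match(pattern, name) for pattern in deny):
        return False
    return any(re.match(pattern, name) for pattern in allow)


# Tables filtered the way the recipe's table_pattern.allow entries suggest:
print(is_allowed("monthly_revenue", allow=[".*revenue", ".*sales"]))  # True
print(is_allowed("employees", allow=[".*revenue", ".*sales"]))        # False
```

Anchoring patterns explicitly (as in `"^ACCOUNTING_DB$"` above) avoids accidental substring matches against similarly named databases.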
| Field                  | Required | Default                   | Description                                                                                                                                                           |
|------------------------|----------|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `authentication_type`  |          | `"DEFAULT_AUTHENTICATOR"` | The type of authenticator to use when connecting to Snowflake. Supports `"DEFAULT_AUTHENTICATOR"`, `"EXTERNAL_BROWSER_AUTHENTICATOR"` and `"KEY_PAIR_AUTHENTICATOR"`. |
| `username`             |          |                           | Snowflake username.                                                                                                                                                   |
| `password`             |          |                           | Snowflake password.                                                                                                                                                   |
| `private_key_path`     |          |                           | The path to the private key if using key pair authentication. See: https://docs.snowflake.com/en/user-guide/key-pair-auth.html                                        |
| `private_key_password` |          |                           | Password for your private key if using key pair authentication.                                                                                                       |
| `host_port`            | ✅       |                           | Snowflake host URL.                                                                                                                                                   |
| `warehouse`            |          |                           | Snowflake warehouse.                                                                                                                                                  |
| `role`                 |          |                           | Snowflake role.                                                                                                                                                       |
| `sqlalchemy_uri`       |          |                           | URI of database to connect to. See https://docs.sqlalchemy.org/en/14/core/engines.html#database-urls. Takes precedence over other connection parameters.              |
| `env`                  |          | `"PROD"`                  | Environment to use in namespace when constructing URNs.                                                                                                               |
| `platform_instance`    |          | None                      | The Platform instance to use while constructing URNs.                                                                                                                 |
| `options.