docs(snowflake) Snowflake quick ingestion guide (#6750)

This commit is contained in:
Maggie Hays 2022-12-15 17:01:38 -08:00 committed by GitHub
parent 0215245aa3
commit 2d0188c7ed
5 changed files with 274 additions and 1 deletions

@ -114,6 +114,13 @@ module.exports = {
"docs/quick-ingestion-guides/bigquery/configuration",
],
},
{
Snowflake: [
"docs/quick-ingestion-guides/snowflake/overview",
"docs/quick-ingestion-guides/snowflake/setup",
"docs/quick-ingestion-guides/snowflake/configuration",
],
},
],
},
],

@ -73,7 +73,7 @@ You can find the following details in your Service Account Key file:
* Client Email
* Client ID
-Populate the Secret Fields by selecting the Primary Key and Primary Key ID secrets you created in steps 3 and 4.
+Populate the Secret Fields by selecting the Private Key and Private Key ID secrets you created in steps 3 and 4.
<p align="center">
<img width="75%" alt="Fill out the BigQuery Recipe" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/bigquery/bigquery-ingestion-recipe.png"/>

@ -0,0 +1,145 @@
---
title: Configuration
---
# Configuring Your Snowflake Connector to DataHub
Now that you have created a DataHub-specific user with the relevant roles in Snowflake in [the prior step](setup.md), it's time to set up a connection via the DataHub UI.
## Configure Secrets
1. Within DataHub, navigate to the **Ingestion** tab in the top-right corner of your screen
<p align="center">
<img width="75%" alt="Navigate to the &quot;Ingestion Tab&quot;" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/common/common_ingestion_ingestion_button.png"/>
</p>
:::note
If you do not see the Ingestion tab, please contact your DataHub admin to grant you the correct permissions
:::
2. Navigate to the **Secrets** tab and click **Create new secret**
<p align="center">
<img width="75%" alt="Secrets Tab" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/common/common_ingestion_secrets_tab.png"/>
</p>
3. Create a Password secret
This will securely store your Snowflake password within DataHub
* Enter a name like `SNOWFLAKE_PASSWORD` - we will use this later to refer to the secret
* Enter the password configured for the DataHub user in the previous step
* Optionally add a description
* Click **Create**
<p align="center">
<img width="70%" alt="Snowflake Password Secret" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_password_secret.png"/>
</p>
## Configure Recipe
4. Navigate to the **Sources** tab and click **Create new source**
<p align="center">
<img width="75%" alt="Click &quot;Create new source&quot;" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/common/common_ingestion_click_create_new_source_button.png"/>
</p>
5. Select Snowflake
<p align="center">
<img width="70%" alt="Select Snowflake from the options" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_snowflake_source.png"/>
</p>
6. Fill out the Snowflake Recipe
Enter your Snowflake Account Identifier in the **Account ID** field. The account identifier is the part before `.snowflakecomputing.com` in your Snowflake host URL:
<p align="center">
<img width="70%" alt="Account Id Field" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_account_id.png"/>
</p>
*Learn more about Snowflake Account Identifiers [here](https://docs.snowflake.com/en/user-guide/admin-account-identifier.html#account-identifiers)*
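For example, with a hypothetical host of `xy12345.us-east-1.snowflakecomputing.com` (an illustrative value, not a real account), the corresponding value in the recipe YAML would be:

```yaml
# Hypothetical account identifier: everything before ".snowflakecomputing.com"
account_id: "xy12345.us-east-1"
```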
Add the previously created Password secret to the **Password** field:
* Click on the Password input field
* Select `SNOWFLAKE_PASSWORD` secret
<p align="center">
<img width="70%" alt="Password field" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_password_secret_field.png"/>
</p>
Populate the relevant fields using the same **Username**, **Role**, and **Warehouse** you created and/or specified in [Snowflake Prerequisites](setup.md).
<p align="center">
<img width="70%" alt="Warehouse Field" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_warehouse_username_role_fields.png"/>
</p>
7. Click **Test Connection**
This step will ensure you have configured your credentials accurately and confirm you have the required permissions to extract all relevant metadata.
<p align="center">
<img width="75%" alt="Test Snowflake connection" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_test_connection.png"/>
</p>
After you have successfully tested your connection, click **Next**.
## Schedule Execution
Now it's time to schedule a recurring ingestion pipeline to regularly extract metadata from your Snowflake instance.
8. Decide how regularly you want this ingestion to run (hourly, daily, monthly, etc.) and select from the dropdown
<p align="center">
<img width="75%" alt="schedule selector" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/common/common_ingestion_set_execution_schedule.png"/>
</p>
9. Ensure you've configured the correct timezone
<p align="center">
<img width="75%" alt="timezone_selector" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/common/common_ingestion_set_execution_timezone.png"/>
</p>
10. Click **Next** when you are done
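Behind the scenes, the chosen schedule is stored with the ingestion source as a cron expression. As a rough sketch (assuming the standard UI-generated `schedule` block; field names may vary by DataHub version), a daily midnight run looks like:

```yaml
# Illustrative only; the UI generates this block for you
schedule:
  interval: "0 0 * * *"  # cron expression: daily at midnight
  timezone: "UTC"
```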
## Finish Up
11. Name your ingestion source, then click **Save and Run**
<p align="center">
<img width="75%" alt="Name your ingestion" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/common/common_ingestion_name_ingestion_source.png"/>
</p>
You will now find your new ingestion source running
<p align="center">
<img width="75%" alt="ingestion_running" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_source_running.png"/>
</p>
## Validate Ingestion Runs
12. View the latest status of ingestion runs on the Ingestion page
<p align="center">
<img width="75%" alt="ingestion succeeded" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_ingestion_succeded.png"/>
</p>
13. Click the plus sign to expand the full list of historical runs and outcomes; click **Details** to see the outcomes of a specific run
<p align="center">
<img width="75%" alt="ingestion_details" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_ingestion_details.png"/>
</p>
14. From the Ingestion Run Details page, pick **View All** to see which entities were ingested
<p align="center">
<img width="75%" alt="ingestion_details_view_all" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_details_view_all.png"/>
</p>
15. Pick an entity from the list to manually validate that it contains the details you expected
<p align="center">
<img width="75%" alt="ingestion_details_view_all" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_view_ingested_assets.png"/>
</p>
**Congratulations!** You've successfully set up Snowflake as an ingestion source for DataHub!
*Need more help? Join the conversation in [Slack](http://slack.datahubproject.io)!*

@ -0,0 +1,48 @@
---
title: Overview
---
# Snowflake Ingestion Guide: Overview
## What You Will Get Out of This Guide
This guide will help you set up the Snowflake connector to begin ingesting metadata into DataHub.
Upon completing this guide, you will have a recurring ingestion pipeline that will extract metadata from Snowflake and load it into DataHub. This will include the following Snowflake asset types:
* Databases
* Schemas
* Tables
* External Tables
* Views
* Materialized Views
The pipeline will also extract:
* **Usage statistics** to help you understand recent query activity (available if using Snowflake Enterprise edition or above)
* **Table- and Column-level lineage** to automatically define interdependencies between datasets and columns (available if using Snowflake Enterprise edition or above)
* **Table-level profile statistics** to help you understand the shape of the data
:::caution
Stages, Snowpipes, Streams, Tasks, and Procedures will NOT be extracted from Snowflake, as the connector does not yet support ingesting these assets.
:::
### Caveats
By default, DataHub only profiles datasets that have changed within the past day. You can change this in the YAML editor by setting `profile_if_updated_since_days` to a value greater than 1.
Additionally, DataHub only extracts usage and lineage information for operations performed within the last day. You can change this by setting custom values for `start_time` and `end_time` in the YAML editor.
*To learn more about setting these advanced values, check out the [Snowflake Ingestion Source](https://datahubproject.io/docs/generated/ingestion/sources/snowflake/#module-snowflake).*
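As an illustrative sketch only (exact field placement can vary by connector version; the nesting of the profiling options is an assumption here), these advanced values appear in the recipe YAML roughly as follows:

```yaml
source:
  type: snowflake
  config:
    profiling:
      enabled: true
      # Profile tables even if they haven't changed in up to 7 days (default: 1)
      profile_if_updated_since_days: 7
    # Widen the usage/lineage extraction window beyond the default 1 day
    start_time: "2022-12-01T00:00:00Z"
    end_time: "2022-12-08T00:00:00Z"
```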
## Next Steps
If that all sounds like what you're looking for, navigate to the [next page](setup.md), where we'll talk about prerequisites.
## Advanced Guides and Reference
If you want to ingest metadata from Snowflake using the DataHub CLI, check out the following resources:
* Learn about CLI Ingestion in the [Introduction to Metadata Ingestion](../../../metadata-ingestion/README.md)
* [Snowflake Ingestion Source](https://datahubproject.io/docs/generated/ingestion/sources/snowflake/#module-snowflake)
*Need more help? Join the conversation in [Slack](http://slack.datahubproject.io)!*

@ -0,0 +1,73 @@
---
title: Setup
---
# Snowflake Ingestion Guide: Setup & Prerequisites
In order to configure ingestion from Snowflake, you'll first have to ensure you have a Snowflake user with the `ACCOUNTADMIN` role or `MANAGE GRANTS` privilege.
## Snowflake Prerequisites
1. Create a DataHub-specific role by executing the following queries in Snowflake. Replace `<your-warehouse>` with an existing warehouse that you wish to use for DataHub ingestion.
```sql
create or replace role datahub_role;
-- Grant access to a warehouse to run queries to view metadata
grant operate, usage on warehouse "<your-warehouse>" to role datahub_role;
```
Make note of this role and warehouse. You'll need this in the next step.
2. Create a DataHub-specific user by executing the following queries. Replace `<your-password>` with a strong password. Replace `<your-warehouse>` with the same warehouse used above.
```sql
create user datahub_user display_name = 'DataHub' password='<your-password>' default_role = datahub_role default_warehouse = '<your-warehouse>';
-- Grant access to the DataHub role created above
grant role datahub_role to user datahub_user;
```
Make note of the user and its password. You'll need this in the next step.
3. Assign privileges to read metadata about your assets by executing the following queries. Replace `<your-database>` with an existing database. Repeat for all databases from your Snowflake instance that you wish to integrate with DataHub.
```sql
set db_var = '"<your-database>"';
-- Grant access to view database and schema in which your tables/views exist
grant usage on DATABASE identifier($db_var) to role datahub_role;
grant usage on all schemas in database identifier($db_var) to role datahub_role;
grant usage on future schemas in database identifier($db_var) to role datahub_role;
-- Grant select access to enable Data Profiling
grant select on all tables in database identifier($db_var) to role datahub_role;
grant select on future tables in database identifier($db_var) to role datahub_role;
grant select on all external tables in database identifier($db_var) to role datahub_role;
grant select on future external tables in database identifier($db_var) to role datahub_role;
grant select on all views in database identifier($db_var) to role datahub_role;
grant select on future views in database identifier($db_var) to role datahub_role;
-- Grant access to view tables and views
grant references on all tables in database identifier($db_var) to role datahub_role;
grant references on future tables in database identifier($db_var) to role datahub_role;
grant references on all external tables in database identifier($db_var) to role datahub_role;
grant references on future external tables in database identifier($db_var) to role datahub_role;
grant references on all views in database identifier($db_var) to role datahub_role;
grant references on future views in database identifier($db_var) to role datahub_role;
```
If you have imported databases in your Snowflake instance that you wish to integrate with DataHub, you'll need to run the following query for each of them instead.
```sql
grant IMPORTED PRIVILEGES on database "<your-database>" to role datahub_role;
```
4. Assign privileges to extract lineage and usage statistics from Snowflake by executing the following query.
```sql
grant imported privileges on database snowflake to role datahub_role;
```
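Before moving on, you can optionally sanity-check the setup with Snowflake's standard `SHOW GRANTS` commands:

```sql
-- Confirm the privileges now held by the DataHub role
show grants to role datahub_role;
-- Confirm datahub_role is granted to the DataHub user
show grants to user datahub_user;
```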
## Next Steps
Once you've done all of the above in Snowflake, it's time to [move on](configuration.md) to configuring the actual ingestion source within DataHub.
*Need more help? Join the conversation in [Slack](http://slack.datahubproject.io)!*