Mirror of https://github.com/datahub-project/datahub.git (synced 2025-06-27 05:03:31 +00:00)
docs(snowflake) Snowflake quick ingestion guide (#6750)
This commit is contained in: parent 0215245aa3, commit 2d0188c7ed
```diff
@@ -114,6 +114,13 @@ module.exports = {
         "docs/quick-ingestion-guides/bigquery/configuration",
       ],
     },
+    {
+      Snowflake: [
+        "docs/quick-ingestion-guides/snowflake/overview",
+        "docs/quick-ingestion-guides/snowflake/setup",
+        "docs/quick-ingestion-guides/snowflake/configuration",
+      ],
+    },
   ],
 },
```
|
```diff
@@ -73,7 +73,7 @@ You can find the following details in your Service Account Key file:
 * Client Email
 * Client ID
 
-Populate the Secret Fields by selecting the Primary Key and Primary Key ID secrets you created in steps 3 and 4.
+Populate the Secret Fields by selecting the Private Key and Private Key ID secrets you created in steps 3 and 4.
 
 <p align="center">
    <img width="75%" alt="Fill out the BigQuery Recipe" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/bigquery/bigquery-ingestion-recipe.png"/>
```
145 docs/quick-ingestion-guides/snowflake/configuration.md (new file)
@@ -0,0 +1,145 @@
---
title: Configuration
---

# Configuring Your Snowflake Connector to DataHub

Now that you have created a DataHub-specific user with the relevant roles in Snowflake in [the prior step](setup.md), it's time to set up a connection via the DataHub UI.

## Configure Secrets

1. Within DataHub, navigate to the **Ingestion** tab in the top-right corner of your screen.

<p align="center">
  <img width="75%" alt="Navigate to the Ingestion Tab" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/common/common_ingestion_ingestion_button.png"/>
</p>

:::note
If you do not see the Ingestion tab, please contact your DataHub admin to grant you the correct permissions.
:::

2. Navigate to the **Secrets** tab and click **Create new secret**.

<p align="center">
  <img width="75%" alt="Secrets Tab" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/common/common_ingestion_secrets_tab.png"/>
</p>

3. Create a Password secret.

   This will securely store your Snowflake password within DataHub.

   * Enter a name like `SNOWFLAKE_PASSWORD` - we will use this later to refer to the secret
   * Enter the password configured for the DataHub user in the previous step
   * Optionally add a description
   * Click **Create**

<p align="center">
  <img width="70%" alt="Snowflake Password Secret" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_password_secret.png"/>
</p>

## Configure Recipe

4. Navigate to the **Sources** tab and click **Create new source**.

<p align="center">
  <img width="75%" alt="Click Create new source" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/common/common_ingestion_click_create_new_source_button.png"/>
</p>

5. Select Snowflake.

<p align="center">
  <img width="70%" alt="Select Snowflake from the options" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_snowflake_source.png"/>
</p>

6. Fill out the Snowflake Recipe.

   Enter your Snowflake Account Identifier in the **Account ID** field. The account identifier is the part before `.snowflakecomputing.com` in your Snowflake host URL:

<p align="center">
  <img width="70%" alt="Account Id Field" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_account_id.png"/>
</p>

   *Learn more about Snowflake Account Identifiers [here](https://docs.snowflake.com/en/user-guide/admin-account-identifier.html#account-identifiers)*

   Add the previously created Password secret to the **Password** field:

   * Click on the Password input field
   * Select the `SNOWFLAKE_PASSWORD` secret

<p align="center">
  <img width="70%" alt="Password field" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_password_secret_field.png"/>
</p>

   Populate the remaining fields using the same **Username**, **Role**, and **Warehouse** you created and/or specified in [Snowflake Prerequisites](setup.md).

<p align="center">
  <img width="70%" alt="Warehouse Field" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_warehouse_username_role_fields.png"/>
</p>
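Behind the scenes, the UI form builds an ingestion recipe. As a rough sketch only (field values are placeholders, and the exact recipe schema may vary by DataHub version), the fields filled in above correspond to YAML along these lines:

```yaml
source:
  type: snowflake
  config:
    account_id: "xy12345"            # the part before .snowflakecomputing.com
    username: "datahub_user"
    password: "${SNOWFLAKE_PASSWORD}" # references the secret created earlier
    role: "datahub_role"
    warehouse: "<your-warehouse>"
```

You can view and tweak this recipe later via the YAML editor on the source's configuration page.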
7. Click **Test Connection**.

   This step will ensure you have configured your credentials accurately and confirm you have the required permissions to extract all relevant metadata.

<p align="center">
  <img width="75%" alt="Test Snowflake connection" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_test_connection.png"/>
</p>

After you have successfully tested your connection, click **Next**.

## Schedule Execution

Now it's time to schedule a recurring ingestion pipeline to regularly extract metadata from your Snowflake instance.

8. Decide how regularly you want this ingestion to run (hourly, daily, monthly, etc.) and select from the dropdown.

<p align="center">
  <img width="75%" alt="schedule selector" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/common/common_ingestion_set_execution_schedule.png"/>
</p>

9. Ensure you've configured the correct timezone.

<p align="center">
  <img width="75%" alt="timezone_selector" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/common/common_ingestion_set_execution_timezone.png"/>
</p>

10. Click **Next** when you are done.

## Finish Up

11. Name your ingestion source, then click **Save and Run**.

<p align="center">
  <img width="75%" alt="Name your ingestion" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/common/common_ingestion_name_ingestion_source.png"/>
</p>

You will now find your new ingestion source running.

<p align="center">
  <img width="75%" alt="ingestion_running" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_source_running.png"/>
</p>

## Validate Ingestion Runs

12. View the latest status of ingestion runs on the Ingestion page.

<p align="center">
  <img width="75%" alt="ingestion succeeded" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_ingestion_succeded.png"/>
</p>

13. Click the plus sign to expand the full list of historical runs and outcomes; click **Details** to see the outcomes of a specific run.

<p align="center">
  <img width="75%" alt="ingestion_details" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_ingestion_details.png"/>
</p>

14. From the Ingestion Run Details page, pick **View All** to see which entities were ingested.

<p align="center">
  <img width="75%" alt="ingestion_details_view_all" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_details_view_all.png"/>
</p>

15. Pick an entity from the list to manually validate that it contains the detail you expected.

<p align="center">
  <img width="75%" alt="ingestion_view_ingested_assets" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/guides/snowflake/snowflake_ingestion_view_ingested_assets.png"/>
</p>

**Congratulations!** You've successfully set up Snowflake as an ingestion source for DataHub!

*Need more help? Join the conversation in [Slack](http://slack.datahubproject.io)!*
48 docs/quick-ingestion-guides/snowflake/overview.md (new file)
@@ -0,0 +1,48 @@
---
title: Overview
---

# Snowflake Ingestion Guide: Overview

## What You Will Get Out of This Guide

This guide will help you set up the Snowflake connector to begin ingesting metadata into DataHub.

Upon completing this guide, you will have a recurring ingestion pipeline that will extract metadata from Snowflake and load it into DataHub. This will include the following Snowflake asset types:

* Databases
* Schemas
* Tables
* External Tables
* Views
* Materialized Views

The pipeline will also extract:

* **Usage statistics** to help you understand recent query activity (available if using Snowflake Enterprise edition or above)
* **Table- and column-level lineage** to automatically define interdependencies between datasets and columns (available if using Snowflake Enterprise edition or above)
* **Table-level profile statistics** to help you understand the shape of the data

:::caution
Stages, Snowpipes, Streams, Tasks, and Procedures will NOT be extracted from Snowflake, as the connector does not yet support ingesting these assets.
:::

### Caveats

By default, DataHub only profiles datasets that have changed in the past day. This can be changed in the YAML editor by setting the value of `profile_if_updated_since_days` to something greater than 1.

Additionally, DataHub only extracts usage and lineage information based on operations performed in the last day. This can be changed by setting custom values for `start_time` and `end_time` in the YAML editor.
*To learn more about setting these advanced values, check out the [Snowflake Ingestion Source](https://datahubproject.io/docs/generated/ingestion/sources/snowflake/#module-snowflake).*

## Next Steps

If that all sounds like what you're looking for, navigate to the [next page](setup.md), where we'll talk about prerequisites.

## Advanced Guides and Reference

If you want to ingest metadata from Snowflake using the DataHub CLI, check out the following resources:

* Learn about CLI Ingestion in the [Introduction to Metadata Ingestion](../../../metadata-ingestion/README.md)
* [Snowflake Ingestion Source](https://datahubproject.io/docs/generated/ingestion/sources/snowflake/#module-snowflake)

*Need more help? Join the conversation in [Slack](http://slack.datahubproject.io)!*
73 docs/quick-ingestion-guides/snowflake/setup.md (new file)
@@ -0,0 +1,73 @@
---
title: Setup
---

# Snowflake Ingestion Guide: Setup & Prerequisites

In order to configure ingestion from Snowflake, you'll first have to ensure you have a Snowflake user with the `ACCOUNTADMIN` role or `MANAGE GRANTS` privilege.

## Snowflake Prerequisites

1. Create a DataHub-specific role by executing the following queries in Snowflake. Replace `<your-warehouse>` with an existing warehouse that you wish to use for DataHub ingestion.

   ```sql
   create or replace role datahub_role;

   -- Grant access to a warehouse to run queries to view metadata
   grant operate, usage on warehouse "<your-warehouse>" to role datahub_role;
   ```

   Make note of this role and warehouse. You'll need them in the next step.

2. Create a DataHub-specific user by executing the following queries. Replace `<your-password>` with a strong password and `<your-warehouse>` with the same warehouse used above.

   ```sql
   create user datahub_user display_name = 'DataHub' password = '<your-password>' default_role = datahub_role default_warehouse = '<your-warehouse>';

   -- Grant the DataHub role created above to the new user
   grant role datahub_role to user datahub_user;
   ```

   Make note of the user and its password. You'll need them in the next step.

3. Assign privileges to read metadata about your assets by executing the following queries. Replace `<your-database>` with an existing database. Repeat for every database in your Snowflake instance that you wish to integrate with DataHub.

   ```sql
   set db_var = '"<your-database>"';

   -- Grant access to view the database and the schemas in which your tables/views exist
   grant usage on database identifier($db_var) to role datahub_role;
   grant usage on all schemas in database identifier($db_var) to role datahub_role;
   grant usage on future schemas in database identifier($db_var) to role datahub_role;

   -- Grant select access to enable Data Profiling
   grant select on all tables in database identifier($db_var) to role datahub_role;
   grant select on future tables in database identifier($db_var) to role datahub_role;
   grant select on all external tables in database identifier($db_var) to role datahub_role;
   grant select on future external tables in database identifier($db_var) to role datahub_role;
   grant select on all views in database identifier($db_var) to role datahub_role;
   grant select on future views in database identifier($db_var) to role datahub_role;

   -- Grant access to view table and view definitions
   grant references on all tables in database identifier($db_var) to role datahub_role;
   grant references on future tables in database identifier($db_var) to role datahub_role;
   grant references on all external tables in database identifier($db_var) to role datahub_role;
   grant references on future external tables in database identifier($db_var) to role datahub_role;
   grant references on all views in database identifier($db_var) to role datahub_role;
   grant references on future views in database identifier($db_var) to role datahub_role;
   ```

   If you have imported databases in your Snowflake instance that you wish to integrate with DataHub, you'll need to use the query below for them instead.

   ```sql
   grant IMPORTED PRIVILEGES on database "<your-database>" to role datahub_role;
   ```

4. Assign privileges to extract lineage and usage statistics from Snowflake by executing the query below.

   ```sql
   grant imported privileges on database snowflake to role datahub_role;
   ```
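
   After running the grants above, a quick sanity check can confirm everything is in place before moving on. A minimal sketch, assuming the `datahub_role` and `datahub_user` names used in this guide:

   ```sql
   -- List the privileges held by the DataHub role
   show grants to role datahub_role;

   -- Confirm the role is granted to the DataHub user
   show grants to user datahub_user;
   ```

   The first statement should list the warehouse, database, and table/view grants created above; the second should show `datahub_role` granted to `datahub_user`.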
## Next Steps

Once you've done all of the above in Snowflake, it's time to [move on](configuration.md) to configuring the actual ingestion source within DataHub.

*Need more help? Join the conversation in [Slack](http://slack.datahubproject.io)!*