Mirror of https://github.com/datahub-project/datahub.git (synced 2025-11-14 10:19:51 +00:00)
Prerequisites
- Get your Databricks instance's workspace URL.
- Create a Databricks Service Principal.
  - You can skip this step and use your own account to get things running quickly, but we strongly recommend creating a dedicated service principal for production use.
Authentication Options
You can authenticate with Databricks using either a Personal Access Token or Azure authentication:
Option 1: Personal Access Token (PAT)
- Generate a Databricks Personal Access Token by following the Databricks personal access token documentation.
Option 2: Azure Authentication (for Azure Databricks)
- Create an Azure Active Directory (Azure AD) application:
  - Follow the Azure AD app registration guide.
  - Note down the `client_id` (Application ID) and `tenant_id` (Directory ID), and create a `client_secret`.
- Grant the Azure AD application access to your Databricks workspace:
  - Add the service principal to your Databricks workspace by following the Databricks guide for adding service principals.
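The two options map to different keys in the source config. The Python sketch below assumes only the key names used in this guide (`workspace_url`, `token`, and `azure_auth` with `client_id`, `tenant_id`, `client_secret`). The helper `build_auth_config` is hypothetical, and its rule that exactly one method must be set is a choice made here for clarity, not a documented DataHub constraint.

```python
from typing import Optional


def build_auth_config(
    workspace_url: str,
    token: Optional[str] = None,
    client_id: Optional[str] = None,
    tenant_id: Optional[str] = None,
    client_secret: Optional[str] = None,
) -> dict:
    """Hypothetical helper: build the auth part of a Databricks source config.

    Requires exactly one method: a PAT (token) or Azure AD credentials.
    """
    azure_values = (client_id, tenant_id, client_secret)
    use_azure = any(v is not None for v in azure_values)

    if token is not None and use_azure:
        raise ValueError("Set either token or Azure credentials, not both")
    if token is None and not use_azure:
        raise ValueError("Set token or Azure credentials")

    config = {"workspace_url": workspace_url}
    if token is not None:
        # Option 1: Personal Access Token
        config["token"] = token
    else:
        # Option 2: Azure authentication needs all three Azure AD values
        if any(v is None for v in azure_values):
            raise ValueError("Azure auth needs client_id, tenant_id and client_secret")
        config["azure_auth"] = {
            "client_id": client_id,
            "tenant_id": tenant_id,
            "client_secret": client_secret,
        }
    return config
```

For example, `build_auth_config("https://<your-workspace-url>", token="<your-token>")` returns a dict with `workspace_url` and `token` only. Failing early on a half-filled Azure block makes a missing secret obvious before ingestion starts.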
Provision your service account:
- To ingest your workspace's metadata and lineage, your service principal must have all of the following:
  - One of: metastore admin role, ownership of, or `USE CATALOG` privilege on any catalogs you want to ingest
  - One of: metastore admin role, ownership of, or `USE SCHEMA` privilege on any schemas you want to ingest
  - Ownership of, or `SELECT` privilege on, any tables and views you want to ingest
  - See the Databricks ownership documentation
  - See the Databricks privileges documentation
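If the service principal is neither a metastore admin nor an owner, the non-admin route above comes down to three kinds of Unity Catalog `GRANT` statements. The sketch below only generates the SQL text; it assumes simple, unquoted object names and that the principal is referenced by its application ID in backticks. The helper name and the example names (`main`, `main.sales`, `main.sales.orders`) are hypothetical. Views are not covered here; check the privileges documentation for their exact `GRANT` form.

```python
from typing import Iterable, List


def unity_catalog_grants(
    principal: str,
    catalogs: Iterable[str],
    schemas: Iterable[str],
    tables: Iterable[str],
) -> List[str]:
    """Hypothetical helper: build GRANT statements for metadata/lineage ingestion.

    schemas are "catalog.schema" names; tables are "catalog.schema.table" names.
    """
    who = f"`{principal}`"  # service principal application ID, backtick-quoted
    stmts = [f"GRANT USE CATALOG ON CATALOG {c} TO {who}" for c in catalogs]
    stmts += [f"GRANT USE SCHEMA ON SCHEMA {s} TO {who}" for s in schemas]
    stmts += [f"GRANT SELECT ON TABLE {t} TO {who}" for t in tables]
    return stmts
```

A metastore admin or the object owner would then run the generated statements, for example in a SQL warehouse query editor.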
- To ingest the legacy `hive_metastore` catalog (`include_hive_metastore`, enabled by default), your service principal must have all of the following:
  - `READ_METADATA` and `USAGE` privileges on the `hive_metastore` catalog
  - `READ_METADATA` and `USAGE` privileges on the schemas you want to ingest
  - `READ_METADATA` and `USAGE` privileges on the tables and views you want to ingest
  - See the Databricks Hive Metastore privileges documentation
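The catalog- and schema-level part of the list above can be sketched as legacy (table ACL) `GRANT` statements. This assumes the legacy syntax accepts several privileges in one statement and that schemas are addressed as `hive_metastore.<schema>`. The helper and the `default` schema in the example are hypothetical. Table- and view-level grants are deliberately left to the Hive Metastore privileges documentation rather than guessed here.

```python
from typing import Iterable, List


def hive_metastore_grants(principal: str, schemas: Iterable[str]) -> List[str]:
    """Hypothetical helper: legacy hive_metastore grants for catalog and schemas.

    Assumes legacy table-ACL GRANT syntax; verify against the Databricks docs.
    """
    who = f"`{principal}`"
    stmts = [f"GRANT USAGE, READ_METADATA ON CATALOG hive_metastore TO {who}"]
    stmts += [
        f"GRANT USAGE, READ_METADATA ON SCHEMA hive_metastore.{s} TO {who}"
        for s in schemas
    ]
    return stmts
```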
- To ingest your workspace's notebooks and their lineage, your service principal must have `CAN_READ` privileges on the folders containing the notebooks you want to ingest (see the Databricks folder permissions guide).
- To ingest usage statistics (`include_usage_statistics`, enabled by default), your service principal must have one of the following:
  - `CAN_MANAGE` permissions on any SQL Warehouses you want to ingest (see the Databricks SQL warehouse permissions guide).
  - When `usage_data_source` is set to `SYSTEM_TABLES` or `AUTO` (the default) and `warehouse_id` is configured: `SELECT` privilege on the `system.query.history` table. This option performs better with large query volumes and multi-workspace setups.
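Folder and SQL warehouse permissions are workspace object permissions rather than SQL grants. The sketch below only builds request descriptions and sends nothing. It assumes the Databricks Permissions API endpoints `/api/2.0/permissions/directories/{id}` and `/api/2.0/permissions/warehouses/{id}`, where `PATCH` adds entries instead of replacing the whole access list. The helper names, the numeric folder ID, and the warehouse ID in the test are hypothetical. For the system-table route, note that Unity Catalog normally also requires `USE CATALOG` on `system` and `USE SCHEMA` on `system.query` to read the table.

```python
from typing import Iterable, List, Tuple


def permission_requests(
    workspace_url: str,
    principal: str,
    folder_ids: Iterable[int],
    warehouse_ids: Iterable[str],
) -> List[Tuple[str, str, dict]]:
    """Hypothetical helper: (method, url, body) tuples for the Permissions API."""
    base = workspace_url.rstrip("/")

    def acl(level: str) -> dict:
        # Service principals are identified by their application ID
        return {
            "access_control_list": [
                {"service_principal_name": principal, "permission_level": level}
            ]
        }

    reqs = []
    for folder_id in folder_ids:
        # CAN_READ on notebook folders, for notebook ingestion
        url = f"{base}/api/2.0/permissions/directories/{folder_id}"
        reqs.append(("PATCH", url, acl("CAN_READ")))
    for warehouse_id in warehouse_ids:
        # CAN_MANAGE on SQL warehouses, for API-based usage statistics
        url = f"{base}/api/2.0/permissions/warehouses/{warehouse_id}"
        reqs.append(("PATCH", url, acl("CAN_MANAGE")))
    return reqs


def system_query_history_grant(principal: str) -> str:
    """SQL grant for the SYSTEM_TABLES / AUTO usage data source."""
    return f"GRANT SELECT ON TABLE system.query.history TO `{principal}`"
```

Each tuple can then be sent with an HTTP client authenticated as a workspace admin.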
- To ingest profiling information with `method: ge`, you need `SELECT` privileges on all profiled tables.
- To ingest profiling information with `method: analyze` and `call_analyze: true` (enabled by default), your service principal must have ownership of, or `MODIFY` privilege on, any tables you want to profile.
  - Alternatively, you can run `ANALYZE TABLE` yourself on any tables you want to profile, then set `call_analyze` to `false`. You will still need `SELECT` privilege on those tables to fetch the results.
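The "run `ANALYZE TABLE` yourself" route pairs a set of SQL statements with `call_analyze: false`. The sketch below generates both. It assumes the Databricks form `ANALYZE TABLE <name> COMPUTE STATISTICS FOR ALL COLUMNS`; whether table-level statistics alone are enough for your profiling settings is worth checking. The `enabled` key and both helper names are assumptions, not taken from this guide.

```python
from typing import Iterable, List


def analyze_statements(tables: Iterable[str]) -> List[str]:
    """Statements to run yourself before ingestion (fully qualified table names)."""
    return [f"ANALYZE TABLE {t} COMPUTE STATISTICS FOR ALL COLUMNS" for t in tables]


def profiling_config(method: str, run_analyze_yourself: bool = False) -> dict:
    """Hypothetical helper for the profiling block of the source config."""
    if method not in ("ge", "analyze"):
        raise ValueError("method must be 'ge' or 'analyze'")
    cfg = {"enabled": True, "method": method}  # "enabled" key is an assumption
    if method == "analyze":
        # call_analyze=False means DataHub reads existing stats and needs only SELECT
        cfg["call_analyze"] = not run_analyze_yourself
    return cfg
```

With `run_analyze_yourself=True`, the principal no longer needs ownership or `MODIFY`, since the statistics are computed by whoever runs the generated statements.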
- Check the starter recipe below and replace `workspace_url` and either `token` (for PAT authentication) or the `azure_auth` credentials (for Azure authentication) with your information from the previous steps.
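A DataHub recipe pairs a `source` with a `sink`. The sketch below assembles one with the same top-level shape and keeps secrets out of the file by using DataHub's `${VAR}` environment-variable substitution. It assumes the `unity-catalog` source type and a local `datahub-rest` sink at `http://localhost:8080`. The environment variable names and the helper are hypothetical.

```python
import json


def starter_recipe(workspace_url: str, use_azure: bool = False) -> dict:
    """Hypothetical helper: minimal recipe dict with placeholder credentials."""
    if use_azure:
        # Option 2: Azure authentication (env var names are illustrative)
        auth = {
            "azure_auth": {
                "client_id": "${AZURE_CLIENT_ID}",
                "tenant_id": "${AZURE_TENANT_ID}",
                "client_secret": "${AZURE_CLIENT_SECRET}",
            }
        }
    else:
        # Option 1: Personal Access Token
        auth = {"token": "${DATABRICKS_TOKEN}"}
    return {
        "source": {
            "type": "unity-catalog",
            "config": {"workspace_url": workspace_url, **auth},
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},  # assumed local DataHub
        },
    }


if __name__ == "__main__":
    print(json.dumps(starter_recipe("https://<your-workspace-url>"), indent=2))
```

The printed values can be copied into the starter recipe, which is then run with `datahub ingest -c <recipe file>`.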