datahub/unity-catalog_pre.md at master

mirror of https://github.com/datahub-project/datahub.git synced 2025-11-14 10:19:51 +00:00

feat(databricks): adds Azure oauth to Databricks (#15117)

Co-authored-by: pjain155_uhg <anshul_p@optum.com>

2025-11-06 19:46:34 +05:30


Prerequisites

Get your Databricks instance's workspace URL
Create a Databricks Service Principal
- You can skip this step and use your own account to get things running quickly, but we strongly recommend creating a dedicated service principal for production use.

Authentication Options

You can authenticate with Databricks using either a Personal Access Token or Azure authentication:

Option 1: Personal Access Token (PAT)

Generate a Databricks Personal Access Token by following these guides:
- Service Principals
- Personal Access Tokens

Option 2: Azure Authentication (for Azure Databricks)

Create an Azure Active Directory application:
- Follow the Azure AD app registration guide
- Note the client_id (Application ID) and tenant_id (Directory ID), then create a client_secret
Grant the Azure AD application access to your Databricks workspace:
- Add the service principal to your Databricks workspace following this guide

Provision your service account:

To ingest your workspace's metadata and lineage, your service principal must have all of the following:
- One of: metastore admin role, ownership of, or USE CATALOG privilege on any catalogs you want to ingest
- One of: metastore admin role, ownership of, or USE SCHEMA privilege on any schemas you want to ingest
- Ownership of or SELECT privilege on any tables and views you want to ingest
- Ownership documentation
- Privileges documentation
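
As a rough illustration of the minimum Unity Catalog privileges listed above, the sketch below builds the corresponding `GRANT` statements as plain strings. The function name, the principal `ingest-sp`, and the object names are hypothetical, and the sketch assumes simple identifiers that need no extra quoting. It covers only the privilege path (not the metastore admin or ownership alternatives) and does not execute anything against Databricks.

```python
def unity_catalog_grants(principal, catalogs, schemas, tables):
    """Return GRANT statements for the privilege-based option described above.

    principal: name of the service principal (hypothetical value in the usage below)
    catalogs:  catalog names, e.g. "main"
    schemas:   fully qualified schema names, e.g. "main.sales"
    tables:    fully qualified table or view names, e.g. "main.sales.orders"
    """
    target = f"`{principal}`"  # principals are quoted with backticks in Databricks SQL
    statements = []
    for catalog in catalogs:
        statements.append(f"GRANT USE CATALOG ON CATALOG {catalog} TO {target};")
    for schema in schemas:
        statements.append(f"GRANT USE SCHEMA ON SCHEMA {schema} TO {target};")
    for table in tables:
        statements.append(f"GRANT SELECT ON TABLE {table} TO {target};")
    return statements


if __name__ == "__main__":
    for statement in unity_catalog_grants(
        "ingest-sp", ["main"], ["main.sales"], ["main.sales.orders"]
    ):
        print(statement)
```

The generated statements can be reviewed and then run by a user who is allowed to grant these privileges, for example a metastore admin or the owner of the objects.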
To ingest legacy hive_metastore catalog (include_hive_metastore - enabled by default), your service principal must have all of the following:
- READ_METADATA and USAGE privileges on the hive_metastore catalog
- READ_METADATA and USAGE privileges on the schemas you want to ingest
- READ_METADATA and USAGE privileges on the tables and views you want to ingest
- Hive Metastore Privileges documentation
To ingest your workspace's notebooks and respective lineage, your service principal must have CAN_READ privileges on the folders containing the notebooks you want to ingest: guide.
To ingest usage statistics (include_usage_statistics, enabled by default), your service principal must have one of the following:
- CAN_MANAGE permissions on any SQL Warehouses you want to ingest: guide.
- SELECT privilege on the system.query.history table, when usage_data_source is set to SYSTEM_TABLES or AUTO (default) and warehouse_id is configured. This option performs better with large query volumes and multi-workspace setups.
To ingest profiling information with method: ge, you need SELECT privileges on all profiled tables.
To ingest profiling information with method: analyze and call_analyze: true (enabled by default), your service principal must have ownership or MODIFY privilege on any tables you want to profile.
- Alternatively, you can run ANALYZE TABLE yourself on any tables you want to profile, then set call_analyze to false. You will still need SELECT privilege on those tables to fetch the results.
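
The alternative above (running ANALYZE TABLE yourself, then disabling call_analyze) might look roughly like the following sketch. The helper name and table names are hypothetical. Two assumptions are made: that column-level statistics are what the profiler reads, hence `FOR ALL COLUMNS`, and that `method` and `call_analyze` sit under a `profiling` block in the recipe.

```python
def analyze_statements(tables):
    """Build one ANALYZE TABLE statement per fully qualified table name.

    Assumption: column-level statistics are wanted, hence FOR ALL COLUMNS.
    """
    return [
        f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR ALL COLUMNS;"
        for table in tables
    ]


# Profiling options to pair with statistics computed ahead of time.
# Assumption: these keys are nested under `profiling` in the recipe.
profiling_config = {"method": "analyze", "call_analyze": False}


if __name__ == "__main__":
    for statement in analyze_statements(["main.sales.orders"]):
        print(statement)
    print(profiling_config)
```

With this setup the service principal no longer needs ownership or MODIFY for profiling, but it still needs SELECT on the profiled tables to fetch the results.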
Check the starter recipe below and replace workspace_url and either token (for PAT authentication) or azure_auth credentials (for Azure authentication) with your information from the previous steps.
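
The substitution described above can also be sketched in Python as a dict that mirrors a recipe's source section. This is not the starter recipe itself: the function name and all values are hypothetical, and the layout (`workspace_url`, `token`, and an `azure_auth` block holding `client_id`, `tenant_id`, and `client_secret`) is an assumption based on the field names mentioned in this section. The sketch also assumes exactly one authentication option is supplied.

```python
import json

REQUIRED_AZURE_KEYS = {"client_id", "tenant_id", "client_secret"}


def build_source(workspace_url, token=None, azure_auth=None):
    """Return a source section using either PAT or Azure authentication."""
    # Exactly one of the two authentication options must be supplied.
    if (token is None) == (azure_auth is None):
        raise ValueError("Provide exactly one of token or azure_auth")

    config = {"workspace_url": workspace_url}
    if token is not None:
        config["token"] = token
    else:
        missing = REQUIRED_AZURE_KEYS - set(azure_auth)
        if missing:
            raise ValueError(f"azure_auth is missing: {sorted(missing)}")
        config["azure_auth"] = dict(azure_auth)

    return {"source": {"type": "unity-catalog", "config": config}}


if __name__ == "__main__":
    # Placeholder values only; real secrets should come from a secret store
    # or environment variables rather than being hard-coded.
    recipe = build_source(
        "https://example.azuredatabricks.net",
        azure_auth={
            "client_id": "<application-id>",
            "tenant_id": "<directory-id>",
            "client_secret": "<client-secret>",
        },
    )
    print(json.dumps(recipe, indent=2))
```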
