P Anshul Jain bdb46d9909
feat(databricks): adds Azure oauth to Databricks (#15117)
Co-authored-by: pjain155_uhg <anshul_p@optum.com>
2025-11-06 19:46:34 +05:30

Prerequisites

  • Get your Databricks instance's workspace URL
  • Create a Databricks Service Principal
    • You can skip this step and use your own account to get things running quickly, but we strongly recommend creating a dedicated service principal for production use.

Authentication Options

You can authenticate with Databricks using either a Personal Access Token or Azure authentication:

Option 1: Personal Access Token (PAT)

Option 2: Azure Authentication (for Azure Databricks)

  • Create an Azure Active Directory application:
  • Grant the Azure AD application access to your Databricks workspace:
    • Add the service principal to your Databricks workspace following this guide

Provision your service principal:

  • To ingest your workspace's metadata and lineage, your service principal must have all of the following:
    • One of: metastore admin role, ownership of, or USE CATALOG privilege on any catalogs you want to ingest
    • One of: metastore admin role, ownership of, or USE SCHEMA privilege on any schemas you want to ingest
    • Ownership of or SELECT privilege on any tables and views you want to ingest
    • Ownership documentation
    • Privileges documentation
  • To ingest the legacy hive_metastore catalog (include_hive_metastore, enabled by default), your service principal must have all of the following:
    • READ_METADATA and USAGE privileges on the hive_metastore catalog
    • READ_METADATA and USAGE privileges on the schemas you want to ingest
    • READ_METADATA and USAGE privileges on the tables and views you want to ingest
    • Hive Metastore Privileges documentation
  • To ingest your workspace's notebooks and respective lineage, your service principal must have CAN_READ privileges on the folders containing the notebooks you want to ingest: guide.
  • To ingest usage statistics (include_usage_statistics, enabled by default), your service principal must have one of the following:
    • CAN_MANAGE permissions on any SQL Warehouses you want to ingest: guide.
    • When usage_data_source is set to SYSTEM_TABLES or AUTO (default) with warehouse_id configured: SELECT privilege on the system.query.history table. Reading usage from system tables improves performance with large query volumes and multi-workspace setups.
  • To ingest profiling information with method: ge, your service principal must have the SELECT privilege on all profiled tables.
  • To ingest profiling information with method: analyze and call_analyze: true (enabled by default), your service principal must have ownership of or the MODIFY privilege on any tables you want to profile.
    • Alternatively, you can run ANALYZE TABLE yourself on any tables you want to profile, then set call_analyze to false. You will still need SELECT privilege on those tables to fetch the results.
  • Check the starter recipe below and replace workspace_url and either token (for PAT authentication) or azure_auth credentials (for Azure authentication) with your information from the previous steps.
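
The privileges listed above could be granted with Databricks SQL statements along the following lines. This is a minimal sketch, not a definitive setup: the catalog, schema, and table names (main, main.sales, main.sales.orders, hive_metastore.legacy_db) are hypothetical, and `<application-id>` is a placeholder for your service principal's application ID. It assumes you run the statements as a user allowed to grant on these objects, and that legacy table access control is enabled for the hive_metastore grants. Keep only the sections that match the features you enable.

```sql
-- Hypothetical object names throughout; replace them with your own.
-- <application-id> stands for the service principal's application ID.

-- Metadata and lineage (Unity Catalog)
GRANT USE CATALOG ON CATALOG main TO `<application-id>`;
GRANT USE SCHEMA ON SCHEMA main.sales TO `<application-id>`;
GRANT SELECT ON TABLE main.sales.orders TO `<application-id>`;

-- Legacy hive_metastore catalog (only if include_hive_metastore is enabled).
-- Assumption: in the legacy model, privileges granted on a schema are
-- inherited by the tables and views it contains.
GRANT USAGE ON CATALOG hive_metastore TO `<application-id>`;
GRANT READ_METADATA ON CATALOG hive_metastore TO `<application-id>`;
GRANT USAGE ON SCHEMA hive_metastore.legacy_db TO `<application-id>`;
GRANT READ_METADATA ON SCHEMA hive_metastore.legacy_db TO `<application-id>`;

-- Usage statistics read from system tables
GRANT SELECT ON TABLE system.query.history TO `<application-id>`;

-- Profiling with method: analyze and call_analyze: true
GRANT MODIFY ON TABLE main.sales.orders TO `<application-id>`;

-- Alternative when call_analyze is false: run ANALYZE TABLE yourself.
-- The service principal then only needs SELECT on the table.
ANALYZE TABLE main.sales.orders COMPUTE STATISTICS FOR ALL COLUMNS;
```

Granting at the schema or table level keeps the service principal's access limited to what you intend to ingest. Notebook folder permissions (CAN_READ) and SQL Warehouse permissions (CAN_MANAGE) are workspace permissions rather than SQL privileges, so they are not covered by these statements.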