Mirror of https://github.com/datahub-project/datahub.git, synced 2025-08-02 06:18:09 +00:00

Adds usage extraction to the unity catalog source, plus a TableReference object to handle references to tables. Also makes the following refactors:
- Adds a UsageAggregator class to usage_common, since the same logic has appeared in multiple sources.
- Allows a customizable user_urn_builder in usage_common, as not all Unity Catalog users are emails. (We create emails with a default email_domain config in other connectors like Redshift and Snowflake, which now seems unnecessary.)
- Creates TableReference for unity catalog and adds it to the Table dataclass, for managing string references to tables. Replaces logic, especially in lineage extraction, with these references.
- Creates gen_dataset_urn and gen_user_urn on the unity source to reduce duplicated code.
- Breaks up proxy.py into implementation and types.
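The TableReference and URN helpers described above might look roughly like the following sketch. The class name and the helper names come from the description; the specific fields, the platform name, and the URN formats are assumptions for illustration, not the actual DataHub implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableReference:
    """Hypothetical sketch of a string reference to a Unity Catalog table."""

    catalog: str
    schema: str
    table: str

    @property
    def qualified_table_name(self) -> str:
        # Three-level Unity Catalog name: catalog.schema.table
        return f"{self.catalog}.{self.schema}.{self.table}"


def gen_dataset_urn(ref: TableReference, env: str = "PROD") -> str:
    # Builds a DataHub-style dataset URN for the referenced table
    # (platform name and URN layout assumed here).
    return (
        "urn:li:dataset:(urn:li:dataPlatform:databricks,"
        f"{ref.qualified_table_name},{env})"
    )


def gen_user_urn(user: str) -> str:
    # Unity Catalog users are not always emails, so the raw username
    # is used directly instead of appending a default email domain.
    return f"urn:li:corpuser:{user}"
```

Centralizing URN construction this way means lineage and usage extraction can pass `TableReference` objects around instead of ad hoc strings.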
Prerequisites
- Generate a Databricks Personal Access Token by following the guide here: https://docs.databricks.com/dev-tools/api/latest/authentication.html#generate-a-personal-access-token
- Get your catalog's workspace id by following: https://docs.databricks.com/workspace/workspace-details.html#workspace-instance-names-urls-and-ids
- To enable usage ingestion, ensure the account associated with your access token has CAN_MANAGE permissions on any SQL Warehouses you want to ingest: https://docs.databricks.com/security/auth-authz/access-control/sql-endpoint-acl.html
- Check the starter recipe below and replace the token and workspace id with the ones above.