
Adds usage extraction to the unity catalog source and a `TableReference` object to handle references to tables.

Also makes the following refactors (see the sketch after this list):

- Adds a `UsageAggregator` class to `usage_common`, as I've seen this same logic multiple times.
- Allows a customizable `user_urn_builder` in `usage_common`, as not all Unity users are emails. We create emails with a default `email_domain` config in other connectors like Redshift and Snowflake, which seems unnecessary now?
- Creates a `TableReference` for unity catalog and adds it to the `Table` dataclass, for managing string references to tables. Replaces logic, especially in lineage extraction, with these references.
- Adds `gen_dataset_urn` and `gen_user_urn` to the unity source to reduce duplicate code.
- Breaks up `proxy.py` into implementation and types.
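A rough sketch of how these pieces could fit together, to make the list above concrete. This is not the actual source code: the `TableReference` fields, the `gen_dataset_urn` signature, and the `user_urn_builder` shape are assumptions for illustration; only `make_dataset_urn` and `make_user_urn` are existing DataHub builder helpers.

```python
# Hypothetical sketch of the refactors described above -- field names and
# helper signatures are assumptions, not the actual DataHub implementation.
from dataclasses import dataclass
from typing import Callable

from datahub.emitter.mce_builder import make_dataset_urn, make_user_urn


@dataclass(frozen=True)
class TableReference:
    """Assumed shape of a string-friendly reference to a Unity Catalog table."""

    catalog: str
    schema: str
    table: str

    @property
    def qualified_name(self) -> str:
        # Dotted name reused for dataset URNs and for matching lineage edges.
        return f"{self.catalog}.{self.schema}.{self.table}"


def gen_dataset_urn(ref: TableReference, env: str = "PROD") -> str:
    # Hypothetical helper: keeps platform/env handling in one place.
    return make_dataset_urn(platform="databricks", name=ref.qualified_name, env=env)


# Customizable user_urn_builder: Unity user identifiers are not always emails,
# so callers decide how a raw user string becomes a user URN.
user_urn_builder: Callable[[str], str] = make_user_urn

if __name__ == "__main__":
    ref = TableReference(catalog="main", schema="analytics", table="daily_orders")
    print(gen_dataset_urn(ref))                      # urn:li:dataset:(...)
    print(user_urn_builder("service-principal-01"))  # urn:li:corpuser:service-principal-01
```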
DataHub supports integration with the Databricks ecosystem using a multitude of connectors, depending on your exact setup.
## Databricks Hive
The simplest way to integrate is usually via the Hive connector. The Hive starter recipe has a section describing how to connect to your Databricks workspace.
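For orientation, here is a minimal sketch of that Databricks-via-Hive setup, run programmatically with DataHub's Python `Pipeline` API instead of the CLI. The workspace host, token, and `http_path` values are placeholders, and the exact connection options for your cluster may differ; treat the Hive starter recipe as authoritative.

```python
# Minimal sketch of a Hive-based Databricks recipe, run via DataHub's Python API.
# Placeholder values (workspace host, token, http_path) must be replaced; the
# connection options follow the Hive starter recipe but may need adjusting.
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "hive",
            "config": {
                "host_port": "<workspace>.cloud.databricks.com:443",
                "username": "token",
                "password": "<personal-access-token>",
                "scheme": "databricks+pyhive",
                "options": {
                    "connect_args": {
                        "http_path": "sql/protocolv1/o/<org-id>/<cluster-id>"
                    }
                },
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }
)
pipeline.run()
pipeline.raise_from_status()
```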
## Databricks Unity Catalog (new)
The recently introduced Unity Catalog provides a new way to govern your assets within the Databricks lakehouse. If you have enabled Unity Catalog, you can use the `unity-catalog` source (see below) to integrate your metadata into DataHub as an alternative to the Hive pathway.
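For comparison, a minimal `unity-catalog` sketch in the same style. `workspace_url` and `token` are the essential fields; the usage toggle shown is an assumed option name, so check the source's documentation before relying on it.

```python
# Minimal sketch of a unity-catalog recipe run via DataHub's Python API.
# workspace_url and token are placeholders; include_usage_statistics is an
# assumed flag name -- verify against the unity-catalog source docs.
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "unity-catalog",
            "config": {
                "workspace_url": "https://<workspace>.cloud.databricks.com",
                "token": "<personal-access-token>",
                "include_usage_statistics": True,  # assumed option name
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }
)
pipeline.run()
pipeline.raise_from_status()
```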
## Databricks Spark
To complete the picture, we recommend adding push-based ingestion from your Spark jobs to see real-time activity and lineage between your Databricks tables and your Spark jobs. Use the Spark agent to push metadata to DataHub using the instructions here.
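As a rough illustration of what wiring up the agent can look like from PySpark, here is a session that loads the DataHub listener and points it at a local DataHub instance. The artifact version, server URL, and the demo table are placeholders; take the exact coordinates and properties from the Spark agent instructions.

```python
# Sketch of enabling the DataHub Spark agent from PySpark. The package version,
# server URL, and demo table are placeholders; consult the Spark agent docs for
# the exact coordinates and any extra properties your setup needs.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("databricks-lineage-demo")
    # Load the DataHub Spark lineage agent (version is a placeholder).
    .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:<version>")
    # Register the listener that emits run and lineage metadata as jobs execute.
    .config("spark.extraListeners", "datahub.spark.DatahubSparkListener")
    # Where to push metadata (assumes a DataHub instance reachable at this URL).
    .config("spark.datahub.rest.server", "http://localhost:8080")
    .getOrCreate()
)

# Any reads/writes in this session are now reported to DataHub by the listener.
# The target table is illustrative and assumes a `demo` schema exists.
df = spark.range(10)
df.write.mode("overwrite").saveAsTable("demo.lineage_smoke_test")
```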
## Watch the DataHub Talk at the Data and AI Summit 2022
For a deeper look at how to think about DataHub within and across your Databricks ecosystem, watch the recording of our talk at the Data and AI Summit 2022.