import FeatureAvailability from '@site/src/components/FeatureAvailability';
# AI Documentation
<FeatureAvailability saasOnly />
With AI-powered documentation, you can automatically generate documentation for tables and columns.
<p align="center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/_7DieZeZspY?si=Q5FkCA0gZPEFMj0Y" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</p>
## Prerequisites
As of DataHub Cloud v0.3.12, AI documentation is in **Public Beta**. Admins (or users with the "Manage Platform Settings" privilege) can enable it from settings.
<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/ai-docs/ai-docs-toggle.png"/>
</p>
## Usage
Make sure you have permission to edit the dataset's description. No other configuration is required: click "Generate" on any table or column in the UI.
AI-generated documentation that has not yet been reviewed by a human is marked with a sparkle icon.
<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/automation/saas/ai-docs/ai-docs-generation.gif"/>
</p>
### Customize Documentation Generation
As of v0.3.15, you can customize how documentation is generated by providing custom instructions, which are passed to the underlying AI model whenever it generates documentation for a table or column. This is useful if you want AI-generated documentation to follow specific guidelines or standards set by your organization.
To provide custom instructions for documentation generation, navigate to **Settings > AI** and enter them in the **AI Documentation > Instructions** input.
<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/saas/ai/add_custom_prompts_docs.png"/>
</p>
Note that after updating instructions, it may take up to 5 minutes for the new instructions to take effect.
## How it works
Generating good documentation requires a holistic understanding of the data. Information we take into account includes, but is not limited to:
- Dataset name and any existing documentation
- Column name, type, description, and sample values
- Lineage relationships to upstream and downstream assets
- Metadata about other related assets
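
To make the list above concrete, here is a purely illustrative sketch of how those inputs could be gathered into a single text context for a model. It is not DataHub Cloud's actual implementation, which this page does not describe; `ColumnInfo`, `DatasetContext`, and `render_context` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnInfo:
    """Hypothetical container for the per-column inputs listed above."""
    name: str
    type: str
    description: str = ""
    sample_values: list[str] = field(default_factory=list)


@dataclass
class DatasetContext:
    """Hypothetical container for the dataset-level inputs listed above."""
    name: str
    description: str = ""
    columns: list[ColumnInfo] = field(default_factory=list)
    upstreams: list[str] = field(default_factory=list)
    downstreams: list[str] = field(default_factory=list)


def render_context(ctx: DatasetContext) -> str:
    """Flatten the context into plain text, skipping sections that are empty."""
    lines = [f"Dataset: {ctx.name}"]
    if ctx.description:
        lines.append(f"Existing documentation: {ctx.description}")
    for col in ctx.columns:
        desc = col.description or "n/a"
        samples = ", ".join(col.sample_values) or "n/a"
        lines.append(
            f"Column {col.name} ({col.type}): description={desc}; samples={samples}"
        )
    if ctx.upstreams:
        lines.append("Upstream: " + ", ".join(ctx.upstreams))
    if ctx.downstreams:
        lines.append("Downstream: " + ", ".join(ctx.downstreams))
    return "\n".join(lines)
```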
Data privacy: Your metadata is not sent to any third-party LLMs. We use AWS Bedrock internally, which means all metadata remains within the DataHub Cloud AWS account. We do not fine-tune on customer data.
## Limitations
- AI documentation is not available for tables with more than 3000 columns (in v0.3.12 the limit was 1000 columns; prior to v0.3.12, it was 100 columns).
- This feature is powered by LLMs, which can produce inaccurate results. While we've taken steps to reduce the likelihood of hallucinations, they may still occur.
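
If you want to check ahead of time which tables fall within the column limit, the thresholds above can be expressed as a small helper. This is a minimal sketch, not part of any DataHub API: `parse_version`, `column_limit`, and `supports_ai_docs` are hypothetical names, and the code assumes that every release after v0.3.12 uses the 3000-column limit and that patch releases share the limit of their `major.minor.patch` version.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn 'v0.3.12' or '0.3.12' into (0, 3, 12). Numeric versions only."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def column_limit(version: str) -> int:
    """Return the maximum column count for AI documentation in a given version."""
    # Assumption: only the first three components decide the limit.
    base = parse_version(version)[:3]
    if base < (0, 3, 12):
        return 100
    if base == (0, 3, 12):
        return 1000
    # Assumption: all releases after v0.3.12 use the current limit.
    return 3000


def supports_ai_docs(column_count: int, version: str) -> bool:
    """A table qualifies when it has no more columns than the limit allows."""
    return column_count <= column_limit(version)
```

For example, `supports_ai_docs(1500, "v0.3.12")` is false because that release capped tables at 1000 columns, while the same table qualifies on v0.3.15.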