mirror of
https://github.com/datahub-project/datahub.git
synced 2025-07-04 15:50:14 +00:00
Setup
The artifacts used by this source are:
- dbt manifest file
  - This file contains model, source, test, and lineage data.
- dbt catalog file
  - This file contains schema data.
  - dbt does not record schema data for ephemeral models. As a result, DataHub shows ephemeral models in the lineage, but without an associated schema.
- dbt sources file
  - This file contains metadata for sources with freshness checks.
  - We transfer dbt's freshness checks to DataHub's last-modified fields.
  - This file is optional. If it is not specified, we use the time of ingestion as a proxy for the last-modified time.
- dbt run_results file
  - This file contains metadata from the result of a dbt run, e.g. dbt test.
  - When provided, we transfer dbt test run results into assertion run events, so that you can see a timeline of test runs on the dataset.
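
Before running ingestion, it can help to confirm which of these artifacts are actually present. The following is a minimal sketch, not part of DataHub: the `check_artifacts` helper is hypothetical, the file names are dbt's standard artifact names, and treating the manifest and catalog as required is an assumption (the list above only marks the sources and run_results files as optional).

```python
from pathlib import Path

# dbt's standard artifact file names, mapped to whether this sketch
# treats them as required (an assumption, see the note above).
ARTIFACTS = {
    "manifest.json": True,
    "catalog.json": True,
    "sources.json": False,
    "run_results.json": False,
}


def check_artifacts(target_dir):
    """Return (missing_required, missing_optional) artifact names for a dbt target directory."""
    target = Path(target_dir)
    missing_required, missing_optional = [], []
    for name, required in ARTIFACTS.items():
        if not (target / name).is_file():
            (missing_required if required else missing_optional).append(name)
    return missing_required, missing_optional
```

For example, `check_artifacts("target")` run from the root of a dbt project returns two lists of file names, which you could use to fail early or to log a warning before ingestion.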
To generate these files, we recommend the following workflow for dbt build and DataHub ingestion:

    dbt source snapshot-freshness
    dbt build
    cp target/run_results.json target/run_results_backup.json
    dbt docs generate
    cp target/run_results_backup.json target/run_results.json
    # Run datahub ingestion, pointing at the files in the target/ directory

The necessary artifact files will then appear in the target/ directory of your dbt project.
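
If you orchestrate dbt from Python rather than a shell script, the same sequence might look like the sketch below. It is an illustration under assumptions, not an official helper: `run_dbt_workflow` and its `runner` parameter are hypothetical names, and the backup and restore steps are presumably there because `dbt docs generate` writes its own run_results.json over the one produced by `dbt build`.

```python
import shutil
import subprocess
from pathlib import Path


def run_dbt_workflow(project_dir, runner=subprocess.run):
    """Run the recommended dbt steps, preserving run_results.json from `dbt build`.

    `runner` is a hypothetical hook that defaults to subprocess.run; it exists
    so the sequence can be exercised without dbt installed.
    """
    project = Path(project_dir)
    results = project / "target" / "run_results.json"
    backup = project / "target" / "run_results_backup.json"

    # check=False mirrors a plain shell script: a failing dbt test does not
    # stop the remaining steps, so its results can still be ingested.
    runner(["dbt", "source", "snapshot-freshness"], cwd=project, check=False)
    runner(["dbt", "build"], cwd=project, check=False)

    # Keep the build's run results before the docs step replaces the file.
    shutil.copyfile(results, backup)
    runner(["dbt", "docs", "generate"], cwd=project, check=False)
    shutil.copyfile(backup, results)
```

After `run_dbt_workflow(".")` completes, run DataHub ingestion against the files in the target/ directory as usual.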
We also have guides on handling more complex dbt orchestration techniques and multi-project setups below.
:::note Entity is in manifest but missing from catalog
This warning usually appears when the catalog.json file was not generated by a dbt docs generate command.
Most other dbt commands generate a partial catalog file, which may affect the completeness of the metadata ingested into DataHub.
Following the above workflow should ensure that the catalog file is generated correctly.
:::
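
To see which entities would trigger this warning, you can compare the two artifacts yourself. The sketch below is a rough approximation, not DataHub's actual check: `missing_from_catalog` and `load_json` are hypothetical helpers, and the choice to look only at models, seeds, snapshots, and sources, and to skip ephemeral models, is an assumption based on the fact that dbt records no schema data for ephemeral models.

```python
import json
from pathlib import Path

# Node resource types assumed to have a catalog entry (an assumption for this sketch).
CATALOGED_TYPES = {"model", "seed", "snapshot"}


def load_json(path):
    """Load a dbt artifact such as target/manifest.json."""
    return json.loads(Path(path).read_text())


def missing_from_catalog(manifest, catalog):
    """Return sorted unique_ids that are in the manifest but absent from the catalog."""
    cataloged = set(catalog.get("nodes", {})) | set(catalog.get("sources", {}))
    expected = set(manifest.get("sources", {}))
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") not in CATALOGED_TYPES:
            continue  # e.g. tests have no catalog entry
        if node.get("config", {}).get("materialized") == "ephemeral":
            continue  # ephemeral models have no schema data in the catalog
        expected.add(unique_id)
    return sorted(expected - cataloged)
```

Calling `missing_from_catalog(load_json("target/manifest.json"), load_json("target/catalog.json"))` lists the affected unique IDs. A long list suggests the catalog was produced by a command other than dbt docs generate.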