mirror of
https://github.com/datahub-project/datahub.git
synced 2025-07-04 23:57:03 +00:00
42 lines
2.0 KiB
Markdown
### Setup

The artifacts used by this source are:

- [dbt manifest file](https://docs.getdbt.com/reference/artifacts/manifest-json)
  - This file contains model, source, test, and lineage data.
- [dbt catalog file](https://docs.getdbt.com/reference/artifacts/catalog-json)
  - This file contains schema data.
  - dbt does not record schema data for ephemeral models. As a result, DataHub shows ephemeral models in the lineage, but without an associated schema.
- [dbt sources file](https://docs.getdbt.com/reference/artifacts/sources-json)
  - This file contains metadata for sources with freshness checks.
  - We transfer dbt's freshness checks to DataHub's last-modified fields.
  - Note that this file is optional. If it is not specified, we use the time of ingestion as a proxy for the last-modified time.
- [dbt run_results file](https://docs.getdbt.com/reference/artifacts/run-results-json)
  - This file contains metadata from the result of a dbt run, e.g. `dbt test`.
  - When provided, we transfer dbt test run results into assertion run events, which lets you see a timeline of test runs on the dataset.

To generate these files, we recommend the following workflow for dbt build and DataHub ingestion.

```sh
# Record source freshness (writes target/sources.json)
dbt source snapshot-freshness
dbt build
# `dbt docs generate` writes its own run_results.json, so back up the one from `dbt build`
cp target/run_results.json target/run_results_backup.json
dbt docs generate
# Restore the build results so that test outcomes are available for ingestion
cp target/run_results_backup.json target/run_results.json

# Run datahub ingestion, pointing at the files in the target/ directory
```

The necessary artifact files will then appear in the `target/` directory of your dbt project.
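
Before running ingestion, it can help to confirm that the expected artifacts are actually present. The following is a minimal sketch, not part of dbt or DataHub: `check_artifacts` and `write_recipe` are hypothetical helper names, and the script treats the manifest and catalog as required and the sources and run_results files as optional, which is how the artifact list above reads. The recipe field names (`manifest_path`, `catalog_path`, `sources_path`, `run_results_paths`, `target_platform`) are given as we understand the dbt source config and should be checked against the config reference for your DataHub version. The `postgres` platform is only a placeholder, and a sink section is left for you to add as appropriate.

```sh
#!/bin/sh
# Hypothetical helpers (not shipped with dbt or DataHub).

# check_artifacts DIR
# Returns non-zero if a required artifact is missing; only warns for optional ones.
check_artifacts() {
  dir="${1:-target}"
  rc=0
  for f in manifest.json catalog.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing required artifact: $dir/$f" >&2
      rc=1
    fi
  done
  for f in sources.json run_results.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "optional artifact not found: $dir/$f" >&2
    fi
  done
  return "$rc"
}

# write_recipe DIR OUT [PLATFORM]
# Writes a recipe that points at the artifacts in DIR.
# Field names are assumptions; verify them for your DataHub version.
write_recipe() {
  dir="${1:-target}"
  out="${2:-dbt_recipe.yml}"
  platform="${3:-postgres}" # placeholder: the warehouse dbt runs against
  {
    echo "source:"
    echo "  type: dbt"
    echo "  config:"
    echo "    manifest_path: \"$dir/manifest.json\""
    echo "    catalog_path: \"$dir/catalog.json\""
    if [ -f "$dir/sources.json" ]; then
      echo "    sources_path: \"$dir/sources.json\""
    fi
    if [ -f "$dir/run_results.json" ]; then
      echo "    run_results_paths:"
      echo "      - \"$dir/run_results.json\""
    fi
    echo "    target_platform: \"$platform\""
  } > "$out"
}

# Example usage, from the root of the dbt project:
#   check_artifacts target && write_recipe target dbt_recipe.yml
#   datahub ingest -c dbt_recipe.yml
```
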

We also have guides on handling more complex dbt orchestration techniques and multi-project setups below.

:::note Entity is in manifest but missing from catalog

This warning usually appears when the `catalog.json` file was not generated by a `dbt docs generate` command.
Most other dbt commands generate a partial catalog file, which may impact the completeness of the metadata ingested into DataHub.

Following the above workflow should ensure that the catalog file is generated correctly.

:::
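
To see ahead of ingestion which entities would trigger the warning above, one option is to compare the node IDs in the two files. The sketch below is an illustration under stated assumptions, not DataHub's own check: it assumes `jq` is installed, that both files keep their entries under a top-level `nodes` object keyed by unique ID, and that only models, seeds, and snapshots are expected in the catalog. Ephemeral models are skipped because dbt records no schema for them. `list_missing_from_catalog` is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: print manifest node IDs that have no catalog entry.
# Usage: list_missing_from_catalog MANIFEST_JSON CATALOG_JSON
list_missing_from_catalog() {
  manifest="$1"
  catalog="$2"
  command -v jq >/dev/null 2>&1 || { echo "jq is required" >&2; return 2; }

  tmp_manifest=$(mktemp)
  tmp_catalog=$(mktemp)

  # Nodes we assume should have a catalog entry: non-ephemeral models, seeds, snapshots.
  jq -r '.nodes | to_entries[]
         | select(.value.resource_type == "model"
                  or .value.resource_type == "seed"
                  or .value.resource_type == "snapshot")
         | select(.value.config.materialized != "ephemeral")
         | .key' "$manifest" | LC_ALL=C sort > "$tmp_manifest"

  # Every node ID present in the catalog.
  jq -r '.nodes | keys[]' "$catalog" | LC_ALL=C sort > "$tmp_catalog"

  # Lines that appear only in the manifest list.
  LC_ALL=C comm -23 "$tmp_manifest" "$tmp_catalog"

  rm -f "$tmp_manifest" "$tmp_catalog"
}
```

For example, `list_missing_from_catalog target/manifest.json target/catalog.json` prints one unique ID per line. Empty output suggests the catalog covers every node this filter selects.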