fix(docs): Fixing timeseries delete doc until code path is fixed (#7711)

John Joyce 2023-03-29 14:33:17 -07:00 committed by GitHub
parent 54a372795b
commit cbfe887609

@ -1,8 +1,9 @@
# Removing Metadata from DataHub
There are two ways to delete metadata from DataHub.
- Delete metadata attached to entities by providing a specific urn or a filter that identifies a set of entities
- Delete metadata affected by a single ingestion run
There are two ways to delete metadata from DataHub:
1. Delete metadata attached to entities by providing a specific urn or filters that identify a set of entities
2. Delete metadata created by a single ingestion run
To follow this guide you need to use [DataHub CLI](../cli.md).
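At a glance, the two modes look something like this (a minimal sketch; the urn and run id are placeholders, and the rollback subcommand is assumed from the CLI's `ingest` group, covered in the rollback section below):
```bash
# 1. Delete metadata attached to a specific entity (soft delete shown here).
datahub delete --urn "<my urn>" --soft

# 2. Roll back all metadata written by a single ingestion run
#    (subcommand assumed; see the Rollback Ingestion Run section below).
datahub ingest rollback --run-id "<run-id>"
```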
@ -40,25 +41,6 @@ datahub delete --urn "<my urn>" --hard
As of DataHub v0.8.35, doing a hard delete by urn also gives you a way to remove references to the deleted urn across the metadata graph. Use this if you don't want ghost references left in your metadata model and want to save space in the graph database.
For now, this behaviour is opt-in: a prompt will appear for you to manually accept or deny.
Starting with v0.8.44.2, this also supports deletion of a specific `timeseries` aspect associated with the entity, optionally for a specific time range.
_Note: Deletion by a specific aspect and time range is currently supported only for timeseries aspects._
```bash
# Delete all of the aspect values for a given entity and a timeseries aspect.
datahub delete --urn "<entity urn>" -a "<timeseries aspect>" --hard
# e.g. datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:snowflake,test_dataset,TEST)" -a "datasetProfile" --hard

# Delete all of the aspect values for a given platform and a timeseries aspect.
datahub delete -p "<platform>" -a "<timeseries aspect>" --hard
# e.g. datahub delete -p "snowflake" -a "datasetProfile" --hard

# Delete the aspect values for a given platform and a timeseries aspect corresponding to a specific time range.
datahub delete -p "<platform>" -a "<timeseries aspect>" --start-time '<start_time>' --end-time '<end_time>' --hard
# e.g. datahub delete -p "snowflake" -a "datasetProfile" --start-time '2022-05-29 00:00:00' --end-time '2022-05-31 00:00:00' --hard
```
You can optionally add `-n` or `--dry-run` to execute a dry run before issuing the final delete command.
You can optionally add `-f` or `--force` to skip confirmations.
You can optionally add the `--only-soft-deleted` flag to remove soft-deleted items only.
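For example, these flags can be combined like this (a sketch only; the urn below is a made-up Snowflake dataset and the combinations are illustrative):
```bash
# Preview what would be deleted without making any changes.
datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:snowflake,my_db.my_table,PROD)" --hard --dry-run

# Skip the confirmation prompt once you are sure.
datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:snowflake,my_db.my_table,PROD)" --hard --force

# Dry run of a hard delete limited to previously soft-deleted entities of a platform.
datahub delete -p "snowflake" --only-soft-deleted --hard --dry-run
```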
@ -75,14 +57,14 @@ If you wish to hard-delete using a curl request you can use something like below
curl "http://localhost:8080/entities?action=delete" -X POST --data '{"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)"}'
```
## Delete using Broader Filters
## Delete by filters
_Note: All the commands below support the soft-delete option (`-s/--soft`) as well as the dry-run option (`-n/--dry-run`)._
### Delete all datasets in the DEV environment
### Delete all Datasets from the Snowflake platform
```
datahub delete --env DEV --entity_type dataset
datahub delete --entity_type dataset --platform snowflake
```
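As the note above mentions, the filter-based commands accept the soft-delete and dry-run options as well; for instance, a dry run of a soft delete over the same filter (a sketch):
```
datahub delete --entity_type dataset --platform snowflake --soft --dry-run
```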
### Delete all containers for a particular platform
@ -90,10 +72,15 @@ datahub delete --env DEV --entity_type dataset
datahub delete --entity_type container --platform s3
```
### Delete all datasets in the DEV environment
```
datahub delete --env DEV --entity_type dataset
```
### Delete all Pipelines and Tasks in the DEV environment
```
datahub delete --env DEV --entity_type "datajob"
datahub delete --env DEV --entity_type "dataflow"
datahub delete --env DEV --entity_type "dataJob"
datahub delete --env DEV --entity_type "dataFlow"
```
### Delete all bigquery datasets in the PROD environment
@ -109,10 +96,10 @@ datahub delete --entity_type chart --platform looker
### Delete all datasets that match a query
```
datahub delete --entity_type dataset --query "_tmp" -n
datahub delete --entity_type dataset --query "_tmp"
```
## Rollback Ingestion Batch Run
## Rollback Ingestion Run
The second way to delete metadata is to identify entities (and the aspects affected) by using an ingestion `run-id`. Whenever you run `datahub ingest -c ...`, all the metadata ingested with that run will have the same run id.
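A minimal sketch of that flow, assuming the `list-runs`, `show`, and `rollback` subcommands of `datahub ingest` (the run id is a placeholder):
```
# List recent ingestion runs to find the run id to roll back.
datahub ingest list-runs

# Optionally inspect what that run wrote.
datahub ingest show --run-id "<run-id>"

# Roll back everything written by that run.
datahub ingest rollback --run-id "<run-id>"
```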