# Removing Metadata from DataHub

There are two ways to delete metadata from DataHub:
- Delete metadata attached to entities by providing a specific urn or a filter that identifies a set of entities
- Delete metadata affected by a single ingestion run

To follow this guide, you need the [DataHub CLI](../cli.md).

Read on to find out how to perform these kinds of deletes.

_Note: Delete metadata with care. Always use `--dry-run` to understand what will be deleted before proceeding. Prefer soft-deletes (`--soft`) unless you really want to nuke metadata rows. Hard deletes actually delete rows in the primary store, and recovering them requires backups of the primary metadata store. Make sure you understand the implications of issuing soft-deletes versus hard-deletes before proceeding._

## Delete By Urn

To delete all the data related to a single entity, use one of the following commands.

### Soft Delete (the default)

This sets the `Status` aspect of the entity to `Removed`, which prevents the entity and all its aspects from being returned by the UI.

```
datahub delete --urn "<my urn>"
```

or

```
datahub delete --urn "<my urn>" --soft
```

### Hard Delete

This physically deletes all rows for all aspects of the entity. This action cannot be undone, so execute it only after you are sure you want to delete all data associated with this entity.

```
datahub delete --urn "<my urn>" --hard
```

You can optionally add `-n` or `--dry-run` to execute a dry run before issuing the final delete command.

You can optionally add `-f` or `--force` to skip confirmations.

_Note: Make sure you surround your urn with quotes! If you do not include the quotes, your terminal may misinterpret the command._
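
As a minimal sketch of the quoting advice, the snippet below stores a URN in a shell variable and wraps the dry-run and soft-delete commands in two small functions. The URN is the sample dataset URN used elsewhere in this guide, and the function names `preview_delete` and `soft_delete` are hypothetical helpers, not part of the DataHub CLI.

```shell
# Single quotes stop the shell from interpreting the parentheses and commas in the URN.
urn='urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)'

# Hypothetical helper: show what would be deleted without changing anything.
preview_delete() {
  datahub delete --urn "$1" --dry-run
}

# Hypothetical helper: soft-delete the entity (the default mode, spelled out explicitly).
soft_delete() {
  datahub delete --urn "$1" --soft
}
```

A typical sequence would be to run `preview_delete "$urn"`, review the output, and only then run `soft_delete "$urn"`. Always expanding the variable inside double quotes keeps the URN as a single argument.
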

If you wish to hard-delete using a curl request, you can use something like the command below. Replace the URN with the one that you wish to delete.

```
curl "http://localhost:8080/entities?action=delete" -X POST --data '{"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)"}'
```
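
If you issue this request for different URNs, it may help to build the JSON body from a variable. The sketch below assumes the same local endpoint as the command above and a URN that contains no double quotes or backslashes (it does no JSON escaping); `hard_delete_via_rest` is a hypothetical helper name.

```shell
urn='urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)'

# Build the JSON request body around the URN (no escaping is performed).
payload=$(printf '{"urn": "%s"}' "$urn")

# Hypothetical helper: send the hard-delete request with the given JSON body.
hard_delete_via_rest() {
  curl "http://localhost:8080/entities?action=delete" -X POST --data "$1"
}
```

Inspect the body with `printf '%s\n' "$payload"` before calling `hard_delete_via_rest "$payload"`, since this request is a hard delete.
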
## Delete using Broader Filters
_Note: All the commands below support the soft-delete option (`-s/--soft`) as well as the dry-run option (`-n/--dry-run`)._
### Delete all datasets in the DEV environment
```
datahub delete --env DEV --entity_type dataset
```
### Delete all bigquery datasets in the PROD environment
```
datahub delete --env PROD --entity_type dataset --platform bigquery
```
### Delete all looker dashboards and charts
```
datahub delete --entity_type dashboard --platform looker
datahub delete --entity_type chart --platform looker
```
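
Since the two Looker commands differ only in the entity type, a preview of both can be sketched as a loop. The function name `preview_looker_deletes` is a hypothetical helper; the flags are the ones shown in this section.

```shell
# Hypothetical helper: dry-run the Looker deletes for both entity types.
preview_looker_deletes() {
  for entity_type in dashboard chart; do
    datahub delete --entity_type "$entity_type" --platform looker --dry-run
  done
}
```

Calling `preview_looker_deletes` shows what each delete would remove; drop `--dry-run` once the output looks right.
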
### Delete all datasets that match a query
```
datahub delete --entity_type dataset --query "_tmp" -n
```
## Rollback Ingestion Batch Run

The second way to delete metadata is to identify entities (and the aspects affected) by using an ingestion `run-id`. Whenever you run `datahub ingest -c ...`, all the metadata ingested with that run will have the same run id.

To view the ids of the most recent set of ingestion batches, execute

```
datahub ingest list-runs
```
That will print out a table of all the runs. Once you have an idea of which run you want to roll back, run

```
datahub ingest show --run-id <run-id>
```

to see more information about the run.

Alternatively, you can execute a dry-run rollback to get the same information.

```
datahub ingest rollback --dry-run --run-id <run-id>
```
Finally, once you are sure you want to delete this data forever, run

```
datahub ingest rollback --run-id <run-id>
```

to roll back all aspects added with this run and all entities created by this run.
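
The inspect, preview, and rollback steps above could be chained in a small wrapper. This is only a sketch: `rollback_with_preview` is a hypothetical helper, and the confirmation prompt is an addition of this example, not a feature of the CLI.

```shell
# Hypothetical helper: inspect a run, preview the rollback, then ask before rolling back.
rollback_with_preview() {
  run_id="$1"
  datahub ingest show --run-id "$run_id"
  datahub ingest rollback --dry-run --run-id "$run_id"
  printf 'Roll back run %s? [y/N] ' "$run_id"
  read -r answer
  if [ "$answer" = "y" ]; then
    datahub ingest rollback --run-id "$run_id"
  else
    echo "Aborted; nothing was rolled back."
  fi
}
```

Call it with a run id taken from `datahub ingest list-runs`, for example `rollback_with_preview "<run-id>"`. Any answer other than `y` leaves the metadata untouched.
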