# Removing Metadata from DataHub

There are two ways to delete metadata from DataHub: deleting a single entity by its urn, or rolling back an ingestion run.

## Configuring DataHub CLI

By default, the CLI points to a DataHub instance running on localhost. Running

```
datahub init
```

lets you choose which DataHub instance the CLI communicates with.

_Note: When prompted for the DataHub host, provide the host of your GMS instance._

## Delete By Urn

To delete all the data related to a single entity, run

```
datahub delete --urn "<my urn>"
```

_Note: Make sure you surround your urn with quotes! Urns often contain characters such as parentheses that the shell treats as syntax, so without the quotes your shell may misinterpret the command._
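
To illustrate why the quotes matter, here is a minimal shell sketch. The urn below is a hypothetical example, and `count_args` is a helper introduced only for this illustration (it is not part of the DataHub CLI). It reports how many arguments a command would receive.

```shell
# Hypothetical helper (not part of the DataHub CLI): prints how many
# arguments it received.
count_args() {
  echo "$#"
}

# Hypothetical example urn; substitute the urn of the entity you want to delete.
urn='urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)'

# Quoted: the flag and the urn arrive as exactly two arguments.
count_args --urn "$urn"   # prints 2

# The same quoting applies to the real command:
#   datahub delete --urn "$urn"
# Typed literally without quotes, the parentheses in the urn are parsed by the
# shell as syntax, so the command typically fails before datahub sees the urn.
```

Storing the urn in a single-quoted variable and passing it as `"$urn"` keeps the shell from interpreting any character inside it.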

## Rollback Ingestion Batch Run

Every time you run `datahub ingest -c ...`, all the metadata ingested in that run shares the same run id.

To view the ids of the most recent ingestion runs, execute

```
datahub ingest list-runs
```

This prints a table of all the runs. Once you have an idea of which run you want to roll back, run

```
datahub ingest show --run-id <run-id>
```

to see more details about the run.

Finally, run

```
datahub ingest rollback --run-id <run-id>
```

to roll back all aspects added by this run and all entities created by it.
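
The inspect-then-rollback steps above can be sketched as a small shell function. This is only an illustration: `rollback_run` and the `DATAHUB_CMD` variable are hypothetical names, not part of the DataHub CLI. `DATAHUB_CMD` defaults to `datahub` and exists so the sequence can be dry-run by pointing it at `echo`.

```shell
# Hypothetical override: set DATAHUB_CMD=echo to print the arguments
# instead of running the real CLI.
DATAHUB_CMD="${DATAHUB_CMD:-datahub}"

# Hypothetical helper: show the details of a run, then roll it back.
rollback_run() {
  run_id="${1:-}"
  if [ -z "$run_id" ]; then
    echo "usage: rollback_run <run-id>" >&2
    return 1
  fi
  # Inspect the run first; stop if it cannot be shown.
  "$DATAHUB_CMD" ingest show --run-id "$run_id" || return 1
  # Roll back the aspects and entities that this run added.
  "$DATAHUB_CMD" ingest rollback --run-id "$run_id"
}
```

Pick a run id from the output of `datahub ingest list-runs`, then call `rollback_run <run-id>`. With `DATAHUB_CMD=echo`, the function only prints the arguments it would pass to the CLI, which is a cheap way to check the run id before deleting anything.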