import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Operations

## Why Would You Use Operations APIs?

The Operations APIs let you report operational changes made to a given Dataset or Table using the `Operation` concept.
These operations can be viewed on the Dataset Profile (e.g. as the last modified time), accessed via the DataHub GraphQL API, or
used as inputs to DataHub Cloud [Freshness Assertions](/docs/managed-datahub/observe/freshness-assertions.md).

### Goal Of This Guide

This guide will show you how to report and query Operations for a Dataset.

## Prerequisites

For this tutorial, you need to deploy DataHub Quickstart and ingest sample data.
For detailed steps, please refer to the [DataHub Quickstart Guide](/docs/quickstart.md).

:::note
Before reporting operations for a dataset, you need to ensure the targeted dataset is already present in DataHub.
:::

## Report Operations

You can report dataset operations to DataHub using the following APIs.

<Tabs>
<TabItem value="graphql" label="GraphQL" default>

```graphql
mutation reportOperation {
  reportOperation(
    input: {
      urn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)"
      operationType: INSERT
      sourceType: DATA_PROCESS
    }
  )
}
```

Supported operation types include:

- `INSERT`
- `UPDATE`
- `DELETE`
- `CREATE`
- `ALTER`
- `DROP`
- `CUSTOM`

If you want to report an operation that happened at a specific time, you can optionally provide
the `timestampMillis` field. If not provided, the current server time will be used as the operation time.

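As a minimal sketch of producing a value for `timestampMillis`, the Python snippet below converts a timezone-aware `datetime` to an integer number of milliseconds. It assumes the field expects milliseconds since the Unix epoch in UTC; the helper name `to_epoch_millis` is hypothetical and not part of any DataHub library.

```python
from datetime import datetime, timezone


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to integer milliseconds since the Unix epoch.

    Hypothetical helper: assumes `timestampMillis` is epoch milliseconds in UTC.
    """
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware to avoid local-time ambiguity")
    delta = moment - datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Integer arithmetic avoids floating-point rounding of large timestamps.
    return delta.days * 86_400_000 + delta.seconds * 1_000 + delta.microseconds // 1_000


operation_time = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(to_epoch_millis(operation_time))  # 1704067200000
```

The resulting integer can be passed as `timestampMillis` in the mutation input when the operation did not happen at the time of the request.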
If you see the following response, the operation was successful:

```json
{
  "data": {
    "reportOperation": true
  },
  "extensions": {}
}
```

</TabItem>

<TabItem value="python" label="Python">

```python
{{ inline /metadata-ingestion/examples/library/dataset_report_operation.py show_path_as_comment }}
```

</TabItem>
</Tabs>

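If you prefer to send the GraphQL mutation from a script without any DataHub client library, the following is a minimal sketch using only the Python standard library. The endpoint URL, the bearer-token header, and the helper names `build_report_operation_payload` and `send_graphql` are assumptions for illustration; adjust them to match your deployment.

```python
import json
import urllib.request
from typing import Optional

# Assumed endpoint of a local quickstart deployment; change for your environment.
GRAPHQL_URL = "http://localhost:8080/api/graphql"


def build_report_operation_payload(
    urn: str,
    operation_type: str,
    source_type: str,
    timestamp_millis: Optional[int] = None,
) -> dict:
    """Build the JSON body for the reportOperation mutation (hypothetical helper)."""
    fields = [
        f"urn: {json.dumps(urn)}",  # json.dumps yields a quoted, escaped string
        f"operationType: {operation_type}",
        f"sourceType: {source_type}",
    ]
    if timestamp_millis is not None:
        fields.append(f"timestampMillis: {timestamp_millis}")
    query = (
        "mutation reportOperation { reportOperation(input: { "
        + ", ".join(fields)
        + " }) }"
    )
    return {"query": query}


def send_graphql(payload: dict, token: Optional[str] = None, url: str = GRAPHQL_URL) -> dict:
    """POST a GraphQL payload and return the decoded JSON response."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumes token-based authentication is enabled on the server.
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_report_operation_payload(
    "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)",
    "INSERT",
    "DATA_PROCESS",
)
print(payload["query"])
# To actually send it (requires a running DataHub instance):
# result = send_graphql(payload, token="<your-access-token>")
```

Building the payload separately from sending it makes it easy to inspect the generated mutation before it reaches the server.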
## Read Operations

You can read dataset operations from DataHub using the following APIs.

<Tabs>
<TabItem value="graphql" label="GraphQL" default>

```graphql
query dataset {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)") {
    operations(
      limit: 10, filter: [], startTimeMillis: <start-timestamp-ms>, endTimeMillis: <end-timestamp-ms>
    ) {
      timestampMillis
      operationType
      sourceType
    }
  }
}
```

Here, `startTimeMillis` and `endTimeMillis` are optional. By default, operations are sorted by time descending.

If you see the following response, the query was successful:

```json
{
  "data": {
    "dataset": {
      "operations": [
        {
          "timestampMillis": 1231232332,
          "operationType": "INSERT",
          "sourceType": "DATA_PROCESS"
        }
      ]
    }
  },
  "extensions": {}
}
```

</TabItem>

<TabItem value="python" label="Python">

```python
{{ inline /metadata-ingestion/examples/library/dataset_read_operations.py show_path_as_comment }}
```

</TabItem>
</Tabs>

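Once you have the GraphQL response as a Python dictionary, you may want to extract the most recent operation time from it. The sketch below does this with the standard library only. It assumes `timestampMillis` is milliseconds since the Unix epoch in UTC, and the helper name `latest_operation_time` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def latest_operation_time(response: dict) -> Optional[datetime]:
    """Return the newest operation time in a dataset operations response, or None.

    Hypothetical helper: takes the maximum explicitly rather than relying
    on the order in which the server returns operations.
    """
    dataset = (response.get("data") or {}).get("dataset") or {}
    operations = dataset.get("operations") or []
    if not operations:
        return None
    newest_millis = max(op["timestampMillis"] for op in operations)
    # timedelta with an integer keeps millisecond precision exact.
    return EPOCH + timedelta(milliseconds=newest_millis)


# Sample response copied from the GraphQL example in this guide.
sample_response = {
    "data": {
        "dataset": {
            "operations": [
                {
                    "timestampMillis": 1231232332,
                    "operationType": "INSERT",
                    "sourceType": "DATA_PROCESS",
                }
            ]
        }
    },
    "extensions": {},
}

print(latest_operation_time(sample_response))  # 1970-01-15 06:00:32.332000+00:00
```

The helper returns `None` when the dataset is missing or has no reported operations, so callers can distinguish "never updated" from an actual timestamp.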
### Expected Outcomes of Reporting Operations

Reported Operations will appear as the Last Updated time for a Dataset on its DataHub Profile.
They will also be used when you select the `DataHub Operation` source type under the **Advanced** settings of a Freshness
Assertion.