import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Operations

## Why Would You Use Operations APIs?

The Operations APIs let you report operational changes made to a given Dataset or Table, using the `Operation` concept.
Reported operations can be viewed on the Dataset Profile (for example, as the last modified time), read back through the DataHub GraphQL API, or
used as inputs to DataHub Cloud [Freshness Assertions](/docs/managed-datahub/observe/freshness-assertions.md).

### Goal of This Guide

This guide shows you how to report and query Operations for a Dataset.

## Prerequisites

For this tutorial, you need to deploy DataHub Quickstart and ingest sample data.
For detailed steps, refer to the [DataHub Quickstart Guide](/docs/quickstart.md).

:::note
Before reporting operations for a dataset, make sure the target dataset already exists in DataHub.
:::

## Report Operations

You can report dataset operations to DataHub using the following APIs.

<Tabs>
<TabItem value="graphql" label="GraphQL" default>

```graphql
mutation reportOperation {
  reportOperation(
    input: {
      urn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)"
      operationType: INSERT
      sourceType: DATA_PROCESS
    }
  )
}
```

The supported operation types include:

- `INSERT`
- `UPDATE`
- `DELETE`
- `CREATE`
- `ALTER`
- `DROP`
- `CUSTOM`

To report an operation that happened at a specific time, you can optionally provide the `timestampMillis` field
(milliseconds since the Unix epoch). If it is omitted, the current server time is used as the operation time.
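
If you want to call the GraphQL API from a script instead of a GraphQL client, the sketch below sends the same `reportOperation` mutation using only the Python standard library. It passes the input as a GraphQL variable and includes `timestampMillis` only when you supply one. Several details are assumptions rather than confirmed API facts: the input type name `ReportOperationInput`, the endpoint URL, and Bearer-token authentication. The helper names `build_report_operation_request` and `post_graphql` are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Operation types listed in this guide.
SUPPORTED_OPERATION_TYPES = {"INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP", "CUSTOM"}

# Assumption: the mutation's input type is named ReportOperationInput.
REPORT_OPERATION_MUTATION = """
mutation reportOperation($input: ReportOperationInput!) {
  reportOperation(input: $input)
}
"""


def build_report_operation_request(
    urn: str,
    operation_type: str,
    source_type: str = "DATA_PROCESS",
    timestamp_millis: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a GraphQL request body for the reportOperation mutation (hypothetical helper)."""
    if operation_type not in SUPPORTED_OPERATION_TYPES:
        raise ValueError(f"Unsupported operation type: {operation_type}")
    operation_input: Dict[str, Any] = {
        "urn": urn,
        "operationType": operation_type,
        "sourceType": source_type,
    }
    # Omit timestampMillis when not given so the server uses its current time.
    if timestamp_millis is not None:
        operation_input["timestampMillis"] = timestamp_millis
    return {"query": REPORT_OPERATION_MUTATION, "variables": {"input": operation_input}}


def post_graphql(endpoint: str, body: Dict[str, Any], token: Optional[str] = None) -> Dict[str, Any]:
    """POST a GraphQL request body as JSON and return the decoded JSON response (hypothetical helper)."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: a DataHub access token sent as a Bearer token.
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        endpoint, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `post_graphql("http://localhost:8080/api/graphql", body, token)` would submit a request built by `build_report_operation_request`. The endpoint shown is an assumption for a local Quickstart, so replace it and the token with the values for your deployment.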

If you see the following response, the operation was successful:

```json
{
  "data": {
    "reportOperation": true
  },
  "extensions": {}
}
```
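
A script can perform the same check programmatically. The following is a minimal Python sketch that assumes the response has already been decoded from JSON into a dict. Under the GraphQL specification, failures are reported in a top-level `errors` list. The function name `report_operation_succeeded` is hypothetical.

```python
from typing import Any, Dict


def report_operation_succeeded(response: Dict[str, Any]) -> bool:
    """Return True only if a decoded reportOperation response signals success."""
    # GraphQL servers report failures in a top-level "errors" list.
    if response.get("errors"):
        return False
    # "data" may be null when the request fails outright.
    data = response.get("data") or {}
    # The mutation returns a Boolean; anything other than true counts as failure.
    return data.get("reportOperation") is True
```

Applied to the success response shown above, this returns `True`.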

</TabItem>

<TabItem value="python" label="Python">

```python
{{ inline /metadata-ingestion/examples/library/dataset_report_operation.py show_path_as_comment }}
```

</TabItem>
</Tabs>

## Read Operations

You can read dataset operations from DataHub using the following APIs.

<Tabs>
<TabItem value="graphql" label="GraphQL" default>

```graphql
query dataset {
    dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)") {
        operations(
            limit: 10, filter: [], startTimeMillis: <start-timestamp-ms>, endTimeMillis: <end-timestamp-ms>
        ) {
            timestampMillis
            operationType
            sourceType
        }
    }
}
```
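
Replace `<start-timestamp-ms>` and `<end-timestamp-ms>` with epoch timestamps in milliseconds. Both `startTimeMillis` and `endTimeMillis` are optional.
By default, operations are sorted by time in descending order (most recent first).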
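
In a script, it is often simpler to pass the URN and time window as GraphQL variables than to edit placeholders by hand. The Python sketch below builds such a request for the last N days. Any bound you leave unset is dropped from the variables, so the corresponding optional argument is not sent. This sketch makes two assumptions: `startTimeMillis` and `endTimeMillis` are typed `Long` and `limit` is typed `Int` in the DataHub schema. It also omits the `filter` argument. The helper names are hypothetical.

```python
import time
from typing import Any, Dict, Optional, Tuple

# Same selection as the query above, with the URN and time window as variables.
# Assumption: startTimeMillis/endTimeMillis are Long and limit is Int in the schema.
OPERATIONS_QUERY = """
query datasetOperations($urn: String!, $limit: Int, $start: Long, $end: Long) {
  dataset(urn: $urn) {
    operations(limit: $limit, startTimeMillis: $start, endTimeMillis: $end) {
      timestampMillis
      operationType
      sourceType
    }
  }
}
"""

DAY_MILLIS = 24 * 60 * 60 * 1000


def last_n_days_window(days: int, now_millis: Optional[int] = None) -> Tuple[int, int]:
    """Return (start, end) epoch milliseconds covering the last `days` days."""
    end = now_millis if now_millis is not None else int(time.time() * 1000)
    return end - days * DAY_MILLIS, end


def build_operations_request(
    urn: str,
    limit: int = 10,
    start_millis: Optional[int] = None,
    end_millis: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a GraphQL request body; unset bounds are left out of the variables."""
    variables: Dict[str, Any] = {"urn": urn, "limit": limit}
    if start_millis is not None:
        variables["start"] = start_millis
    if end_millis is not None:
        variables["end"] = end_millis
    return {"query": OPERATIONS_QUERY, "variables": variables}
```

For example, with `start, end = last_n_days_window(7)`, the call `build_operations_request(urn, start_millis=start, end_millis=end)` requests up to 10 operations from the past week.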

If you see a response like the following, the query was successful:

```json
{
  "data": {
    "dataset": {
      "operations": [
        {
          "timestampMillis": 1231232332,
          "operationType": "INSERT",
          "sourceType": "DATA_PROCESS"
        }
      ]
    }
  },
  "extensions": {}
}
```
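
To get a last-updated time from this response, you can pick the most recent operation and convert its `timestampMillis` (epoch milliseconds) to a UTC datetime. The Python sketch below does this on the decoded JSON. It takes the maximum explicitly rather than relying on the default sort order. The helper names are hypothetical, and this is not necessarily how the DataHub UI computes the value.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc(timestamp_millis: int) -> datetime:
    """Convert epoch milliseconds to an aware UTC datetime."""
    # Integer timedelta arithmetic avoids float rounding of the milliseconds.
    return EPOCH + timedelta(milliseconds=timestamp_millis)


def latest_operation(response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the operation with the largest timestampMillis, or None if there are none."""
    dataset = (response.get("data") or {}).get("dataset") or {}
    operations = dataset.get("operations") or []
    if not operations:
        return None
    # Take the max explicitly instead of trusting the server's sort order.
    return max(operations, key=lambda op: op["timestampMillis"])
```

Applied to the sample response above, `latest_operation` returns the single `INSERT` operation.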

</TabItem>

<TabItem value="python" label="Python">

```python
{{ inline /metadata-ingestion/examples/library/dataset_read_operations.py show_path_as_comment }}
```

</TabItem>
</Tabs>

### Expected Outcomes of Reporting Operations

Reported Operations are reflected in the Last Updated time shown on a Dataset's DataHub Profile.
They are also used when you select the `DataHub Operation` source type under the **Advanced** settings of a Freshness
Assertion.