import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Operations
					
						
## Why Would You Use Operations APIs?

The Operations APIs let you report operational changes made to a Dataset or Table using the `Operation` concept.
Reported operations can be viewed on the Dataset Profile (for example, as the last modified time), accessed via the DataHub GraphQL API, or
used as inputs to DataHub Cloud [Freshness Assertions](/docs/managed-datahub/observe/freshness-assertions.md).
					
						
### Goal Of This Guide

This guide will show you how to report and query Operations for a Dataset.
					
						
## Prerequisites

For this tutorial, you need to deploy DataHub Quickstart and ingest sample data.
For detailed steps, please refer to the [DataHub Quickstart Guide](/docs/quickstart.md).

:::note
Before reporting operations for a dataset, ensure that the target dataset is already present in DataHub.
:::
					
						
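The examples in this guide identify the dataset by its URN. As a minimal sketch, assuming the `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` layout seen in those examples, a URN can be composed as follows. The helper name `build_dataset_urn` is hypothetical and is not part of DataHub.

```python
def build_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Compose a dataset URN in the layout used by the examples in this guide.

    Hypothetical helper for illustration only.
    """
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Build the URN used throughout this guide.
urn = build_dataset_urn("hive", "fct_users_created")
print(urn)
```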
					
						
## Report Operations

You can report dataset operations to DataHub using the following APIs.
					
						
<Tabs>
<TabItem value="graphql" label="GraphQL" default>

```graphql
mutation reportOperation {
  reportOperation(
    input: {
      urn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)"
      operationType: INSERT
      sourceType: DATA_PROCESS
    }
  )
}
```
					
						
Supported operation types include:

- `INSERT`
- `UPDATE`
- `DELETE`
- `CREATE`
- `ALTER`
- `DROP`
- `CUSTOM`
					
						
To report an operation that happened at a specific time, you can optionally provide
the `timestampMillis` field. If it is not provided, the current server time is used as the operation time.
					
						
If you see the following response, the operation was successful:

```json
{
  "data": {
    "reportOperation": true
  },
  "extensions": {}
}
```
					
						
</TabItem>

<TabItem value="python" label="Python">

```python
{{ inline /metadata-ingestion/examples/library/dataset_report_operation.py show_path_as_comment }}
```

</TabItem>
</Tabs>
					
						
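The GraphQL mutation above, including the optional `timestampMillis` field, can also be assembled programmatically. The following is a minimal sketch that only builds the request text and does not send it. The helper names `build_report_operation_mutation` and `build_request_body` are hypothetical and are not part of DataHub; the field names and operation types are the ones listed in this guide.

```python
import json
import time
from typing import Optional

# Operation types listed in this guide.
SUPPORTED_OPERATION_TYPES = {
    "INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP", "CUSTOM",
}


def build_report_operation_mutation(
    urn: str,
    operation_type: str,
    source_type: str = "DATA_PROCESS",
    timestamp_millis: Optional[int] = None,
) -> str:
    """Return the reportOperation mutation text (hypothetical helper)."""
    if operation_type not in SUPPORTED_OPERATION_TYPES:
        raise ValueError(f"unsupported operation type: {operation_type}")
    fields = [
        f"urn: {json.dumps(urn)}",  # quoted GraphQL string
        f"operationType: {operation_type}",  # enum values are not quoted
        f"sourceType: {source_type}",
    ]
    if timestamp_millis is not None:
        # Omitting this field lets the server use its current time.
        fields.append(f"timestampMillis: {int(timestamp_millis)}")
    input_body = "\n".join(f"      {field}" for field in fields)
    return (
        "mutation reportOperation {\n"
        "  reportOperation(\n"
        "    input: {\n"
        f"{input_body}\n"
        "    }\n"
        "  )\n"
        "}"
    )


def build_request_body(mutation: str) -> str:
    """Wrap the mutation in the JSON envelope used for GraphQL over HTTP."""
    return json.dumps({"query": mutation})


# Example: report an INSERT that happened one hour ago.
one_hour_ago_ms = int(time.time() * 1000) - 60 * 60 * 1000
mutation = build_report_operation_mutation(
    "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)",
    "INSERT",
    timestamp_millis=one_hour_ago_ms,
)
print(mutation)
```

The JSON returned by `build_request_body` can then be sent as the body of a POST request to your DataHub GraphQL endpoint, using whatever authentication your deployment requires.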
					
						
## Read Operations

You can read dataset operations from DataHub using the following APIs.
					
						
<Tabs>
<TabItem value="graphql" label="GraphQL" default>

```graphql
query dataset {
    dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)") {
        operations(
            limit: 10, filter: [], startTimeMillis: <start-timestamp-ms>, endTimeMillis: <end-timestamp-ms>
        ) {
            timestampMillis
            operationType
            sourceType
        }
    }
}
```
					
						
The `startTimeMillis` and `endTimeMillis` arguments are optional. By default, operations are sorted by time in descending order.
					
						
If you see the following response, the query was successful:

```json
{
  "data": {
    "dataset": {
      "operations": [
        {
          "timestampMillis": 1231232332,
          "operationType": "INSERT",
          "sourceType": "DATA_PROCESS"
        }
      ]
    }
  },
  "extensions": {}
}
```
					
						
</TabItem>

<TabItem value="python" label="Python">

```python
{{ inline /metadata-ingestion/examples/library/dataset_read_operations.py show_path_as_comment }}
```

</TabItem>
</Tabs>
					
						
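The `<start-timestamp-ms>` and `<end-timestamp-ms>` placeholders in the GraphQL query stand for epoch timestamps in milliseconds. As a minimal sketch, assuming you want the operations from the last 24 hours, the values can be computed and substituted into the query text like this. The helper names `to_epoch_millis`, `last_hours_window`, and `fill_time_window` are hypothetical and are not part of DataHub.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Query text from this guide, with its two placeholders left in place.
QUERY_TEMPLATE = """query dataset {
    dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)") {
        operations(
            limit: 10, filter: [], startTimeMillis: <start-timestamp-ms>, endTimeMillis: <end-timestamp-ms>
        ) {
            timestampMillis
            operationType
            sourceType
        }
    }
}"""


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole epoch milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return (moment - EPOCH) // timedelta(milliseconds=1)


def last_hours_window(hours: int, now: Optional[datetime] = None) -> Tuple[int, int]:
    """Return (start_ms, end_ms) covering the given number of hours up to now."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return to_epoch_millis(start), to_epoch_millis(end)


def fill_time_window(template: str, start_ms: int, end_ms: int) -> str:
    """Substitute the two timestamp placeholders in the query text."""
    return (
        template.replace("<start-timestamp-ms>", str(start_ms))
        .replace("<end-timestamp-ms>", str(end_ms))
    )


# Example: query text for operations reported in the last 24 hours.
start_ms, end_ms = last_hours_window(24)
print(fill_time_window(QUERY_TEMPLATE, start_ms, end_ms))
```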
					
						
### Expected Outcomes of Reporting Operations

Reported Operations are reflected in the Last Updated time shown on a Dataset's DataHub Profile.
They are also used when you select the `DataHub Operation` source type under the **Advanced** settings of a Freshness
Assertion.
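
To check a reported operation outside the UI, you can inspect the response of the read query shown earlier. The following is a minimal sketch that takes the most recent `timestampMillis` from that response and converts it to a UTC datetime. It is only an illustration: this guide does not specify how the Profile derives its Last Updated value, and the helper name `latest_operation_time` is hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Sample response copied from the read example in this guide.
SAMPLE_RESPONSE = """
{
  "data": {
    "dataset": {
      "operations": [
        {
          "timestampMillis": 1231232332,
          "operationType": "INSERT",
          "sourceType": "DATA_PROCESS"
        }
      ]
    }
  },
  "extensions": {}
}
"""


def latest_operation_time(response: dict) -> Optional[datetime]:
    """Return the newest operation time as a UTC datetime, or None if absent."""
    dataset = (response.get("data") or {}).get("dataset") or {}
    operations = dataset.get("operations") or []
    if not operations:
        return None
    latest_ms = max(op["timestampMillis"] for op in operations)
    # Integer arithmetic avoids float rounding when converting milliseconds.
    return EPOCH + timedelta(milliseconds=latest_ms)


print(latest_operation_time(json.loads(SAMPLE_RESPONSE)))
```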