	docs: Adding a custom Data Platform doc (#3561)

parent d7ee955ddb
commit a510a0c7c1

```
@@ -189,6 +189,7 @@ module.exports = {
       "datahub-web-react/src/app/analytics/README",
       "metadata-ingestion/developing",
       "docker/airflow/local_airflow",
+      "docs/how/add-custom-data-platform",
     ],
     Components: [
       "datahub-web-react/README",
```

docs/how/add-custom-data-platform.md (new file, 105 lines)

# Adding a custom Dataset Data Platform

A Data Platform represents a 3rd-party system from which [Metadata Entities](https://datahubproject.io/docs/metadata-modeling/metadata-model/) are ingested. Each Dataset that is ingested is associated with a single platform, for example MySQL, Snowflake, Redshift, or BigQuery.

There are cases in which you may want to add a custom Data Platform identifier for a Dataset: for example, if you have an internal data system that is not widely available, or if you're using a Data Platform that DataHub does not natively support.

To do so, you can either change the default Data Platforms that are ingested into DataHub *prior to deployment time*, or ingest a new Data Platform at runtime. Use the first option if you're able to periodically merge new Data Platforms from the OSS repository into your own fork. Because the custom Data Platform is re-ingested each time you deploy DataHub, it will persist even between full cleans (nukes) of DataHub.

## Changing Default Data Platforms

Simply make a change to the [data_platforms.json](https://github.com/linkedin/datahub/blob/master/metadata-service/war/src/main/resources/boot/data_platforms.json) file to add a custom Data Platform:

```
[
  .....
  {
    "urn": "urn:li:dataPlatform:MyCustomDataPlatform",
    "aspect": {
      "name": "My Custom Data Platform",
      "type": "OTHERS",
      "logoUrl": "https://<your-logo-url>"
    }
  }
]
```

## Ingesting a Data Platform at Runtime

You can also ingest a Data Platform at runtime, using either a file-based ingestion source or a curl request to the [GMS Rest.li APIs](https://datahubproject.io/docs/metadata-service#restli-api).

### Using a File-Based Ingestion Recipe

**Step 1**: Define a JSON file containing your custom Data Platform

```
// my-custom-data-platform.json
[
  {
    "auditHeader": null,
    "proposedSnapshot": {
      "com.linkedin.pegasus2avro.metadata.snapshot.DataPlatformSnapshot": {
        "urn": "urn:li:dataPlatform:MyCustomDataPlatform",
        "aspects": [
          {
            "com.linkedin.pegasus2avro.dataplatform.DataPlatformInfo": {
              "datasetNameDelimiter": "/",
              "name": "My Custom Data Platform",
              "type": "OTHERS",
              "logoUrl": "https://<your-logo-url>"
            }
          }
        ]
      }
    },
    "proposedDelta": null
  }
]
```

**Step 2**: Define an [ingestion recipe](https://datahubproject.io/docs/metadata-ingestion/#recipes)

```
---
# see https://datahubproject.io/docs/metadata-ingestion/source_docs/file for complete documentation
source:
  type: "file"
  config:
    filename: "./my-custom-data-platform.json"

# see https://datahubproject.io/docs/metadata-ingestion/sink_docs/datahub for complete documentation
sink:
  ...
```
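
Once the recipe is defined, you can run it with the DataHub CLI. This is a minimal sketch, assuming the recipe above is saved as `custom-platform-recipe.yml` (an illustrative filename) and that the CLI is installed from PyPI:

```
# install the DataHub CLI (PyPI package: acryl-datahub); pin a version if needed
pip install acryl-datahub

# execute the recipe; the filename here is illustrative
datahub ingest -c custom-platform-recipe.yml
```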

### Using Rest.li API

You can also issue a curl request to the Rest.li `/entities` API to add a custom Data Platform.

```
curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
   "entity":{
      "value":{
         "com.linkedin.metadata.snapshot.DataPlatformSnapshot":{
            "aspects":[
               {
                  "com.linkedin.dataplatform.DataPlatformInfo":{
                      "datasetNameDelimiter": "/",
                      "name": "My Custom Data Platform",
                      "type": "OTHERS",
                      "logoUrl": "https://<your-logo-url>"
                  }
               }
            ],
            "urn":"urn:li:dataPlatform:MyCustomDataPlatform"
         }
      }
   }
}'
```
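
To verify that the Data Platform was created, you can read it back from GMS. A minimal sketch, assuming the Rest.li `/entities` endpoint supports retrieval by URL-encoded URN on the same host (adjust host and port for your deployment):

```
curl 'http://localhost:8080/entities/urn%3Ali%3AdataPlatform%3AMyCustomDataPlatform'
```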