mirror of https://github.com/datahub-project/datahub.git, synced 2025-11-04 04:39:10 +00:00

	docs: Adding a custom Data Platform doc (#3561)
parent d7ee955ddb
commit a510a0c7c1

@@ -189,6 +189,7 @@ module.exports = {
       "datahub-web-react/src/app/analytics/README",
       "metadata-ingestion/developing",
       "docker/airflow/local_airflow",
+      "docs/how/add-custom-data-platform",
     ],
     Components: [
       "datahub-web-react/README",

docs/how/add-custom-data-platform.md (Normal file, 105 lines)
@@ -0,0 +1,105 @@
# Adding a custom Dataset Data Platform

A Data Platform represents a third-party system from which [Metadata Entities](https://datahubproject.io/docs/metadata-modeling/metadata-model/) are ingested. Each ingested Dataset is associated with a single platform, for example MySQL, Snowflake, Redshift, or BigQuery.

There are some cases in which you may want to add a custom Data Platform identifier for a Dataset. For example,
you may have an internal data system that is not widely available, or you may be using a Data Platform that is not natively supported by DataHub.

To do so, you can either change the default Data Platforms that are ingested into DataHub *prior to deployment time*, or ingest
a new Data Platform at runtime. The first option is a good fit if you're able to periodically merge new Data Platforms from the OSS
repository into your own. With it, the custom Data Platform is re-ingested each time you deploy DataHub, so
it persists even across full cleans (nukes) of DataHub.

## Changing Default Data Platforms

Edit the [data_platforms.json](https://github.com/linkedin/datahub/blob/master/metadata-service/war/src/main/resources/boot/data_platforms.json)
file to add a custom Data Platform:

```
[ 
  .....
  {
    "urn": "urn:li:dataPlatform:MyCustomDataPlatform",
    "aspect": {
      "name": "My Custom Data Platform",
      "type": "OTHERS",
      "logoUrl": "https://<your-logo-url>"
    }
  }
]
```

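As a rough illustration of the edit above, the following Python sketch appends a custom entry to an in-memory copy of the platform list. The helper name `add_custom_platform`, the duplicate check, and the stand-in `existing` list are assumptions made for this example and are not part of DataHub; the entry shape mirrors the snippet above.

```python
import json


def add_custom_platform(platforms, platform_id, display_name, logo_url):
    """Return a copy of the platform list with one custom entry appended.

    Hypothetical helper: the entry shape mirrors the data_platforms.json
    snippet in this document; nothing here is a DataHub API.
    """
    urn = f"urn:li:dataPlatform:{platform_id}"
    # Refuse to add the same URN twice (an assumption of this sketch).
    if any(entry.get("urn") == urn for entry in platforms):
        raise ValueError(f"{urn} is already defined")
    new_entry = {
        "urn": urn,
        "aspect": {
            "name": display_name,
            "type": "OTHERS",
            "logoUrl": logo_url,
        },
    }
    # Build a new list so the caller's list is left unchanged.
    return platforms + [new_entry]


# Stand-in for the parsed contents of data_platforms.json (illustrative only).
existing = [{"urn": "urn:li:dataPlatform:mysql", "aspect": {"name": "mysql"}}]
updated = add_custom_platform(
    existing,
    "MyCustomDataPlatform",
    "My Custom Data Platform",
    "https://<your-logo-url>",
)
print(json.dumps(updated[-1], indent=2))
```

In practice you would load the real file with `json.load`, call the helper, and write the result back before building and deploying DataHub.
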
## Ingesting Data Platform at runtime

You can also ingest a Data Platform at runtime, using either a file-based ingestion source or a plain curl request to the
[GMS Rest.li APIs](https://datahubproject.io/docs/metadata-service#restli-api).

### Using File-Based Ingestion Recipe

**Step 1**: Define a JSON file containing your custom Data Platform, for example `my-custom-data-platform.json`. JSON has no comment syntax, so keep the file name out of the file itself.

```
[
  {
    "auditHeader": null,
    "proposedSnapshot": {
      "com.linkedin.pegasus2avro.metadata.snapshot.DataPlatformSnapshot": {
        "urn": "urn:li:dataPlatform:MyCustomDataPlatform",
        "aspects": [
          {
            "com.linkedin.pegasus2avro.dataplatform.DataPlatformInfo": {
              "datasetNameDelimiter": "/",
              "name": "My Custom Data Platform",
              "type": "OTHERS",
              "logoUrl": "https://<your-logo-url>"
            }
          }
        ]
      }
    },
    "proposedDelta": null
  }
]
```

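If you would rather generate this file than write it by hand, a minimal Python sketch could look like the following. The helper name `build_platform_record` and its parameters are hypothetical; the field names and class names are copied from the Step 1 example above.

```python
import json


def build_platform_record(platform_id, display_name, logo_url, delimiter="/"):
    """Build one record shaped like the Step 1 example (hypothetical helper)."""
    return {
        "auditHeader": None,
        "proposedSnapshot": {
            "com.linkedin.pegasus2avro.metadata.snapshot.DataPlatformSnapshot": {
                "urn": f"urn:li:dataPlatform:{platform_id}",
                "aspects": [
                    {
                        "com.linkedin.pegasus2avro.dataplatform.DataPlatformInfo": {
                            "datasetNameDelimiter": delimiter,
                            "name": display_name,
                            "type": "OTHERS",
                            "logoUrl": logo_url,
                        }
                    }
                ],
            }
        },
        "proposedDelta": None,
    }


record = build_platform_record(
    "MyCustomDataPlatform",
    "My Custom Data Platform",
    "https://<your-logo-url>",
)
# The file holds a JSON array, so wrap the single record in a list.
print(json.dumps([record], indent=2))
```

Redirecting the printed output to `my-custom-data-platform.json` gives a file with the same structure as the hand-written example.
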
**Step 2**: Define an [ingestion recipe](https://datahubproject.io/docs/metadata-ingestion/#recipes) that reads the file from Step 1:

```
---
# see https://datahubproject.io/docs/metadata-ingestion/source_docs/file for complete documentation
source:
  type: "file"
  config:
    filename: "./my-custom-data-platform.json"

# see https://datahubproject.io/docs/metadata-ingestion/sink_docs/datahub for complete documentation
sink:
  ... 
```

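Once the recipe is saved, it still has to be run. The sketch below assumes the `datahub` CLI is installed and on your `PATH`, and that the recipe was saved under the hypothetical name `my-custom-data-platform-recipe.yml`; it only builds and prints the command, and leaves the actual run to `run_recipe`.

```python
import shlex
import subprocess


def ingest_command(recipe_path):
    """Return the argv for running a recipe with the DataHub CLI."""
    return ["datahub", "ingest", "-c", recipe_path]


def run_recipe(recipe_path):
    # check=True raises CalledProcessError if the CLI exits non-zero.
    return subprocess.run(ingest_command(recipe_path), check=True)


# Hypothetical recipe file name; use whatever name you saved the recipe under.
command = ingest_command("./my-custom-data-platform-recipe.yml")
print(shlex.join(command))
```

Calling `run_recipe("./my-custom-data-platform-recipe.yml")` executes the same command; the sink section of the recipe must be filled in first.
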
### Using Rest.li API

You can also issue a plain curl request to the Rest.li `/entities` API to add a custom Data Platform. Note that this payload uses the `com.linkedin.*` class names rather than the `com.linkedin.pegasus2avro.*` names used in the file-based example.

```
curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
   "entity":{
      "value":{
         "com.linkedin.metadata.snapshot.DataPlatformSnapshot":{
            "aspects":[
               {
                  "com.linkedin.dataplatform.DataPlatformInfo":{
                      "datasetNameDelimiter": "/",
                      "name": "My Custom Data Platform",
                      "type": "OTHERS",
                      "logoUrl": "https://<your-logo-url>"
                  }
               }
            ],
            "urn":"urn:li:dataPlatform:MyCustomDataPlatform"
         }
      }
   }
}'
```

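The same request can be assembled from Python's standard library, which may be handy in a deployment script. This is a sketch only: `build_ingest_request` is a hypothetical helper, the URL and payload are copied from the curl command above, and, as in that command, no headers are set explicitly.

```python
import json
import urllib.request


def build_ingest_request(gms_url, platform_id, display_name, logo_url):
    """Build (but do not send) the POST request from the curl example."""
    payload = {
        "entity": {
            "value": {
                "com.linkedin.metadata.snapshot.DataPlatformSnapshot": {
                    "aspects": [
                        {
                            "com.linkedin.dataplatform.DataPlatformInfo": {
                                "datasetNameDelimiter": "/",
                                "name": display_name,
                                "type": "OTHERS",
                                "logoUrl": logo_url,
                            }
                        }
                    ],
                    "urn": f"urn:li:dataPlatform:{platform_id}",
                }
            }
        }
    }
    return urllib.request.Request(
        f"{gms_url}/entities?action=ingest",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
    )


request = build_ingest_request(
    "http://localhost:8080",
    "MyCustomDataPlatform",
    "My Custom Data Platform",
    "https://<your-logo-url>",
)
print(request.get_method(), request.full_url)
```

To actually send it, pass the object to `urllib.request.urlopen(request)` while GMS is reachable at the given address.
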