# Adding a custom Dataset Data Platform
A Data Platform represents a third-party system from which [Metadata Entities](https://docs.datahub.com/docs/metadata-modeling/metadata-model/) are ingested. Each ingested Dataset is associated with a single platform, for example MySQL, Snowflake, Redshift, or BigQuery.

There are some cases in which you may want to add a custom Data Platform identifier for a Dataset. For example,
you may have an internal data system that is not widely available, or you may be using a Data Platform that DataHub does not natively support.

To do so, you can either change the default Data Platforms that are ingested into DataHub _prior to deployment time_, or ingest
a new Data Platform at runtime. Use the first option if you can periodically merge new Data Platforms from the OSS
repository into your own. With this option, the custom Data Platform is re-ingested each time you deploy DataHub, so
it persists even across full cleans (nukes) of DataHub.
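
For context, a Dataset refers to its platform through the platform URN embedded in the Dataset's own URN. The sketch below illustrates that relationship with plain string formatting. The helper names `platform_urn` and `dataset_urn` and the dataset name `sales.orders` are hypothetical and used only for illustration; they are not part of any DataHub API.

```python
def platform_urn(platform_id: str) -> str:
    """Return the URN of a Data Platform, e.g. urn:li:dataPlatform:MyCustomDataPlatform."""
    return f"urn:li:dataPlatform:{platform_id}"


def dataset_urn(platform_id: str, name: str, env: str = "PROD") -> str:
    """Return a Dataset URN; the platform URN is its first component."""
    return f"urn:li:dataset:({platform_urn(platform_id)},{name},{env})"


# "sales.orders" is a made-up dataset name used only for this example.
print(dataset_urn("MyCustomDataPlatform", "sales.orders"))
# prints: urn:li:dataset:(urn:li:dataPlatform:MyCustomDataPlatform,sales.orders,PROD)
```

The platform identifier used here (`MyCustomDataPlatform`) is the same one that appears in the platform URN in the examples that follow.
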
## Changing Default Data Platforms
Edit the [data-platforms.yaml](https://github.com/datahub-project/datahub/blob/master/metadata-service/configuration/src/main/resources/bootstrap_mcps/data-platforms.yaml)
file to add a custom Data Platform:
```
[
  .....
  {
    "urn": "urn:li:dataPlatform:MyCustomDataPlatform",
    "aspect": {
      "name": "My Custom Data Platform",
      "type": "OTHERS",
      "logoUrl": "https://<your-logo-url>"
    }
  }
]
```
## Ingesting Data Platform at runtime
You can also ingest a Data Platform at runtime using the `datahub` CLI, a file-based ingestion source, or a curl request to the
[GMS Rest.li APIs](https://docs.datahub.com/docs/metadata-service#restli-api).
### Using the CLI
```shell
datahub put platform --name MyCustomDataPlatform --display_name "My Custom Data Platform" --logo "https://<your-logo-url>"
```
### Using File-Based Ingestion Recipe
**Step 1**: Define a JSON file containing your custom Data Platform
```
// my-custom-data-platform.json
[
  {
    "auditHeader": null,
    "proposedSnapshot": {
      "com.linkedin.pegasus2avro.metadata.snapshot.DataPlatformSnapshot": {
        "urn": "urn:li:dataPlatform:MyCustomDataPlatform",
        "aspects": [
          {
            "com.linkedin.pegasus2avro.dataplatform.DataPlatformInfo": {
              "datasetNameDelimiter": "/",
              "name": "My Custom Data Platform",
              "type": "OTHERS",
              "logoUrl": "https://<your-logo-url>"
            }
          }
        ]
      }
    },
    "proposedDelta": null
  }
]
```
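
If you maintain several custom platforms, the file can be generated instead of written by hand. The following is a minimal sketch using only the Python standard library. The function names `build_platform_event` and `render_platform_file` are hypothetical, and the structure simply mirrors the JSON above, so adjust it if your DataHub version expects a different shape.

```python
import json


def build_platform_event(platform_id, display_name, logo_url, delimiter="/"):
    """Build one entry with the same shape as the JSON example above."""
    return {
        "auditHeader": None,
        "proposedSnapshot": {
            "com.linkedin.pegasus2avro.metadata.snapshot.DataPlatformSnapshot": {
                "urn": f"urn:li:dataPlatform:{platform_id}",
                "aspects": [
                    {
                        "com.linkedin.pegasus2avro.dataplatform.DataPlatformInfo": {
                            "datasetNameDelimiter": delimiter,
                            "name": display_name,
                            "type": "OTHERS",
                            "logoUrl": logo_url,
                        }
                    }
                ],
            }
        },
        "proposedDelta": None,
    }


def render_platform_file(events):
    """Serialize a list of entries to the JSON text of the ingestion file."""
    return json.dumps(events, indent=2) + "\n"
```

Writing the result of `render_platform_file([build_platform_event(...)])` to `my-custom-data-platform.json` produces a file equivalent to the hand-written one, without the leading filename comment.
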
**Step 2**: Define an [ingestion recipe](https://docs.datahub.com/docs/metadata-ingestion/#recipes)
```
---
# see https://docs.datahub.com/docs/generated/ingestion/sources/file for complete documentation
source:
  type: "file"
  config:
    path: "./my-custom-data-platform.json"

# see https://docs.datahub.com/docs/metadata-ingestion/sink_docs/datahub for complete documentation
sink:
  ...
```
### Using Rest.li API
You can also issue a curl request to the Rest.li `/entities` API to add a custom Data Platform.
```
curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
  "entity":{
    "value":{
      "com.linkedin.metadata.snapshot.DataPlatformSnapshot":{
        "aspects":[
          {
            "com.linkedin.dataplatform.DataPlatformInfo":{
              "datasetNameDelimiter": "/",
              "name": "My Custom Data Platform",
              "type": "OTHERS",
              "logoUrl": "https://<your-logo-url>"
            }
          }
        ],
        "urn":"urn:li:dataPlatform:MyCustomDataPlatform"
      }
    }
  }
}'
```
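
The same request can be issued from a script. Below is a minimal sketch using Python's standard `urllib.request`, assuming GMS is reachable at `http://localhost:8080` as in the curl example and that no authentication is required. The function names `build_ingest_request` and `send` are hypothetical. Like the curl call, it sets no headers beyond the library defaults; add whatever your deployment requires.

```python
import json
import urllib.request

# Assumption: the same local GMS address used in the curl example above.
GMS_URL = "http://localhost:8080"


def build_ingest_request(platform_id, display_name, logo_url, gms_url=GMS_URL):
    """Build (but do not send) the POST request that mirrors the curl call."""
    payload = {
        "entity": {
            "value": {
                "com.linkedin.metadata.snapshot.DataPlatformSnapshot": {
                    "aspects": [
                        {
                            "com.linkedin.dataplatform.DataPlatformInfo": {
                                "datasetNameDelimiter": "/",
                                "name": display_name,
                                "type": "OTHERS",
                                "logoUrl": logo_url,
                            }
                        }
                    ],
                    "urn": f"urn:li:dataPlatform:{platform_id}",
                }
            }
        }
    }
    return urllib.request.Request(
        url=f"{gms_url}/entities?action=ingest",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
    )


def send(request, timeout=10.0):
    """Send the request and return the HTTP status code; raises on HTTP errors."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping request construction separate from sending makes it possible to inspect or log the payload before anything reaches GMS. Calling `send(build_ingest_request(...))` performs the actual ingestion.
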