DataHub Generalized Metadata Store (GMS)

DataHub GMS is a Rest.li service written in Java. It follows common Rest.li server development practices, and all of its data models are Pegasus (.pdl) models.

Pre-requisites

  • You need JDK 8 installed on your machine to build DataHub GMS.

Build

DataHub GMS is already built as part of the top-level build:

./gradlew build

However, if you only want to build DataHub GMS specifically:

./gradlew :gms:war:build

Dependencies

Before starting DataHub GMS, you need to make sure that Kafka, Schema Registry & Zookeeper,
Elasticsearch and MySQL Docker containers are up and running.

Start via Docker image

The quickest way to try out DataHub GMS is to run the Docker image.

Start via command line

If you modify the code and want to try it out quickly without building the Docker image, you can also run the application directly from the command line after a successful build:

./gradlew :gms:war:run

To run with debug logs printed to the console, use:

./gradlew :gms:war:run -Dlogback.debug=true

API Documentation

You can access basic documentation on the API endpoints by opening the /restli/docs endpoint in the browser.

python -c "import webbrowser; webbrowser.open('http://localhost:8080/restli/docs', new=2)"

*Please note that because DataHub is in a period of rapid development, the APIs below are subject to change.

Sample API Calls

Ingesting Entities

The Entity Snapshot Ingest endpoints allow you to ingest multiple aspects about a particular entity at the same time.

Create a user

curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
   "entity":{
      "value":{
         "com.linkedin.metadata.snapshot.CorpUserSnapshot":{
            "urn":"urn:li:corpuser:footbarusername",
            "aspects":[
               {
                  "com.linkedin.identity.CorpUserInfo":{
                     "active":true,
                     "displayName":"Foo Bar",
                     "fullName":"Foo Bar",
                     "email":"fbar@linkedin.com"
                  }
               }
            ]
         }
      }
   }
}'
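
All of the ingest calls in this section share the same envelope: an "entity" object whose "value" holds one snapshot, keyed by its fully-qualified record name. As a minimal sketch (the helper name build_ingest_payload is hypothetical and not part of DataHub), the request body for the user above could be assembled in Python like this:

```python
import json

# Assumption: same host and port as the curl examples in this document.
GMS_URL = "http://localhost:8080"


def build_ingest_payload(snapshot_type, urn, aspects):
    """Wrap aspects in the {"entity": {"value": {<snapshot>: ...}}} envelope.

    `aspects` maps a fully-qualified aspect record name to its fields.
    """
    return {
        "entity": {
            "value": {
                snapshot_type: {
                    "urn": urn,
                    # Each aspect becomes a single-key object, as in the curl body.
                    "aspects": [{name: fields} for name, fields in aspects.items()],
                }
            }
        }
    }


payload = build_ingest_payload(
    "com.linkedin.metadata.snapshot.CorpUserSnapshot",
    "urn:li:corpuser:footbarusername",
    {
        "com.linkedin.identity.CorpUserInfo": {
            "active": True,
            "displayName": "Foo Bar",
            "fullName": "Foo Bar",
            "email": "fbar@linkedin.com",
        }
    },
)

# This string is what curl's --data argument carries when POSTing to
# GMS_URL + "/entities?action=ingest".
body = json.dumps(payload)
print(body)
```

The same helper covers the other snapshot types below by swapping the snapshot and aspect record names.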

Create a group

curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
   "entity":{
      "value":{
         "com.linkedin.metadata.snapshot.CorpGroupSnapshot":{
            "urn":"urn:li:corpGroup:dev",
            "aspects":[
               {
                  "com.linkedin.identity.CorpGroupInfo":{
                     "email":"dev@linkedin.com",
                     "admins":[
                        "urn:li:corpUser:jdoe"
                     ],
                     "members":[
                        "urn:li:corpUser:datahub",
                        "urn:li:corpUser:jdoe"
                     ],
                     "groups":[
                        
                     ]
                  }
               }
            ]
         }
      }
   }
}'

Create a dataset

curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
   "entity":{
      "value":{
         "com.linkedin.metadata.snapshot.DatasetSnapshot":{
            "urn":"urn:li:dataset:(urn:li:dataPlatform:foo,bar,PROD)",
            "aspects":[
               {
                  "com.linkedin.common.Ownership":{
                     "owners":[
                        {
                           "owner":"urn:li:corpuser:fbar",
                           "type":"DATAOWNER"
                        }
                     ],
                     "lastModified":{
                        "time":0,
                        "actor":"urn:li:corpuser:fbar"
                     }
                  }
               },
               {
                  "com.linkedin.common.InstitutionalMemory":{
                     "elements":[
                        {
                           "url":"https://www.linkedin.com",
                           "description":"Sample doc",
                           "createStamp":{
                              "time":0,
                              "actor":"urn:li:corpuser:fbar"
                           }
                        }
                     ]
                  }
               },
               {
                  "com.linkedin.schema.SchemaMetadata":{
                     "schemaName":"FooEvent",
                     "platform":"urn:li:dataPlatform:foo",
                     "version":0,
                     "created":{
                        "time":0,
                        "actor":"urn:li:corpuser:fbar"
                     },
                     "lastModified":{
                        "time":0,
                        "actor":"urn:li:corpuser:fbar"
                     },
                     "hash":"",
                     "platformSchema":{
                        "com.linkedin.schema.KafkaSchema":{
                           "documentSchema":"{\"type\":\"record\",\"name\":\"MetadataChangeEvent\",\"namespace\":\"com.linkedin.mxe\",\"doc\":\"Kafka event for proposing a metadata change for an entity.\",\"fields\":[{\"name\":\"auditHeader\",\"type\":{\"type\":\"record\",\"name\":\"KafkaAuditHeader\",\"namespace\":\"com.linkedin.avro2pegasus.events\",\"doc\":\"Header\"}}]}"
                        }
                     },
                     "fields":[
                        {
                           "fieldPath":"foo",
                           "description":"Bar",
                           "nativeDataType":"string",
                           "type":{
                              "type":{
                                 "com.linkedin.schema.StringType":{
                                    
                                 }
                              }
                           }
                        }
                     ]
                  }
               }
            ]
         }
      }
   }
}'

Create a chart

curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
   "entity":{
      "value":{
         "com.linkedin.metadata.snapshot.ChartSnapshot":{
            "urn":"urn:li:chart:(looker,baz1)",
            "aspects":[
               {
                  "com.linkedin.chart.ChartInfo":{
                     "title":"Baz Chart 1",
                     "description":"Baz Chart 1",
                     "inputs":[
                        {
                           "string":"urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"
                        }
                     ],
                     "lastModified":{
                        "created":{
                           "time":0,
                           "actor":"urn:li:corpuser:jdoe"
                        },
                        "lastModified":{
                           "time":0,
                           "actor":"urn:li:corpuser:datahub"
                        }
                     }
                  }
               }
            ]
         }
      }
   }
}'

Create a dashboard

curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
   "entity":{
      "value":{
         "com.linkedin.metadata.snapshot.DashboardSnapshot":{
            "urn":"urn:li:dashboard:(looker,baz)",
            "aspects":[
               {
                  "com.linkedin.dashboard.DashboardInfo":{
                     "title":"Baz Dashboard",
                     "description":"Baz Dashboard",
                     "charts":[
                        "urn:li:chart:(looker,baz1)",
                        "urn:li:chart:(looker,baz2)"
                     ],
                     "lastModified":{
                        "created":{
                           "time":0,
                           "actor":"urn:li:corpuser:jdoe"
                        },
                        "lastModified":{
                           "time":0,
                           "actor":"urn:li:corpuser:datahub"
                        }
                     }
                  }
               }
            ]
         }
      }
   }
}'

Create Tags

To create a new tag called "Engineering", we can use the following curl.

curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
   "entity":{
      "value":{
         "com.linkedin.metadata.snapshot.TagSnapshot":{
            "urn":"urn:li:tag:Engineering",
            "aspects":[
               {
                  "com.linkedin.tag.TagProperties":{
                     "name":"Engineering",
                     "description":"The tag will be assigned to all assets owned by the Eng org."
                  }
               }
            ]
         }
      }
   }
}'

This tag can then be attached to a Data Asset through the asset's "Global Tags" aspect. For example, to add the tag to a Dataset, you can use the following curl:

curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
   "entity":{
      "value":{
         "com.linkedin.metadata.snapshot.DatasetSnapshot":{
            "urn":"urn:li:dataset:(urn:li:dataPlatform:foo,bar,PROD)",
            "aspects":[
               {
                  "com.linkedin.common.GlobalTags":{
                     "tags":[
                        {
                           "tag":"urn:li:tag:Engineering"
                        }
                     ]
                  }
               }
            ]
         }
      }
   }
}'

To add the tag to a field in a particular Dataset's schema, you can use curl to update the EditableSchemaMetadata aspect:

curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
   "entity":{
      "value":{
         "com.linkedin.metadata.snapshot.DatasetSnapshot":{
            "urn":"urn:li:dataset:(urn:li:dataPlatform:foo,bar,PROD)",
            "aspects":[
               {
                  "com.linkedin.schema.EditableSchemaMetadata": { 
                     "editableSchemaFieldInfo":[
                        {
                           "fieldPath":"myFieldName",
                           "globalTags": {
                              "tags":[
                                 {
                                     "tag":"urn:li:tag:Engineering"
                                 }
                              ]
                           }
                        }
                     ]
                  }
               }
            ]
         }
      }
   }
}'
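
When several fields need tags, writing the nested JSON by hand gets error-prone. The sketch below builds the EditableSchemaMetadata aspect from a plain mapping of field paths to tag names. The function names are hypothetical; the field names come from the request above.

```python
def global_tags(tag_names):
    """Build the {"tags": [{"tag": <urn>}, ...]} structure used for tags."""
    return {"tags": [{"tag": f"urn:li:tag:{name}"} for name in tag_names]}


def editable_schema_metadata(field_tags):
    """Build an EditableSchemaMetadata aspect from {fieldPath: [tag names]}."""
    return {
        "com.linkedin.schema.EditableSchemaMetadata": {
            "editableSchemaFieldInfo": [
                {"fieldPath": path, "globalTags": global_tags(tags)}
                for path, tags in field_tags.items()
            ]
        }
    }


# Reproduces the aspect from the curl example above.
aspect = editable_schema_metadata({"myFieldName": ["Engineering"]})
print(aspect)
```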

Soft Deleting an Entity

DataHub uses a special "Status" aspect to represent the lifecycle state of each Entity. To soft delete an entire Entity, so that it no longer appears in the UI, ingest a "Status" aspect with "removed" set to true.

For example, to delete a particular chart:

curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
   "entity":{
      "value":{
         "com.linkedin.metadata.snapshot.ChartSnapshot":{
            "aspects":[
               {
                  "com.linkedin.common.Status":{
                     "removed": true
                  }
               }
            ],
            "urn":"urn:li:chart:(looker,baz1)"
         }
      }
   }
}'

To re-enable the Entity, you can make a similar request:

curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
   "entity":{
      "value":{
         "com.linkedin.metadata.snapshot.ChartSnapshot":{
            "aspects":[
               {
                  "com.linkedin.common.Status":{
                     "removed": false
                  }
               }
            ],
            "urn":"urn:li:chart:(looker,baz1)"
         }
      }
   }
}'
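
The two requests above differ only in the value of "removed". A small sketch, using a hypothetical helper named status_payload, makes that explicit:

```python
import json


def status_payload(snapshot_type, urn, removed):
    """Build an ingest body that sets the Status aspect of one entity."""
    return {
        "entity": {
            "value": {
                snapshot_type: {
                    "urn": urn,
                    "aspects": [{"com.linkedin.common.Status": {"removed": removed}}],
                }
            }
        }
    }


CHART_SNAPSHOT = "com.linkedin.metadata.snapshot.ChartSnapshot"
CHART_URN = "urn:li:chart:(looker,baz1)"

soft_delete = status_payload(CHART_SNAPSHOT, CHART_URN, removed=True)
restore = status_payload(CHART_SNAPSHOT, CHART_URN, removed=False)

# Python booleans serialize to the JSON literals true/false used above.
print(json.dumps(soft_delete))
```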

To issue a hard delete, or undo a particular ingestion run, you can use the DataHub CLI.

*Note that soft deletes are coming soon to the DataHub CLI.

Retrieving Entities

The Entity Snapshot Get APIs allow you to retrieve the latest version of each aspect associated with an Entity.

To read an entity by its primary key (urn), use the general-purpose entities endpoint and issue a query of the following form:

curl 'http://localhost:8080/entities/<url-encoded-entity-urn>'
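
The urn has to be percent-encoded before it is placed in the path. As a sketch, Python's standard urllib.parse.quote can produce the same encoding as the sample requests in this section. Leaving "(", ")" and "," unescaped mirrors those samples; whether the server also accepts them in escaped form is not covered here. The function name entity_url is hypothetical.

```python
from urllib.parse import quote

# Assumption: same host and port as the curl examples in this document.
GMS_URL = "http://localhost:8080"


def entity_url(urn, base_url=GMS_URL):
    """Return the /entities URL for a urn, percent-encoding the colons."""
    # safe="()," keeps parentheses and commas literal, as in the samples.
    return f"{base_url}/entities/{quote(urn, safe='(),')}"


print(entity_url("urn:li:corpuser:fbar"))
print(entity_url("urn:li:dataset:(urn:li:dataPlatform:foo,bar,PROD)"))
```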

Get a CorpUser

curl 'http://localhost:8080/entities/urn%3Ali%3Acorpuser%3Afbar'

{
   "value":{
      "com.linkedin.metadata.snapshot.CorpUserSnapshot":{
         "urn":"urn:li:corpuser:fbar",
         "aspects":[
            {
               "com.linkedin.metadata.key.CorpUserKey":{
                  "username":"fbar"
               }
            },
            {
               "com.linkedin.identity.CorpUserInfo":{
                  "active":true,
                  "fullName":"Foo Bar",
                  "displayName":"Foo Bar",
                  "email":"fbar@linkedin.com"
               }
            },
            {
               "com.linkedin.identity.CorpUserEditableInfo":{
                  
               }
            }
         ]
      }
   }
}

Get a CorpGroup

curl 'http://localhost:8080/entities/urn%3Ali%3AcorpGroup%3Adev'

{
   "value":{
      "com.linkedin.metadata.snapshot.CorpGroupSnapshot":{
         "urn":"urn:li:corpGroup:dev",
         "aspects":[
            {
               "com.linkedin.metadata.key.CorpGroupKey":{
                  "name":"dev"
               }
            },
            {
               "com.linkedin.identity.CorpGroupInfo":{
                  "groups":[
                     
                  ],
                  "email":"dev@linkedin.com",
                  "admins":[
                     "urn:li:corpUser:jdoe"
                  ],
                  "members":[
                     "urn:li:corpUser:datahub",
                     "urn:li:corpUser:jdoe"
                  ]
               }
            }
         ]
      }
   }
}

Get a Dataset

curl 'http://localhost:8080/entities/urn%3Ali%3Adataset%3A(urn%3Ali%3AdataPlatform%3Afoo,bar,PROD)'

{
   "value":{
      "com.linkedin.metadata.snapshot.DatasetSnapshot":{
         "urn":"urn:li:dataset:(urn:li:dataPlatform:foo,bar,PROD)",
         "aspects":[
            {
               "com.linkedin.metadata.key.DatasetKey":{
                  "origin":"PROD",
                  "name":"bar",
                  "platform":"urn:li:dataPlatform:foo"
               }
            },
            {
               "com.linkedin.common.InstitutionalMemory":{
                  "elements":[
                     {
                        "createStamp":{
                           "actor":"urn:li:corpuser:fbar",
                           "time":0
                        },
                        "description":"Sample doc",
                        "url":"https://www.linkedin.com"
                     }
                  ]
               }
            },
            {
               "com.linkedin.common.Ownership":{
                  "owners":[
                     {
                        "owner":"urn:li:corpuser:fbar",
                        "type":"DATAOWNER"
                     }
                  ],
                  "lastModified":{
                     "actor":"urn:li:corpuser:fbar",
                     "time":0
                  }
               }
            },
            {
               "com.linkedin.schema.SchemaMetadata":{
                  "created":{
                     "actor":"urn:li:corpuser:fbar",
                     "time":0
                  },
                  "platformSchema":{
                     "com.linkedin.schema.KafkaSchema":{
                        "documentSchema":"{\"type\":\"record\",\"name\":\"MetadataChangeEvent\",\"namespace\":\"com.linkedin.mxe\",\"doc\":\"Kafka event for proposing a metadata change for an entity.\",\"fields\":[{\"name\":\"auditHeader\",\"type\":{\"type\":\"record\",\"name\":\"KafkaAuditHeader\",\"namespace\":\"com.linkedin.avro2pegasus.events\",\"doc\":\"Header\"}}]}"
                     }
                  },
                  "lastModified":{
                     "actor":"urn:li:corpuser:fbar",
                     "time":0
                  },
                  "schemaName":"FooEvent",
                  "fields":[
                     {
                        "fieldPath":"foo",
                        "description":"Bar",
                        "type":{
                           "type":{
                              "com.linkedin.schema.StringType":{
                                 
                              }
                           }
                        },
                        "nativeDataType":"string"
                     }
                  ],
                  "version":0,
                  "hash":"",
                  "platform":"urn:li:dataPlatform:foo"
               }
            },
            {
               "com.linkedin.common.BrowsePaths":{
                  "paths":[
                     "/prod/foo/bar"
                  ]
               }
            },
            {
               "com.linkedin.dataset.UpstreamLineage":{
                  "upstreams":[
                     {
                        "auditStamp":{
                           "actor":"urn:li:corpuser:fbar",
                           "time":0
                        },
                        "type":"TRANSFORMED",
                        "dataset":"urn:li:dataset:(urn:li:dataPlatform:foo,barUp,PROD)"
                     }
                  ]
               }
            }
         ]
      }
   }
}

Get a Chart

curl 'http://localhost:8080/entities/urn%3Ali%3Achart%3A(looker,baz1)'

{
   "value":{
      "com.linkedin.metadata.snapshot.ChartSnapshot":{
         "urn":"urn:li:chart:(looker,baz1)",
         "aspects":[
            {
               "com.linkedin.metadata.key.ChartKey":{
                  "chartId":"baz1",
                  "dashboardTool":"looker"
               }
            },
            {
               "com.linkedin.common.BrowsePaths":{
                  "paths":[
                     "/looker/baz1"
                  ]
               }
            },
            {
               "com.linkedin.chart.ChartInfo":{
                  "description":"Baz Chart 1",
                  "lastModified":{
                     "created":{
                        "actor":"urn:li:corpuser:jdoe",
                        "time":0
                     },
                     "lastModified":{
                        "actor":"urn:li:corpuser:datahub",
                        "time":0
                     }
                  },
                  "title":"Baz Chart 1",
                  "inputs":[
                     {
                        "string":"urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"
                     }
                  ]
               }
            }
         ]
      }
   }
}

Get a Dashboard

curl 'http://localhost:8080/entities/urn%3Ali%3Adashboard%3A(looker,foo)'

{
   "value":{
      "com.linkedin.metadata.snapshot.DashboardSnapshot":{
         "urn":"urn:li:dashboard:(looker,foo)",
         "aspects":[
            {
               "com.linkedin.metadata.key.DashboardKey":{
                  "dashboardId":"foo",
                  "dashboardTool":"looker"
               }
            }
         ]
      }
   }
}

Get a GlossaryTerm

curl 'http://localhost:8080/entities/urn%3Ali%3AglossaryTerm%3A(instruments,instruments.FinancialInstrument_v1)'

{
   "value":{
      "com.linkedin.metadata.snapshot.GlossaryTermSnapshot":{
         "urn":"urn:li:glossaryTerm:instruments.FinancialInstrument_v1",
         "ownership":{
            "owners":[
               {
                  "owner":"urn:li:corpuser:jdoe",
                  "type":"DATAOWNER"
               }
            ],
            "lastModified":{
               "actor":"urn:li:corpuser:jdoe",
               "time":1581407189000
            }
         },
         "glossaryTermInfo":{
            "definition":"written contract that gives rise to both a financial asset of one entity and a financial liability of another entity",
            "customProperties":{
               "FQDN":"full"
            },
            "sourceRef":"FIBO",
            "sourceUrl":"https://spec.edmcouncil.org/fibo/ontology/FBC/FinancialInstruments/FinancialInstruments/FinancialInstrument",
            "termSource":"EXTERNAL"
         }
      }
   }
}

Browse an Entity

To browse (explore) for an Entity of a particular type (e.g. dataset, chart, etc), you can use the following query format:

curl -X POST 'http://localhost:8080/entities?action=browse' \
--data '{
    "path": "<slash-delimited-browse-path>",
    "entity": "<entity name>",
    "start": 0,
    "limit": 10
}'

For example, to browse the "charts" entity, you could use the following query:

curl -X POST 'http://localhost:8080/entities?action=browse' \
--data '{
    "path": "/looker",
    "entity": "chart",
    "start": 0,
    "limit": 10
}'

{
   "value":{
      "numEntities":1,
      "pageSize":1,
      "metadata":{
         "totalNumEntities":1,
         "groups":[
            
         ],
         "path":"/looker"
      },
      "from":0,
      "entities":[
         {
            "name":"baz1",
            "urn":"urn:li:chart:(looker,baz1)"
         }
      ]
   }
}
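
Once the response has been parsed as JSON, the urns sit under "value" and "entities". The sketch below pulls them out of a response shaped like the one above. The helper name browse_urns is hypothetical, and the sample is a trimmed copy of that response.

```python
import json


def browse_urns(response):
    """Return the urns listed in a parsed browse response."""
    return [entity["urn"] for entity in response["value"]["entities"]]


# Trimmed copy of the browse response shown above.
raw = """
{
   "value":{
      "numEntities":1,
      "from":0,
      "entities":[
         {"name":"baz1", "urn":"urn:li:chart:(looker,baz1)"}
      ]
   }
}
"""

urns = browse_urns(json.loads(raw))
print(urns)
```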

Search an Entity

To search for an Entity of a particular type (e.g. dataset, chart, etc), you can use the following query format:

curl -X POST 'http://localhost:8080/entities?action=search' \
--data '{
    "input": "<query-text>",
    "entity": "<entity name>",
    "start": 0,
    "count": 10
}'

The API will return a list of URNs that matched your search query.
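
Because the request carries "start" and "count", larger result sets can be read page by page. The following sketch generates one request body per page. The helper names are hypothetical, and it assumes the caller already knows the total number of matches (for instance from a previous response).

```python
def search_body(query, entity, start=0, count=10):
    """Build the JSON body for one search request."""
    return {"input": query, "entity": entity, "start": start, "count": count}


def page_bodies(query, entity, total, count=10):
    """Yield one request body per page needed to cover `total` results."""
    for start in range(0, total, count):
        yield search_body(query, entity, start, count)


# 25 matches at 10 per page need three requests, starting at 0, 10 and 20.
bodies = list(page_bodies("looker", "chart", total=25, count=10))
print([body["start"] for body in bodies])
```

Note that the search endpoint names its page size "count", while the browse and autocomplete endpoints use "limit".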

For example, to search the "charts" entity, you could use the following query:

curl -X POST 'http://localhost:8080/entities?action=search' \
--data '{
    "input": "looker",
    "entity": "chart",
    "start": 0,
    "count": 10
}'

{
   "value":{
      "numEntities":1,
      "pageSize":10,
      "metadata":{
         "urns":[
            "urn:li:chart:(looker,baz1)"
         ],
         "matches":[
            {
               "matchedFields":[
                  {
                     "name":"tool",
                     "value":"looker"
                  }
               ]
            }
         ],
         "searchResultMetadatas":[
            {
               "name":"tool",
               "aggregations":{
                  "looker":1
               }
            }
         ]
      },
      "from":0,
      "entities":[
         "urn:li:chart:(looker,baz1)"
      ]
   }
}

You can use colon search for exact match searching on particular @Searchable fields of an Entity.

Example: Find assets by Tag

For example, to fetch all Datasets having a particular tag (Engineering), we can use the following query:

curl -X POST 'http://localhost:8080/entities?action=search' \
--data '{
    "input": "tags:Engineering",
    "entity": "dataset",
    "start": 0,
    "count": 10
}'

{
   "value":{
      "numEntities":1,
      "pageSize":10,
      "metadata":{
         "urns":[
            "urn:li:dataset:(urn:li:dataPlatform:foo,bar,PROD)"
         ],
         "matches":[
            {
               "matchedFields":[
                  {
                     "name":"tags",
                     "value":"urn:li:tag:Engineering"
                  }
               ]
            }
         ],
         "searchResultMetadatas":[
            {
               "name":"platform",
               "aggregations":{
                  "foo":1
               }
            },
            {
               "name":"origin",
               "aggregations":{
                  "PROD":1
               }
            }
         ]
      },
      "from":0,
      "entities":[
         "urn:li:dataset:(urn:li:dataPlatform:foo,bar,PROD)"
      ]
   }
}

Filtering

In addition to performing full-text search, you can also filter explicitly against fields marked as @Searchable in the corresponding aspect PDLs.

For example, to perform filtering for a chart with title "Baz Chart 1", you could issue the following query:

curl -X POST 'http://localhost:8080/entities?action=search' \
--data '{
    "input": "looker",
    "entity": "chart",
    "start": 0,
    "count": 10,
    "filter": {
        "criteria": [
           {
                "field": "title",
                "value": "Baz Chart 1",
                "condition": "EQUAL"
           }
        ]
    }
}'

{
   "value":{
      "numEntities":1,
      "pageSize":10,
      "metadata":{
         "urns":[
            "urn:li:chart:(looker,baz1)"
         ],
         "matches":[
            {
               "matchedFields":[
                  {
                     "name":"tool",
                     "value":"looker"
                  }
               ]
            }
         ],
         "searchResultMetadatas":[
            {
               "name":"tool",
               "aggregations":{
                  "looker":1
               }
            }
         ]
      },
      "from":0,
      "entities":[
         "urn:li:chart:(looker,baz1)"
      ]
   }
}

where valid conditions include:

  • CONTAIN
  • END_WITH
  • EQUAL
  • GREATER_THAN
  • GREATER_THAN_OR_EQUAL_TO
  • LESS_THAN
  • LESS_THAN_OR_EQUAL_TO
  • START_WITH
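
A client can reject a misspelled condition before the request is sent. The sketch below builds the "filter" object and checks each condition against the list above. The helper names criterion and build_filter are hypothetical.

```python
# The conditions listed in this section.
VALID_CONDITIONS = {
    "CONTAIN",
    "END_WITH",
    "EQUAL",
    "GREATER_THAN",
    "GREATER_THAN_OR_EQUAL_TO",
    "LESS_THAN",
    "LESS_THAN_OR_EQUAL_TO",
    "START_WITH",
}


def criterion(field, value, condition="EQUAL"):
    """Build one filter criterion, validating the condition name."""
    if condition not in VALID_CONDITIONS:
        raise ValueError(f"unknown condition: {condition!r}")
    return {"field": field, "value": value, "condition": condition}


def build_filter(*criteria):
    """Wrap criteria in the {"criteria": [...]} object the API expects."""
    return {"criteria": list(criteria)}


# Reproduces the filter from the chart example above.
chart_filter = build_filter(criterion("title", "Baz Chart 1"))
print(chart_filter)
```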

*Note that the search API only includes data corresponding to the latest snapshots of a particular Entity.

Autocomplete against fields of an entity

To autocomplete a query for a particular entity type, you can use a query of the following form:

curl -X POST 'http://localhost:8080/entities?action=autocomplete' \
--data '{
    "query": "<query-text>",
    "entity": "<entity-name>",
    "limit": 10
}'

For example, to autocomplete a query against all Chart entities, you could issue the following:

curl -X POST 'http://localhost:8080/entities?action=autocomplete' \
--data '{
    "query": "Baz Ch",
    "entity": "chart",
    "start": 0,
    "limit": 10
}'

{
   "value":{
      "suggestions":[
         "Baz Chart 1"
      ],
      "query":"Baz Ch"
   }
}

Note that you can also provide a Filter to the autocomplete endpoint:

curl -X POST 'http://localhost:8080/entities?action=autocomplete' \
--data '{
    "query": "Baz Ch",
    "entity": "chart",
    "start": 0,
    "limit": 10,
    "filter": {
        "criteria": [
           {
                "field": "tool",
                "value": "looker",
                "condition": "EQUAL"
           }
        ]
    }
}'

{
   "value":{
      "suggestions":[
         "Baz Chart 1"
      ],
      "query":"Baz Ch"
   }
}

*Note that the autocomplete API only includes data corresponding to the latest snapshots of a particular Entity.

Get a Versioned Aspect

In addition to fetching the set of latest Snapshot aspects for an entity, we also support a point lookup of a single aspect of an entity at a particular version.

To do so, you can use the following query template:

curl 'http://localhost:8080/aspects/<url-encoded-entity-urn>?aspect=<aspect-name>&version=<version>'

This returns a VersionedAspect: a record containing a version and an aspect. The aspect is wrapped in a Rest.li Union whose key is the fully-qualified record name of the aspect.

For example, to fetch the latest version of a Dataset's "schemaMetadata" aspect, you could issue the following query:

curl 'http://localhost:8080/aspects/urn%3Ali%3Adataset%3A(urn%3Ali%3AdataPlatform%3Afoo%2Cbar%2CPROD)?aspect=schemaMetadata&version=0'

{
   "version":0,
   "aspect":{
      "com.linkedin.schema.SchemaMetadata":{
         "created":{
            "actor":"urn:li:corpuser:fbar",
            "time":0
         },
         "platformSchema":{
            "com.linkedin.schema.KafkaSchema":{
               "documentSchema":"{\"type\":\"record\",\"name\":\"MetadataChangeEvent\",\"namespace\":\"com.linkedin.mxe\",\"doc\":\"Kafka event for proposing a metadata change for an entity.\",\"fields\":[{\"name\":\"auditHeader\",\"type\":{\"type\":\"record\",\"name\":\"KafkaAuditHeader\",\"namespace\":\"com.linkedin.avro2pegasus.events\",\"doc\":\"Header\"}}]}"
            }
         },
         "lastModified":{
            "actor":"urn:li:corpuser:fbar",
            "time":0
         },
         "schemaName":"FooEvent",
         "fields":[
            {
               "fieldPath":"foo",
               "description":"Bar",
               "type":{
                  "type":{
                     "com.linkedin.schema.StringType":{
                        
                     }
                  }
               },
               "nativeDataType":"string"
            }
         ],
         "version":0,
         "hash":"",
         "platform":"urn:li:dataPlatform:foo"
      }
   }
}

Keep in mind that versions increase monotonically after version 0, which represents the latest.
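
As a sketch, the lookup URL can be assembled with Python's standard urllib.parse module. The function name aspect_url is hypothetical. The urn is encoded the same way as in the sample request above (commas escaped, parentheses left literal).

```python
from urllib.parse import quote, urlencode

# Assumption: same host and port as the curl examples in this document.
GMS_URL = "http://localhost:8080"


def aspect_url(urn, aspect, version=0, base_url=GMS_URL):
    """Return the /aspects URL for one aspect of an entity at a version."""
    query = urlencode({"aspect": aspect, "version": version})
    # safe="()" keeps the parentheses literal, as in the sample request.
    return f"{base_url}/aspects/{quote(urn, safe='()')}?{query}"


# Version 0 is the latest version of the aspect.
print(aspect_url("urn:li:dataset:(urn:li:dataPlatform:foo,bar,PROD)", "schemaMetadata"))
```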

Note that this API will soon be deprecated and replaced by the V2 Aspect API, discussed below.

Get a range of Versioned Aspects

Coming Soon!

Get a range of Timeseries Aspects

With the introduction of Timeseries Aspects, we've introduced a new API for fetching a series of aspects falling into a particular time range. For this, you'll use the /aspects endpoint. The V2 APIs are unique in that they return a new type of payload: an "Enveloped Aspect". This is essentially a serialized aspect along with some system metadata. The serialized aspect can be in any form, though we currently default to escaped Rest.li-compatible JSON.

Callers of the V2 Aspect APIs are expected to deserialize the aspect payload in whatever way suits them. For example, they may bind the deserialized JSON object into a strongly typed Rest.li RecordTemplate class (which is what datahub-frontend does). The benefit of this approach is that we no longer need Rest.li Unions to represent an object which can take on multiple payload forms. It also makes adding and removing aspects from the model easier, a process which could theoretically be done at runtime as opposed to at deploy time.

To fetch a set of Timeseries Aspects that fall into a particular time range, you can use the following query template:

curl -X POST 'http://localhost:8080/aspects?action=getTimeseriesAspectValues' \
--data '{
    "urn": "<urn>",
    "entity": "<entity-name>",
    "aspect": "<time-series-aspect-name>",
    "startTimeMillis": "<your-start-time-ms>",
    "endTimeMillis": "<your-end-time-ms>"
}'
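
Both time bounds are epoch timestamps in milliseconds. The sketch below converts timezone-aware datetimes into that form and builds the request body. The helper names are hypothetical, and the bounds are emitted as JSON numbers.

```python
from datetime import datetime, timezone


def to_epoch_millis(dt):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return int(dt.timestamp() * 1000)


def timeseries_body(urn, entity, aspect, start, end):
    """Build the JSON body for getTimeseriesAspectValues."""
    return {
        "urn": urn,
        "entity": entity,
        "aspect": aspect,
        "startTimeMillis": to_epoch_millis(start),
        "endTimeMillis": to_epoch_millis(end),
    }


start = datetime(2021, 7, 1, 7, 0, tzinfo=timezone.utc)
end = datetime(2021, 7, 28, 7, 0, tzinfo=timezone.utc)
body = timeseries_body(
    "urn:li:dataset:(urn:li:dataPlatform:foo,barUp,PROD)",
    "dataset",
    "datasetProfile",
    start,
    end,
)
print(body)
```

Requiring an explicit timezone avoids silently interpreting naive datetimes in the local zone of whichever machine builds the request.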

For example, to fetch "datasetProfile" timeseries aspects for a dataset with urn urn:li:dataset:(urn:li:dataPlatform:redshift,global_dev.larxynx_carcinoma_data_2020,PROD) that were reported between July 1, 2021 and July 28, 2021, you could issue the following query:

curl -X POST 'http://localhost:8080/aspects?action=getTimeseriesAspectValues' \
--data '{
    "urn": "urn:li:dataset:(urn:li:dataPlatform:redshift,global_dev.larxynx_carcinoma_data_2020,PROD)",
    "entity": "dataset",
    "aspect": "datasetProfile",
    "startTimeMillis": 1625122800000,
    "endTimeMillis": 1627455600000
}'

{
   "value":{
      "limit":10000,
      "aspectName":"datasetProfile",
      "endTimeMillis":1627455600000,
      "startTimeMillis":1625122800000,
      "entityName":"dataset",
      "values":[
         {
            "aspect":{
               "value":"{\"timestampMillis\":1626912000000,\"fieldProfiles\":[{\"uniqueProportion\":1.0,\"sampleValues\":[\"123MMKK12\",\"13KDFMKML\",\"123NNJJJL\"],\"fieldPath\":\"id\",\"nullCount\":0,\"nullProportion\":0.0,\"uniqueCount\":3742},{\"uniqueProportion\":1.0,\"min\":\"1524406400000\",\"max\":\"1624406400000\",\"sampleValues\":[\"1640023230002\",\"1640343012207\",\"16303412330117\"],\"mean\":\"1555406400000\",\"fieldPath\":\"date\",\"nullCount\":0,\"nullProportion\":0.0,\"uniqueCount\":3742},{\"uniqueProportion\":0.037,\"min\":\"21\",\"median\":\"68\",\"max\":\"92\",\"sampleValues\":[\"45\",\"65\",\"81\"],\"mean\":\"65\",\"distinctValueFrequencies\":[{\"value\":\"12\",\"frequency\":103},{\"value\":\"54\",\"frequency\":12}],\"fieldPath\":\"patient_age\",\"nullCount\":0,\"nullProportion\":0.0,\"uniqueCount\":79},{\"uniqueProportion\":0.00820873786407767,\"sampleValues\":[\"male\",\"female\"],\"fieldPath\":\"patient_gender\",\"nullCount\":120,\"nullProportion\":0.03,\"uniqueCount\":2}],\"rowCount\":3742,\"columnCount\":4}",
               "contentType":"application/json"
            }
         }
      ]
   }
}

You'll notice that in this API (V2), we return a generic serialized aspect string as opposed to an inlined Rest.li-serialized Snapshot Model.
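
In practice this means the response needs two decoding steps: one for the HTTP body, and one for each aspect's "value" string. A minimal sketch follows, using a hypothetical helper named decode_enveloped_aspects and a shortened sample payload rather than real output:

```python
import json


def decode_enveloped_aspects(response):
    """Return the aspects of a parsed timeseries response as dicts."""
    decoded = []
    for item in response["value"]["values"]:
        envelope = item["aspect"]
        # Only the JSON content type shown in this document is handled here.
        if envelope["contentType"] != "application/json":
            raise ValueError(f"unsupported content type: {envelope['contentType']}")
        # The aspect itself is a JSON document stored inside a JSON string.
        decoded.append(json.loads(envelope["value"]))
    return decoded


# Shortened sample: the "value" string holds only a few of the profile fields.
sample = {
    "value": {
        "aspectName": "datasetProfile",
        "values": [
            {
                "aspect": {
                    "value": "{\"timestampMillis\":1626912000000,\"rowCount\":3742,\"columnCount\":4}",
                    "contentType": "application/json",
                }
            }
        ],
    }
}

profiles = decode_enveloped_aspects(sample)
print(profiles[0]["rowCount"])
```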

This is part of an initiative to move from MCE + MAE to MetadataChangeProposal and MetadataChangeLog. For more information, see this doc.

Get Relationships (Edges)

To get relationships between entities, you can use the /relationships API. To do so, you must provide the following inputs:

  1. Urn of the source node
  2. Direction of the edge (INCOMING, OUTGOING)
  3. The name of the Relationship (This can be found in Aspect PDLs within the @Relationship annotation)

For example, to get all entities owned by urn:li:corpuser:fbar, we could issue the following query:

curl 'http://localhost:8080/relationships?direction=INCOMING&urn=urn%3Ali%3Acorpuser%3Afbar&types=OwnedBy'

which will return a list of urns, representing entities on the other side of the relationship:

{
   "entities":[
      "urn:li:dataset:(urn:li:dataPlatform:foo,barUp,PROD)"
   ]
}
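
The three inputs map directly onto query parameters. As a sketch (the function name relationships_url is hypothetical), the query string can be built and percent-encoded with Python's standard urlencode:

```python
from urllib.parse import urlencode

# Assumption: same host and port as the curl examples in this document.
GMS_URL = "http://localhost:8080"

# The two edge directions listed above.
VALID_DIRECTIONS = {"INCOMING", "OUTGOING"}


def relationships_url(urn, direction, relationship_type, base_url=GMS_URL):
    """Return the /relationships URL for one source urn and relationship."""
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    query = urlencode({"direction": direction, "urn": urn, "types": relationship_type})
    return f"{base_url}/relationships?{query}"


# Entities owned by fbar: follow OwnedBy edges pointing at the user.
print(relationships_url("urn:li:corpuser:fbar", "INCOMING", "OwnedBy"))
```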

FAQ

1. How do I find the valid set of Entity names?

Entities are named inside of PDL schemas. Each entity will be annotated with the @Entity annotation, which will include a "name" field inside. This represents the "common name" for the entity which can be used in browsing, searching, and more.

By convention, the PDLs for the entities that ship with DataHub by default live under metadata-models/src/main/pegasus/com/linkedin/metadata/snapshot

2. How do I find the valid set of Aspect names?

Aspects are named inside of PDL schemas. Each aspect will be annotated with the @Aspect annotation, which will include a "name" field inside. This represents the "common name" for the aspect, which is the name to use when requesting that aspect (for example, in the aspect parameter of the /aspects endpoint).

By convention, all aspect PDLs live under metadata-models/src/main/pegasus/com/linkedin/metadata/common or metadata-models/src/main/pegasus/com/linkedin/metadata/<entity-name>. For example, the dataset-specific aspects are located under metadata-models/src/main/pegasus/com/linkedin/metadata/dataset.

3. How do I find the valid set of Relationship names?

All relationships are defined on foreign-key fields inside Aspect PDLs. They are reflected by fields bearing the @Relationship annotation. Inside this annotation is a "name" field that defines the standardized name of the Relationship to be used when querying.

By convention, the aspect PDLs that define these fields live under metadata-models/src/main/pegasus/com/linkedin/metadata/common or metadata-models/src/main/pegasus/com/linkedin/metadata/<entity-name>. For example, the dataset-specific aspects are located under metadata-models/src/main/pegasus/com/linkedin/metadata/dataset.