Update documentation
commit 6716e38279 (parent 8bfb086e09)
@@ -24,6 +24,6 @@ as username and password.
 * [Metadata Ingestion](metadata-ingestion)

 ## Roadmap
-1. [Neo4J](http://neo4j.com) graph query support
-2. User profile page
+1. Add [Neo4J](http://neo4j.com) graph query support
+2. Add user profile page
+3. Deploy Data Hub to [Azure Cloud](https://azure.microsoft.com/en-us/)
@@ -6,7 +6,7 @@ responsibility of this service for the Data Hub.

 ## Build
 ```
-docker image build -t keremsahin/datahub-frontend -f docker/datahub-frontend/Dockerfile .
+docker image build -t keremsahin/datahub-frontend -f docker/frontend/Dockerfile .
 ```
 This command builds the image and stores it in your local Docker image store.
@@ -1,9 +1,31 @@
-# Quickstart
+# Data Hub Quickstart
 To start all Docker containers at once, please run the command below:
 ```
 cd docker/quickstart && docker-compose up
 ```
-After `elasticsearch` container is initialized, run below to create the search indices:
+After the containers are initialized, we need to create the `dataset` and `users` search indices by running the command below:
 ```
 cd docker/elasticsearch && bash init.sh
 ```
+At this point, all containers are ready and Data Hub can be considered up and running. Check the guides for
+specific containers for details:
+* [Elasticsearch & Kibana](../elasticsearch)
+* [Data Hub Frontend](../frontend)
+* [Data Hub GMS](../gms)
+* [Kafka, Schema Registry & Zookeeper](../kafka)
+* [Data Hub MAE Consumer](../mae-consumer)
+* [Data Hub MCE Consumer](../mce-consumer)
+* [MySQL](../mysql)
+
+From this point on, if you want to be able to sign in to Data Hub and see some sample data, please see the
+[Metadata Ingestion Guide](../../metadata-ingestion) for bootstrapping Data Hub.
+
+## Debugging Containers
+If you want to debug containers, you can check the container logs:
+```
+docker logs <<container_name>>
+```
+You can also connect to a container's shell for further debugging:
+```
+docker exec -it <<container_name>> bash
+```
gms/README.md
@@ -1,57 +1,304 @@
 # Data Hub Generalized Metadata Store (GMS)
 Data Hub GMS is a [Rest.li](https://linkedin.github.io/rest.li/) service written in Java. It follows common
 Rest.li server development practices, and all data models are Pegasus (`.pdsc`) models.

-## Starting GMS
+## Pre-requisites
+* You need to have [JDK8](https://www.oracle.com/java/technologies/jdk8-downloads.html)
+installed on your machine to be able to build `Data Hub GMS`.
+
+## Build
+`Data Hub GMS` is already built as part of the top-level build:
 ```
-./gradlew build && ./gradlew :gms:war:JettyRunWar
+./gradlew build
 ```
+However, if you only want to build `Data Hub GMS` specifically:
+```
+./gradlew :gms:war:build
+```
-### Example GMS Curl Calls
+
+## Dependencies
+Before starting `Data Hub GMS`, you need to make sure that the [Kafka, Schema Registry & Zookeeper](../docker/kafka),
+[Elasticsearch](../docker/elasticsearch) and [MySQL](../docker/mysql) Docker containers are up and running.
-#### Create
+
+## Start via Docker image
+The quickest way to try out `Data Hub GMS` is to run the [Docker image](../docker/gms).
+
+## Start via command line
+If you modify things and want to try them out quickly without building the Docker image, you can also run
+the application directly from the command line after a successful [build](#build):
 ```
-curl 'http://localhost:8080/corpUsers/($params:(),name:fbar)/snapshot' -X POST -H 'X-RestLi-Method: create' -H 'X-RestLi-Protocol-Version:2.0.0' --data '{"aspects": [{"com.linkedin.identity.CorpUserInfo":{"active": true, "fullName": "Foo Bar", "email": "fbar@linkedin.com"}}, {"com.linkedin.identity.CorpUserEditableInfo":{}}], "urn": "urn:li:corpuser:fbar"}' -v
-curl 'http://localhost:8080/datasets/($params:(),name:x.y,origin:PROD,platform:urn%3Ali%3AdataPlatform%3Afoo)/snapshot' -X POST -H 'X-RestLi-Method: create' -H 'X-RestLi-Protocol-Version:2.0.0' --data '{"aspects":[{"com.linkedin.common.Ownership":{"owners":[{"owner":"urn:li:corpuser:ksahin","type":"DATAOWNER"}],"lastModified":{"time":0,"actor":"urn:li:corpuser:ksahin"}}},{"com.linkedin.dataset.UpstreamLineage":{"upstreams":[{"auditStamp":{"time":0,"actor":"urn:li:corpuser:ksahin"},"dataset":"urn:li:dataset:(urn:li:dataPlatform:foo,barUp,PROD)","type":"TRANSFORMED"}]}},{"com.linkedin.common.InstitutionalMemory":{"elements":[{"url":"https://www.linkedin.com","description":"Sample doc","createStamp":{"time":0,"actor":"urn:li:corpuser:ksahin"}}]}},{"com.linkedin.schema.SchemaMetadata":{"schemaName":"FooEvent","platform":"urn:li:dataPlatform:foo","version":0,"created":{"time":0,"actor":"urn:li:corpuser:ksahin"},"lastModified":{"time":0,"actor":"urn:li:corpuser:ksahin"},"hash":"","platformSchema":{"com.linkedin.schema.KafkaSchema":{"documentSchema":"{\"type\":\"record\",\"name\":\"MetadataChangeEvent\",\"namespace\":\"com.linkedin.mxe\",\"doc\":\"Kafka event for proposing a metadata change for an entity.\",\"fields\":[{\"name\":\"auditHeader\",\"type\":{\"type\":\"record\",\"name\":\"KafkaAuditHeader\",\"namespace\":\"com.linkedin.avro2pegasus.events\",\"doc\":\"Header\"}}]}"}},"fields":[{"fieldPath":"foo","description":"Bar","nativeDataType":"string","type":{"type":{"com.linkedin.schema.StringType":{}}}}]}}],"urn":"urn:li:dataset:(urn:li:dataPlatform:foo,bar,PROD)"}' -v
+./gradlew :gms:war:JettyRunWar
 ```

-#### Get
+## Sample API Calls
+
+### Create user
 ```
-curl -H 'X-RestLi-Protocol-Version:2.0.0' -H 'X-RestLi-Method: get' 'http://localhost:8080/corpUsers/($params:(),name:fbar)/snapshot/($params:(),aspectVersions:List((aspect:com.linkedin.identity.CorpUserInfo,version:0)))' | jq
-curl -H 'X-RestLi-Protocol-Version:2.0.0' -H 'X-RestLi-Method: get' 'http://localhost:8080/datasets/($params:(),name:x.y,origin:PROD,platform:urn%3Ali%3AdataPlatform%3Afoo)/snapshot/($params:(),aspectVersions:List((aspect:com.linkedin.common.Ownership,version:0)))' | jq
+➜ curl 'http://localhost:8080/corpUsers/($params:(),name:fbar)/snapshot' -X POST -H 'X-RestLi-Method: create' -H 'X-RestLi-Protocol-Version:2.0.0' --data '{"aspects": [{"com.linkedin.identity.CorpUserInfo":{"active": true, "displayName": "Foo Bar", "fullName": "Foo Bar", "email": "fbar@linkedin.com"}}, {"com.linkedin.identity.CorpUserEditableInfo":{}}], "urn": "urn:li:corpuser:fbar"}' -v
 ```

-### Get all
+### Create dataset
 ```
-curl -H 'X-RestLi-Protocol-Version:2.0.0' -H 'X-RestLi-Method: get_all' 'http://localhost:8080/corpUsers' | jq
+➜ curl 'http://localhost:8080/datasets/($params:(),name:bar,origin:PROD,platform:urn%3Ali%3AdataPlatform%3Afoo)/snapshot' -X POST -H 'X-RestLi-Method: create' -H 'X-RestLi-Protocol-Version:2.0.0' --data '{"aspects":[{"com.linkedin.common.Ownership":{"owners":[{"owner":"urn:li:corpuser:fbar","type":"DATAOWNER"}],"lastModified":{"time":0,"actor":"urn:li:corpuser:fbar"}}},{"com.linkedin.dataset.UpstreamLineage":{"upstreams":[{"auditStamp":{"time":0,"actor":"urn:li:corpuser:fbar"},"dataset":"urn:li:dataset:(urn:li:dataPlatform:foo,barUp,PROD)","type":"TRANSFORMED"}]}},{"com.linkedin.common.InstitutionalMemory":{"elements":[{"url":"https://www.linkedin.com","description":"Sample doc","createStamp":{"time":0,"actor":"urn:li:corpuser:fbar"}}]}},{"com.linkedin.schema.SchemaMetadata":{"schemaName":"FooEvent","platform":"urn:li:dataPlatform:foo","version":0,"created":{"time":0,"actor":"urn:li:corpuser:fbar"},"lastModified":{"time":0,"actor":"urn:li:corpuser:fbar"},"hash":"","platformSchema":{"com.linkedin.schema.KafkaSchema":{"documentSchema":"{\"type\":\"record\",\"name\":\"MetadataChangeEvent\",\"namespace\":\"com.linkedin.mxe\",\"doc\":\"Kafka event for proposing a metadata change for an entity.\",\"fields\":[{\"name\":\"auditHeader\",\"type\":{\"type\":\"record\",\"name\":\"KafkaAuditHeader\",\"namespace\":\"com.linkedin.avro2pegasus.events\",\"doc\":\"Header\"}}]}"}},"fields":[{"fieldPath":"foo","description":"Bar","nativeDataType":"string","type":{"type":{"com.linkedin.schema.StringType":{}}}}]}}],"urn":"urn:li:dataset:(urn:li:dataPlatform:foo,bar,PROD)"}' -v
 ```

-### Browse
+### Get user
 ```
-curl "http://localhost:8080/datasets?action=browse" -d '{"path": "", "start": 0, "limit": 10}' -X POST -H 'X-RestLi-Protocol-Version: 2.0.0' | jq
+➜ curl -H 'X-RestLi-Protocol-Version:2.0.0' -H 'X-RestLi-Method: get' 'http://localhost:8080/corpUsers/($params:(),name:fbar)/snapshot/($params:(),aspectVersions:List((aspect:com.linkedin.identity.CorpUserInfo,version:0)))' | jq
+{
+  "urn": "urn:li:corpuser:fbar",
+  "aspects": [
+    {
+      "com.linkedin.identity.CorpUserInfo": {
+        "displayName": "Foo Bar",
+        "active": true,
+        "fullName": "Foo Bar",
+        "email": "fbar@linkedin.com"
+      }
+    }
+  ]
+}
 ```

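For clients that prefer not to shell out to curl, the same snapshot read can be issued from plain Java. This is a minimal sketch assuming only that GMS is listening on `localhost:8080`, as in the examples above; the class name is illustrative and not part of the repo.
```
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class GmsGetUserExample {
  public static void main(String[] args) throws Exception {
    // Same endpoint and Rest.li headers as the "Get user" curl call above.
    URL url = new URL("http://localhost:8080/corpUsers/($params:(),name:fbar)/snapshot"
        + "/($params:(),aspectVersions:List((aspect:com.linkedin.identity.CorpUserInfo,version:0)))");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("X-RestLi-Protocol-Version", "2.0.0");
    conn.setRequestProperty("X-RestLi-Method", "get");

    try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // prints the snapshot JSON shown above
      }
    }
  }
}
```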
-### Search
+### Get dataset
 ```
-curl "http://localhost:8080/corpUsers?q=search&input=foo&" -X GET -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'X-RestLi-Method: finder' | jq
-curl "http://localhost:8080/datasets?q=search&input=foo&" -X GET -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'X-RestLi-Method: finder' | jq
+➜ curl -H 'X-RestLi-Protocol-Version:2.0.0' -H 'X-RestLi-Method: get' 'http://localhost:8080/datasets/($params:(),name:bar,origin:PROD,platform:urn%3Ali%3AdataPlatform%3Afoo)/snapshot/($params:(),aspectVersions:List((aspect:com.linkedin.common.Ownership,version:0)))' | jq
+{
+  "urn": "urn:li:dataset:(urn:li:dataPlatform:foo,bar,PROD)",
+  "aspects": [
+    {
+      "com.linkedin.common.Ownership": {
+        "owners": [
+          {
+            "owner": "urn:li:corpuser:fbar",
+            "type": "DATAOWNER"
+          },
+          {
+            "owner": "urn:li:corpuser:ksahin",
+            "type": "DATAOWNER"
+          }
+        ],
+        "lastModified": {
+          "actor": "urn:li:corpuser:ksahin",
+          "time": 1568015476480
+        }
+      }
+    }
+  ]
+}
 ```

-### Autocomplete
+### Get all users
 ```
-curl "http://localhost:8080/datasets?action=autocomplete" -d '{"query": "foo", "field": "name", "limit": 10}' -X POST -H 'X-RestLi-Protocol-Version: 2.0.0' | jq
+➜ curl -H 'X-RestLi-Protocol-Version:2.0.0' -H 'X-RestLi-Method: get_all' 'http://localhost:8080/corpUsers' | jq
+{
+  "elements": [
+    {
+      "editableInfo": {},
+      "username": "fbar",
+      "info": {
+        "displayName": "Foo Bar",
+        "active": true,
+        "fullName": "Foo Bar",
+        "email": "fbar@linkedin.com"
+      }
+    },
+    {
+      "editableInfo": {
+        "skills": [],
+        "teams": [],
+        "pictureLink": "https://content.linkedin.com/content/dam/me/business/en-us/amp/brand-site/v2/bg/LI-Bug.svg.original.svg"
+      },
+      "username": "ksahin",
+      "info": {
+        "displayName": "Kerem Sahin",
+        "active": true,
+        "fullName": "Kerem Sahin",
+        "email": "ksahin@linkedin.com"
+      }
+    },
+    {
+      "editableInfo": {
+        "skills": [],
+        "teams": [],
+        "pictureLink": "https://content.linkedin.com/content/dam/me/business/en-us/amp/brand-site/v2/bg/LI-Bug.svg.original.svg"
+      },
+      "username": "datahub",
+      "info": {
+        "displayName": "Data Hub",
+        "active": true,
+        "fullName": "Data Hub",
+        "email": "datahub@linkedin.com"
+      }
+    }
+  ],
+  "paging": {
+    "count": 10,
+    "start": 0,
+    "links": []
+  }
+}
 ```

-### Ownership
+### Browse datasets
 ```
-curl -H 'X-RestLi-Protocol-Version:2.0.0' -H 'X-RestLi-Method: get' 'http://localhost:8080/datasets/($params:(),name:x.y,origin:PROD,platform:urn%3Ali%3AdataPlatform%3Afoo)/rawOwnership/0' | jq
+➜ curl "http://localhost:8080/datasets?action=browse" -d '{"path": "", "start": 0, "limit": 10}' -X POST -H 'X-RestLi-Protocol-Version: 2.0.0' | jq
+{
+  "value": {
+    "numEntities": 0,
+    "metadata": {
+      "totalNumEntities": 2,
+      "path": "",
+      "groups": [
+        {
+          "name": "prod",
+          "count": 2
+        }
+      ]
+    },
+    "entities": [],
+    "pageSize": 10,
+    "from": 0
+  }
+}
 ```

-### Schema
+### Search users
 ```
-curl -H 'X-RestLi-Protocol-Version:2.0.0' -H 'X-RestLi-Method: get' 'http://localhost:8080/datasets/($params:(),name:x.y,origin:PROD,platform:urn%3Ali%3AdataPlatform%3Afoo)/schema/0' | jq
+➜ curl "http://localhost:8080/corpUsers?q=search&input=foo&" -X GET -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'X-RestLi-Method: finder' | jq
+{
+  "metadata": {
+    "searchResultMetadatas": [
+      {
+        "name": "title",
+        "aggregations": {}
+      }
+    ]
+  },
+  "elements": [
+    {
+      "editableInfo": {},
+      "username": "fbar",
+      "info": {
+        "displayName": "Foo Bar",
+        "active": true,
+        "fullName": "Foo Bar",
+        "email": "fbar@linkedin.com"
+      }
+    }
+  ],
+  "paging": {
+    "total": 1,
+    "count": 10,
+    "start": 0,
+    "links": []
+  }
+}
 ```

+### Search datasets
+```
+➜ curl "http://localhost:8080/datasets?q=search&input=bar" -X GET -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'X-RestLi-Method: finder' | jq
+{
+  "metadata": {
+    "searchResultMetadatas": [
+      {
+        "name": "platform",
+        "aggregations": {
+          "foo": 1
+        }
+      },
+      {
+        "name": "origin",
+        "aggregations": {
+          "prod": 1
+        }
+      }
+    ]
+  },
+  "elements": [
+    {
+      "urn": "urn:li:dataset:(urn:li:dataPlatform:foo,bar,PROD)",
+      "origin": "PROD",
+      "name": "bar",
+      "platform": "urn:li:dataPlatform:foo"
+    }
+  ],
+  "paging": {
+    "total": 1,
+    "count": 10,
+    "start": 0,
+    "links": []
+  }
+}
+```
+
+### Typeahead for datasets
+```
+➜ curl "http://localhost:8080/datasets?action=autocomplete" -d '{"query": "bar", "field": "name", "limit": 10}' -X POST -H 'X-RestLi-Protocol-Version: 2.0.0' | jq
+{
+  "value": {
+    "query": "bar",
+    "suggestions": [
+      "bar"
+    ]
+  }
+}
+```
+
+### Get dataset ownership
+```
+➜ curl -H 'X-RestLi-Protocol-Version:2.0.0' -H 'X-RestLi-Method: get' 'http://localhost:8080/datasets/($params:(),name:bar,origin:PROD,platform:urn%3Ali%3AdataPlatform%3Afoo)/rawOwnership/0' | jq
+{
+  "owners": [
+    {
+      "owner": "urn:li:corpuser:fbar",
+      "type": "DATAOWNER"
+    },
+    {
+      "owner": "urn:li:corpuser:ksahin",
+      "type": "DATAOWNER"
+    }
+  ],
+  "lastModified": {
+    "actor": "urn:li:corpuser:ksahin",
+    "time": 1568015476480
+  }
+}
+```
+
+### Get dataset schema
+```
+➜ curl -H 'X-RestLi-Protocol-Version:2.0.0' -H 'X-RestLi-Method: get' 'http://localhost:8080/datasets/($params:(),name:bar,origin:PROD,platform:urn%3Ali%3AdataPlatform%3Afoo)/schema/0' | jq
+{
+  "created": {
+    "actor": "urn:li:corpuser:fbar",
+    "time": 0
+  },
+  "platformSchema": {
+    "com.linkedin.schema.KafkaSchema": {
+      "documentSchema": "{\"type\":\"record\",\"name\":\"MetadataChangeEvent\",\"namespace\":\"com.linkedin.mxe\",\"doc\":\"Kafka event for proposing a metadata change for an entity.\",\"fields\":[{\"name\":\"auditHeader\",\"type\":{\"type\":\"record\",\"name\":\"KafkaAuditHeader\",\"namespace\":\"com.linkedin.avro2pegasus.events\",\"doc\":\"Header\"}}]}"
+    }
+  },
+  "lastModified": {
+    "actor": "urn:li:corpuser:fbar",
+    "time": 0
+  },
+  "schemaName": "FooEvent",
+  "fields": [
+    {
+      "fieldPath": "foo",
+      "description": "Bar",
+      "type": {
+        "type": {
+          "com.linkedin.schema.StringType": {}
+        }
+      },
+      "nativeDataType": "string"
+    }
+  ],
+  "version": 0,
+  "platform": "urn:li:dataPlatform:foo",
+  "hash": ""
+}
+```
@@ -0,0 +1,10 @@
+# MXE Consumer Jobs
+Data Hub uses Kafka as the pub-sub message queue in the backend. Data Hub uses two Kafka topics:
+`MetadataChangeEvent` and `MetadataAuditEvent`.
+* `MetadataChangeEvent`: This message is emitted by a data platform or crawler whenever there is a change in the metadata.
+* `MetadataAuditEvent`: This message is emitted by [Data Hub GMS](../gms) to notify that a metadata change has been registered.
+
+To consume from these two topics, Data Hub uses two [Kafka Streams](https://kafka.apache.org/documentation/streams/)
+jobs (a sketch of the job shape follows this list):
+* [MCE Consumer Job](mce-consumer-job): Writes to [Data Hub GMS](../gms)
+* [MAE Consumer Job](elasticsearch-index-job): Writes to [Elasticsearch](../docker/elasticsearch)
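To make the streams-job shape concrete, here is a minimal, hypothetical Kafka Streams consumer for one of these topics. The topic name matches the description above, but the application id, serdes, and `process()` body are illustrative assumptions, not Data Hub's actual implementation.
```
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class MetadataEventConsumer {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "metadata-event-consumer"); // assumed id
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");       // assumed broker
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.ByteArray().getClass());

    StreamsBuilder builder = new StreamsBuilder();
    // Listen to the topic and hand each event to a processing step.
    builder.<String, byte[]>stream("MetadataChangeEvent")
           .foreach((key, value) -> process(value));

    new KafkaStreams(builder.build(), props).start();
  }

  private static void process(byte[] avroPayload) {
    // Deserialize the Avro-encoded event and write it onward,
    // e.g. to GMS (MCE job) or to Elasticsearch (MAE job).
  }
}
```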
@@ -1,17 +1,33 @@
 # MetadataAuditEvent (MAE) Consumer Job
 MAE Consumer is a [Kafka Streams](https://kafka.apache.org/documentation/streams/) job. Its main function is to listen
 to the `MetadataAuditEvent` Kafka topic and process those messages using [index builders](../../metadata-builders).
 Index builders create a search document model by processing an MAE, and these documents are then indexed into Elasticsearch.
 This job thus provides near-realtime updates of the search index.

-## Starting job
-Run below to start Elasticsearch indexing job.
+## Pre-requisites
+* You need to have [JDK8](https://www.oracle.com/java/technologies/jdk8-downloads.html)
+installed on your machine to be able to build `MAE Consumer Job`.
+
+## Build
+`MAE Consumer Job` is already built as part of the top-level build:
+```
+./gradlew build
+```
+However, if you only want to build `MAE Consumer Job` specifically:
+```
+./gradlew :metadata-jobs:elasticsearch-index-job:build
+```
+
+## Dependencies
+Before starting `MAE Consumer Job`, you need to make sure that the [Kafka, Schema Registry & Zookeeper](../../docker/kafka) and
+[Elasticsearch](../../docker/elasticsearch) Docker containers are up and running.
+
+## Start via Docker image
+The quickest way to try out `MAE Consumer Job` is to run the [Docker image](../../docker/mae-consumer).
+
+## Start via command line
+If you modify things and want to try them out quickly without building the Docker image, you can also run
+the application directly from the command line after a successful [build](#build):
 ```
 ./gradlew :metadata-jobs:elasticsearch-index-job:run
 ```
 To test the job, you should have already started Kafka, GMS, MySQL and Elasticsearch/Kibana.
 After starting all the services, you can create a record in GMS via the snapshot endpoint, as below.
 ```
 curl 'http://localhost:8080/metrics/($params:(),name:a.b.c01,type:UMP)/snapshot' -X POST -H 'X-RestLi-Method: create' -H 'X-RestLi-Protocol-Version:2.0.0' --data '{"aspects": [{"com.linkedin.common.Ownership":{"owners":[{"owner":"urn:li:corpuser:ksahin","type":"DATAOWNER"}]}}], "urn": "urn:li:metric:(UMP,a.b.c01)"}' -v
 ```
 This will fire an MAE, and the search index will be updated by the indexing job after it reads the MAE from Kafka.
 Then, you can check whether the document was populated in the Elasticsearch index with the command below.
 ```
 curl localhost:9200/metricdocument/_search -d '{"query":{"match":{"urn":"urn:li:metric:(UMP,a.b.c01)"}}}' | jq
 ```
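For illustration only, writing a search document into the `metricdocument` index from Java might look like the sketch below, using the Elasticsearch high-level REST client. It assumes a local Elasticsearch on port 9200, as in the curl check above; it is not the index-builder code the job actually uses, and the document body is a placeholder.
```
import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

public class MetricDocumentWriter {
  public static void main(String[] args) throws Exception {
    try (RestHighLevelClient client =
             new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200)))) {
      // Placeholder search document keyed by the metric URN used in the examples above.
      String doc = "{\"urn\":\"urn:li:metric:(UMP,a.b.c01)\"}";
      client.index(new IndexRequest("metricdocument").source(doc, XContentType.JSON),
                   RequestOptions.DEFAULT);
    }
  }
}
```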
@@ -1,14 +1,33 @@
 # MetadataChangeEvent (MCE) Consumer Job
 MCE Consumer is a [Kafka Streams](https://kafka.apache.org/documentation/streams/) job. Its main function is to listen
 to the `MetadataChangeEvent` Kafka topic, process those messages, and write new metadata to `Data Hub GMS`.
 After every successful update of metadata, GMS fires a `MetadataAuditEvent`, which is consumed by the
 [MAE Consumer Job](../elasticsearch-index-job).

-## Starting job
-Run below to start MCE consuming job.
+## Pre-requisites
+* You need to have [JDK8](https://www.oracle.com/java/technologies/jdk8-downloads.html)
+installed on your machine to be able to build `MCE Consumer Job`.
+
+## Build
+`MCE Consumer Job` is already built as part of the top-level build:
+```
+./gradlew build
+```
+However, if you only want to build `MCE Consumer Job` specifically:
+```
+./gradlew :metadata-jobs:mce-consumer-job:build
+```
+
+## Dependencies
+Before starting `MCE Consumer Job`, you need to make sure that the [Kafka, Schema Registry & Zookeeper](../../docker/kafka) and
+[Data Hub GMS](../../docker/gms) Docker containers are up and running.
+
+## Start via Docker image
+The quickest way to try out `MCE Consumer Job` is to run the [Docker image](../../docker/mce-consumer).
+
+## Start via command line
+If you modify things and want to try them out quickly without building the Docker image, you can also run
+the application directly from the command line after a successful [build](#build):
 ```
 ./gradlew :metadata-jobs:mce-consumer-job:run
 ```
 Create your own MCEs, following the models used in `bootstrap_mce.dat`.
 Tip: write one MCE per line, in Python literal syntax.

 Then you can produce MCEs to feed your GMS:
 ```
 cd metadata-ingestion && python mce_cli.py produce
 ```
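If you would rather feed the topic from Java than from `mce_cli.py`, a bare-bones producer looks like the sketch below. Real MCEs are Avro-encoded against the schema registry, which is why the Python CLI is the recommended path; the topic name follows the docs above, while the payload here is an illustrative placeholder.
```
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class McePublisher {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

    try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
      // Placeholder bytes; a real MCE is an Avro-serialized record matching the MCE schema.
      byte[] payload = new byte[0];
      producer.send(new ProducerRecord<>("MetadataChangeEvent", payload));
      producer.flush();
    }
  }
}
```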