# Metadata Ingestion Architecture
## MCE Consumer Job
## MAE Consumer Job
All emitted [MAE]s are consumed by a Kafka Streams job, [mae-consumer-job], which updates the [graph] and [search index] accordingly.
The job itself is entity-agnostic: when a specific metadata aspect changes, it invokes the corresponding graph and search index builders.
Each builder instructs the job how to update the graph and search index based on the metadata change.
A builder can optionally use [Remote DAO] to fetch additional metadata from other sources to help compute the final update.
To ensure that metadata changes are processed in the correct chronological order,
MAEs are keyed by the entity [URN], so all MAEs for a particular entity are processed sequentially by a single Kafka Streams thread.
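
The ordering guarantee above can be illustrated with a small, self-contained sketch. It does not use Kafka at all: `partitionFor` and `route` are hypothetical helpers, the URNs are made up, and the hash is a stand-in (Kafka's default partitioner hashes the serialized key with murmur2). The point is only that a deterministic function of the key sends every event for one URN to the same partition, where events keep their arrival order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: shows why keying events by URN preserves per-entity order.
class UrnPartitioningSketch {

  // Hypothetical partitioner: any deterministic function of the key would do.
  static int partitionFor(String urn, int numPartitions) {
    return Math.floorMod(urn.hashCode(), numPartitions);
  }

  // Appends each {urn, change} event to the log of the partition chosen by its key.
  static Map<Integer, List<String>> route(List<String[]> events, int numPartitions) {
    Map<Integer, List<String>> partitions = new LinkedHashMap<>();
    for (String[] event : events) {
      int partition = partitionFor(event[0], numPartitions);
      partitions.computeIfAbsent(partition, k -> new ArrayList<>())
          .add(event[0] + " " + event[1]);
    }
    return partitions;
  }

  public static void main(String[] args) {
    List<String[]> events = Arrays.asList(
        new String[] {"urn:li:corpuser:jdoe", "v1"},
        new String[] {"urn:li:corpuser:asmith", "v1"},
        new String[] {"urn:li:corpuser:jdoe", "v2"});
    // Each partition log is consumed sequentially, so jdoe's v1 is always seen before v2.
    route(events, 4).forEach((p, log) -> System.out.println("partition " + p + ": " + log));
  }
}
```

Events for different URNs may land on different partitions and be processed in parallel; only the order within a single entity is preserved.
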
## Search and Graph Index Builders
As described in the [Metadata Modelling] section, the [Entity], [Relationship], and [Search Document] models do not directly encode the logic of how each field should be derived from metadata.
Instead, this logic is provided in the form of a graph or search index builder.
The builders register the metadata [aspect]s they are interested in with the [MAE Consumer Job](#mae-consumer-job) and are invoked whenever an MAE involving a corresponding aspect is received.
If the MAE itself doesn't contain all the metadata needed, builders can use [Remote DAO] to fetch it from GMS directly.
```java
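public abstract class BaseIndexBuilder<DOCUMENT extends RecordTemplate> {
  // Snapshot types this builder should be invoked for (field name is illustrative).
  final List<Class<? extends RecordTemplate>> snapshotsInterested;
  BaseIndexBuilder(@Nonnull List<Class<? extends RecordTemplate>> snapshotsInterested) {
    this.snapshotsInterested = snapshotsInterested;
  }
  // Search documents to update for the given snapshot; may be null.
  @Nullable
  public abstract List<DOCUMENT> getDocumentsToUpdate(@Nonnull RecordTemplate snapshot);
  // The search document class this builder produces.
  @Nonnull
  public abstract Class<DOCUMENT> getDocumentType();
}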
```
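
As a rough illustration of how a concrete search index builder and the entity-agnostic job might fit together, the sketch below uses simplified stand-ins rather than the real classes. `Record`, `UserSnapshot`, `DatasetSnapshot`, `UserDocument`, `UserIndexBuilder`, and `dispatch` are all hypothetical names, and `Record` takes the place of `RecordTemplate` so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: every type below is a simplified stand-in, not the real model.
class IndexBuilderSketch {

  static class Record {}

  static class UserSnapshot extends Record {
    final String urn;
    final String fullName;
    UserSnapshot(String urn, String fullName) { this.urn = urn; this.fullName = fullName; }
  }

  static class DatasetSnapshot extends Record {}

  static class UserDocument extends Record {
    final String urn;
    final String fullName;
    UserDocument(String urn, String fullName) { this.urn = urn; this.fullName = fullName; }
  }

  // Simplified counterpart of an index builder base class.
  abstract static class IndexBuilder<D extends Record> {
    final List<Class<? extends Record>> snapshotsInterested;
    IndexBuilder(List<Class<? extends Record>> snapshotsInterested) {
      this.snapshotsInterested = snapshotsInterested;
    }
    abstract List<D> getDocumentsToUpdate(Record snapshot);
  }

  // Derives one search document from a user snapshot.
  static class UserIndexBuilder extends IndexBuilder<UserDocument> {
    UserIndexBuilder() { super(Collections.singletonList(UserSnapshot.class)); }
    @Override
    List<UserDocument> getDocumentsToUpdate(Record snapshot) {
      UserSnapshot user = (UserSnapshot) snapshot;
      return Collections.singletonList(new UserDocument(user.urn, user.fullName));
    }
  }

  // What an entity-agnostic job might do: call only the builders registered for this type.
  static List<Record> dispatch(List<IndexBuilder<? extends Record>> builders, Record snapshot) {
    List<Record> documents = new ArrayList<>();
    for (IndexBuilder<? extends Record> builder : builders) {
      if (builder.snapshotsInterested.contains(snapshot.getClass())) {
        List<? extends Record> out = builder.getDocumentsToUpdate(snapshot);
        if (out != null) {
          documents.addAll(out);
        }
      }
    }
    return documents;
  }
}
```

In this sketch the dispatch loop never inspects the snapshot's contents; all entity-specific logic stays inside the builder, which is one way to read "entity-agnostic" in this document.
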
```java
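public interface GraphBuilder<SNAPSHOT extends RecordTemplate> {

  // Computes the graph changes implied by the given snapshot.
  GraphUpdates build(SNAPSHOT snapshot);

  // Lombok's @Value makes the fields private final and generates getters.
  @Value
  class GraphUpdates {
    // Entities to write to the graph.
    List<? extends RecordTemplate> entities;
    // Relationship changes to apply.
    List<RelationshipUpdates> relationshipUpdates;
  }

  @Value
  class RelationshipUpdates {
    // Relationships to write.
    List<? extends RecordTemplate> relationships;
    // Removal to perform before the relationships are written.
    BaseGraphWriterDAO.RemovalOption preUpdateOperation;
  }
}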
```
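
To make the shape of `GraphUpdates` more concrete, here is a minimal sketch of a graph builder for a user who reports to a manager. It is self-contained and uses stand-in types only: `UserSnapshot`, `UserEntity`, `ReportsTo`, and the local `RemovalOption` enum (including its values) are assumptions for illustration, not the real models or the real `BaseGraphWriterDAO.RemovalOption`.

```java
import java.util.Collections;
import java.util.List;

// Illustrative only: every type below is a simplified stand-in, not the real model.
class GraphBuilderSketch {

  // Stand-in for BaseGraphWriterDAO.RemovalOption; the real enum's values may differ.
  enum RemovalOption { REMOVE_NONE, REMOVE_ALL_EDGES_FROM_SOURCE }

  static class UserSnapshot {
    final String urn;
    final String managerUrn;
    UserSnapshot(String urn, String managerUrn) { this.urn = urn; this.managerUrn = managerUrn; }
  }

  static class UserEntity {
    final String urn;
    UserEntity(String urn) { this.urn = urn; }
  }

  static class ReportsTo {
    final String source;
    final String destination;
    ReportsTo(String source, String destination) {
      this.source = source;
      this.destination = destination;
    }
  }

  static class RelationshipUpdates {
    final List<ReportsTo> relationships;
    final RemovalOption preUpdateOperation;
    RelationshipUpdates(List<ReportsTo> relationships, RemovalOption preUpdateOperation) {
      this.relationships = relationships;
      this.preUpdateOperation = preUpdateOperation;
    }
  }

  static class GraphUpdates {
    final List<UserEntity> entities;
    final List<RelationshipUpdates> relationshipUpdates;
    GraphUpdates(List<UserEntity> entities, List<RelationshipUpdates> relationshipUpdates) {
      this.entities = entities;
      this.relationshipUpdates = relationshipUpdates;
    }
  }

  // Emits one entity for the user and one ReportsTo edge to the manager.
  static GraphUpdates build(UserSnapshot snapshot) {
    List<UserEntity> entities = Collections.singletonList(new UserEntity(snapshot.urn));
    ReportsTo edge = new ReportsTo(snapshot.urn, snapshot.managerUrn);
    // Ask for the user's existing outgoing edges to be removed first, so a
    // changed manager replaces the old edge instead of adding a second one.
    RelationshipUpdates updates = new RelationshipUpdates(
        Collections.singletonList(edge), RemovalOption.REMOVE_ALL_EDGES_FROM_SOURCE);
    return new GraphUpdates(entities, Collections.singletonList(updates));
  }
}
```

Pairing each batch of relationships with a pre-update removal is one plausible reason for the `preUpdateOperation` field: the builder describes both the new edges and the cleanup they require, and the job applies them together.
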
[MAE]: ../what/mxe.md#metadata-audit-event-mae
[graph]: ../what/graph.md
[search index]: ../what/search-index.md
[mae-consumer-job]: ../../metadata-jobs/mae-consumer-job
[Remote DAO]: ../architecture/metadata-serving.md#remote-dao
[URN]: ../what/urn.md
[Metadata Modelling]: ../how/metadata-modelling.md
[Entity]: ../what/entity.md
[Relationship]: ../what/relationship.md
[Search Document]: ../what/search-document.md
[Aspect]: ../what/aspect.md