Update search-onboarding.md

Jyoti Wadhwani 2020-03-11 15:13:57 -07:00 committed by GitHub
parent fba5cd8c5e
commit da93c4855d

@@ -12,15 +12,15 @@ This is because we want to support partial updates to search documents.
## 2. Create the search index, define its mappings and settings
A [mapping] is created using the information of search document model.
-It defines how a document, and the fields it contains, are stored and indexed by various [tokenizer]s, [analyzer]s and data type for the fields.
+It defines how a document, and the fields it contains, are stored and indexed by various [tokenizers], [analyzers] and data type for the fields.
For certain fields, sub-fields are created using different analyzers.
The analyzers are chosen depending on the needs for each field.
This is currently created manually using [curl] commands, and we plan to [automate](../what/search-index.md#search-automation-tbd) the process in the near future.
Check index [mappings & settings](../../docker/elasticsearch/dataset-index-config.json) for `dataset` search index.
## 3. Ingestion into search index
-The actual indexing process for each [entity] is powered by [index builder]s.
-The builders register the metadata [aspect]s of their interest against [MAE Consumer Job] and will be invoked whenever an [MAE] of same interest is received.
+The actual indexing process for each [entity] is powered by [index builders].
+The builders register the metadata [aspects] of their interest against [MAE Consumer Job] and will be invoked whenever an [MAE] of same interest is received.
Index builders should be extended from [BaseIndexBuilder]. Check [DatasetIndexBuilder] as an example.
For the consumer job to consume those MAEs, you should add your index builder to the [index builder registry].
@@ -55,6 +55,7 @@ public abstract class BaseSearchConfig<DOCUMENT extends RecordTemplate> {
```
[DatasetSearchConfig] is the implementation of search config for `dataset` entity.
+Search query templates for datasets and users can be found [here](https://github.com/linkedin/datahub/tree/master/gms/impl/src/main/resources).
## 5. Add search query endpoints to GMS
Finally, you need to create [rest.li](https://rest.li) APIs to serve your search queries.
@@ -82,4 +83,4 @@ Refer to [CorpUsers] rest.li resource implementation as an example.
[basesearchconfig]: ../../metadata-dao-impl/elasticsearch-dao/src/main/java/com/linkedin/metadata/dao/search/BaseSearchConfig.java
[datasetsearchconfig]: ../../gms/impl/src/main/java/com/linkedin/dataset/dao/search/DatasetSearchConfig.java
[basesearchableentityresource]: ../../metadata-restli-resource/src/main/java/com/linkedin/metadata/restli/BaseSearchableEntityResource.java
-[corpusers]: ../../gms/impl/src/main/java/com/linkedin/identity/rest/resources/CorpUsers.java
+[corpusers]: ../../gms/impl/src/main/java/com/linkedin/identity/rest/resources/CorpUsers.java