# Onboarding to GMA Search - searching over a new field

If you need to onboard a new entity to search, refer to [How to onboard to GMA Search](./search-onboarding.md).

For this exercise, we'll add a new field to an existing aspect of corp users and search over this field. Your use case might instead require searching over an existing field of an aspect, or creating a brand-new aspect and searching over its field(s). The same steps apply in those cases; skip any step that is already done.

## 1. Add field to aspect (skip this step if the field already exists in an aspect)

For this example, we will add a new field `courses` to [CorpUserEditableInfo](../../metadata-models/src/main/pegasus/com/linkedin/identity/CorpUserEditableInfo.pdl), which is an aspect of the corp user entity.

```
namespace com.linkedin.identity

/**
 * Linkedin corp user information that can be edited from UI
 */
@Aspect.EntityUrns = [ "com.linkedin.common.CorpuserUrn" ]
record CorpUserEditableInfo {
  ...
  /**
   * Courses that the user has taken e.g. AI200: Introduction to Artificial Intelligence
   */
  courses: array[string] = [ ]
}
```

## 2. Add field to search document model

For this example, we will add the field `courses` to [CorpUserInfoDocument.pdl](../../metadata-models/src/main/pegasus/com/linkedin/metadata/search/CorpUserInfoDocument.pdl), which is the search document model for the corp user entity.

```
namespace com.linkedin.metadata.search

/**
 * Data model for CorpUserInfo entity search
 */
record CorpUserInfoDocument includes BaseDocument {
  ...
  /**
   * Courses that the user has taken e.g. AI200: Introduction to Artificial Intelligence
   */
  courses: optional array[string]
}
```

## 3. Modify the mapping of search index

Next, modify the mapping of the corp user search index. The following Elasticsearch request adds the new field to the existing index (Elasticsearch listens on port 9200 by default):

```
curl -X PUT 'http://localhost:9200/corpuserinfodocument/doc/_mapping' -H 'Content-Type: application/json' --data '
{
  "properties": {
    "courses": {
      "type": "text"
    }
  }
}'
```

Elasticsearch lets you add new fields to an existing mapping, but not change the type of a field that is already mapped. Documents indexed before the change will not contain the new field until they are indexed again (for example, when the corresponding aspect is re-ingested).

## 4. Modify index config so that the new mapping is picked up next time

If you want the corp user search index to contain the new field `courses` the next time the docker containers are brought up, add the field to [corpuser-index-config.json](../../docker/elasticsearch-setup/corpuser-index-config.json).

```
{
  "settings": {
    "index": {
      "analysis": {
        ...
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        ...
        "courses": {
          "type": "text"
        }
      }
    }
  }
}
```

Choose the data type and analyzer based on how the field will be searched. For this example, `courses` is stored as an array of strings, so we use the `text` data type (Elasticsearch has no dedicated array type; any field can hold multiple values). Because no analyzer is specified for the field, Elasticsearch applies the index's default analyzer, which is `standard` unless the index settings override it. The `standard` analyzer provides grammar-based tokenization and lowercases the resulting tokens.

## 5. Update the index builder logic

The index builder defines the logic that transforms an aspect into the search document model. For this example, we will add the logic to [CorpUserInfoIndexBuilder](../../metadata-builders/src/main/java/com/linkedin/metadata/builders/search/CorpUserInfoIndexBuilder.java).

```java
package com.linkedin.metadata.builders.search;

@Slf4j
public class CorpUserInfoIndexBuilder extends BaseIndexBuilder<CorpUserInfoDocument> {

  public CorpUserInfoIndexBuilder() {
    super(Collections.singletonList(CorpUserSnapshot.class), CorpUserInfoDocument.class);
  }

  ...

  @Nonnull
  private CorpUserInfoDocument getDocumentToUpdateFromAspect(@Nonnull CorpuserUrn urn,
      @Nonnull CorpUserEditableInfo corpUserEditableInfo) {
    final String aboutMe = corpUserEditableInfo.getAboutMe() == null ? "" : corpUserEditableInfo.getAboutMe();
    return new CorpUserInfoDocument()
        .setUrn(urn)
        .setAboutMe(aboutMe)
        .setTeams(corpUserEditableInfo.getTeams())
        .setSkills(corpUserEditableInfo.getSkills())
        // Copy the new aspect field into the search document
        .setCourses(corpUserEditableInfo.getCourses());
  }

  ...
}
```

## 6. Update the search query template to start searching over the new field

For this example, we will modify [corpUserESSearchQueryTemplate.json](../../gms/impl/src/main/resources/corpUserESSearchQueryTemplate.json) to start searching over the field `courses`. Here is an example.
```json
{
  "function_score": {
    "query": {
      "query_string": {
        "query": "$INPUT",
        "fields": [
          "fullName^4",
          "ldap^2",
          "managerLdap",
          "skills",
          "courses",
          "teams",
          "title"
        ],
        "default_operator": "and",
        "analyzer": "standard"
      }
    },
    "functions": [
      {
        "filter": {
          "term": {
            "active": true
          }
        },
        "weight": 2
      }
    ],
    "score_mode": "multiply"
  }
}
```

As the template shows, corp user search runs a `query_string` query across multiple fields, and `courses` has been added to that list. Suffixes such as `^4` on `fullName` boost matches in that field relative to unboosted fields like `courses`.

## 7. Test your changes

Make sure the relevant docker containers are rebuilt before testing the changes.

If the new field was added to an aspect of an existing snapshot, you can test by ingesting data that contains the field. Here is an example of ingesting into the `/corpUsers` endpoint with the new field `courses`:

```
curl 'http://localhost:8080/corpUsers?action=ingest' -X POST -H 'X-RestLi-Protocol-Version:2.0.0' --data '
{
  "snapshot": {
    "aspects": [
      {
        "com.linkedin.identity.CorpUserEditableInfo": {
          "courses": [
            "Docker for Data Scientists",
            "AI100: Introduction to Artificial Intelligence"
          ],
          "skills": [ ],
          "pictureLink": "https://raw.githubusercontent.com/linkedin/datahub/master/datahub-web/packages/data-portal/public/assets/images/default_avatar.png",
          "teams": [ ]
        }
      }
    ],
    "urn": "urn:li:corpuser:datahub"
  }
}'
```

Once ingestion is done, you can test your changes by issuing search queries. Here is an example query and its response.
```
curl "http://localhost:8080/corpUsers?q=search&input=ai100" -H 'X-RestLi-Protocol-Version: 2.0.0' -s | jq

Response:

{
  "metadata": {
    "urns": [
      "urn:li:corpuser:datahub"
    ],
    "searchResultMetadatas": [ ]
  },
  "elements": [
    {
      "editableInfo": {
        "skills": [ ],
        "courses": [
          "Docker for Data Scientists",
          "AI100: Introduction to Artificial Intelligence"
        ],
        "pictureLink": "https://raw.githubusercontent.com/linkedin/datahub/master/datahub-web/packages/data-portal/public/assets/images/default_avatar.png",
        "teams": [ ]
      },
      "username": "datahub",
      "info": {
        "active": true,
        "fullName": "Data Hub",
        "title": "CEO",
        "displayName": "Data Hub",
        "email": "datahub@linkedin.com"
      }
    }
  ],
  "paging": {
    "count": 10,
    "start": 0,
    "total": 1,
    "links": [ ]
  }
}
```
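To see why a query such as `ai100` matches the ingested course `AI100: Introduction to Artificial Intelligence`, here is a minimal Java sketch that approximates the `standard` analyzer for plain ASCII text: split on anything that is not a letter or digit, then lowercase each token. This is an illustration under that assumption, not Elasticsearch's implementation; the real analyzer follows Unicode Text Segmentation (UAX #29) and treats some punctuation, such as apostrophes and decimal points, differently. The class `StandardAnalyzerSketch` and its methods are hypothetical, and `matchesAll` only roughly mirrors `"default_operator": "and"` for a single field.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration: a rough approximation of the Elasticsearch
// `standard` analyzer for ASCII text, NOT the real Lucene implementation.
public class StandardAnalyzerSketch {

  // Split on runs of non-letter/non-digit characters and lowercase each token.
  static List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    for (String part : text.split("[^\\p{L}\\p{N}]+")) {
      if (!part.isEmpty()) { // a leading delimiter yields an empty first part
        tokens.add(part.toLowerCase(Locale.ROOT));
      }
    }
    return tokens;
  }

  // Approximates default_operator "and" on one multi-valued field:
  // every query token must appear among the tokens of the field's values.
  static boolean matchesAll(List<String> fieldValues, String query) {
    Set<String> indexed = new HashSet<>();
    for (String value : fieldValues) {
      indexed.addAll(tokenize(value));
    }
    List<String> queryTokens = tokenize(query);
    return !queryTokens.isEmpty() && indexed.containsAll(queryTokens);
  }

  public static void main(String[] args) {
    List<String> courses = List.of(
        "Docker for Data Scientists",
        "AI100: Introduction to Artificial Intelligence");

    System.out.println(tokenize(courses.get(1))); // [ai100, introduction, to, artificial, intelligence]
    System.out.println(matchesAll(courses, "ai100")); // true
    System.out.println(matchesAll(courses, "ai200")); // false
  }
}
```

Because `AI100:` is reduced to the lowercase token `ai100`, the search input must contain that exact token to match; a different course code such as `ai200` produces a different token and finds nothing.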