# Onboarding to GMA Search - searching over a new field

If you need to onboard a new entity to search, refer to [How to onboard to GMA Search](./search-onboarding.md).

For this exercise, we'll add a new field to an existing aspect of corp users and search over this field. Your use case might instead require searching over an existing field of an aspect, or creating a brand new aspect and searching over its field(s). Similar steps apply in those cases.

## 1. Add field to aspect (skip this step if the field already exists in an aspect)
For this example, we will add a new field `courses` to [CorpUserEditableInfo](../../metadata-models/src/main/pegasus/com/linkedin/identity/CorpUserEditableInfo.pdl), which is an aspect of the corp user entity.
```
namespace com.linkedin.identity

/**
 * Linkedin corp user information that can be edited from UI
 */
@Aspect.EntityUrns = [ "com.linkedin.common.CorpuserUrn" ]
record CorpUserEditableInfo {

  ...
  
  /**
   * Courses that the user has taken e.g. AI200: Introduction to Artificial Intelligence
   */
  courses: array[string] = [ ]
  
}
```

## 2. Add field to search document model
For this example, we will add the field `courses` to [CorpUserInfoDocument.pdl](../../metadata-models/src/main/pegasus/com/linkedin/metadata/search/CorpUserInfoDocument.pdl), which is the search document model for the corp user entity.

```
namespace com.linkedin.metadata.search

/**
 * Data model for CorpUserInfo entity search
 */
record CorpUserInfoDocument includes BaseDocument {

  ...

  /**
   * Courses that the user has taken e.g. AI200: Introduction to Artificial Intelligence
   */
  courses: optional array[string]
  
}
```

## 3. Modify the mapping of search index
Now, we will modify the mapping of the corp user search index. Use the following Elasticsearch command to add the new field to an existing index.

```
curl http://localhost:9200/corpuserinfodocument/doc/_mapping? --data '
{
  "properties": {
    "courses": {
      "type": "text"
    }
  }
}'
```

## 4. Modify index config, so that the new mapping is picked up next time
If you want the corp user search index to contain the new field `courses` the next time the docker containers are brought up, add the field to [corpuser-index-config.json](../../docker/elasticsearch-setup/corpuser-index-config.json).

```
{
  "settings": {
    "index": {
      "analysis": {
       ...
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {

        ...

        "courses": {
          "type": "text"
        }
      }
    }
  }
}
```
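Choose your analyzer carefully, since it determines how the field is tokenized at index and query time. For this example, `courses` is stored as an array of strings, so we use the `text` data type. When no analyzer is specified, Elasticsearch uses the `standard` analyzer, which provides grammar-based tokenization.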
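
To get a feel for what grammar-based tokenization does to a `courses` value, here is a rough approximation in plain Java. It is not Elasticsearch's implementation (the real tokenizer follows Unicode text segmentation rules and handles many more cases), and the class and method names are hypothetical. The sketch only lowercases the text and splits it on runs of characters that are neither letters nor digits, which is enough to show why a query for `ai200` can match `AI200: Introduction to Artificial Intelligence`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: a simplified stand-in for the `standard` analyzer,
// not the actual Elasticsearch/Lucene implementation.
class StandardAnalyzerSketch {

  // Lowercase the text and split it on runs of non-letter, non-digit characters.
  static List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
      if (!part.isEmpty()) { // split() can yield a leading empty string
        tokens.add(part);
      }
    }
    return tokens;
  }

  public static void main(String[] args) {
    // prints [ai200, introduction, to, artificial, intelligence]
    System.out.println(tokenize("AI200: Introduction to Artificial Intelligence"));
  }
}
```

Because each course title is broken into lowercase tokens like these, a search can match on a single word of a title rather than requiring the full string.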

## 5. Update the index builder logic
The index builder is where the logic to transform an aspect into the search document model is defined. For this example, we will add the logic in [CorpUserInfoIndexBuilder](../../metadata-builders/src/main/java/com/linkedin/metadata/builders/search/CorpUserInfoIndexBuilder.java).

```java
package com.linkedin.metadata.builders.search;

@Slf4j
public class CorpUserInfoIndexBuilder extends BaseIndexBuilder<CorpUserInfoDocument> {

  public CorpUserInfoIndexBuilder() {
    super(Collections.singletonList(CorpUserSnapshot.class), CorpUserInfoDocument.class);
  }
  
  ...
  
  @Nonnull
  private CorpUserInfoDocument getDocumentToUpdateFromAspect(@Nonnull CorpuserUrn urn,
      @Nonnull CorpUserEditableInfo corpUserEditableInfo) {
    final String aboutMe = corpUserEditableInfo.getAboutMe() == null ? "" : corpUserEditableInfo.getAboutMe();
    return new CorpUserInfoDocument()
        .setUrn(urn)
        .setAboutMe(aboutMe)
        .setTeams(corpUserEditableInfo.getTeams())
        .setSkills(corpUserEditableInfo.getSkills())
        .setCourses(corpUserEditableInfo.getCourses());
  }
  
  ...
  
}

```

## 6. Update search query template, to start searching over the new field
For this example, we will modify [corpUserESSearchQueryTemplate.json](../../gms/impl/src/main/resources/corpUserESSearchQueryTemplate.json) to start searching over the field `courses`. Here is an example.

```json
{
  "function_score": {
    "query": {
      "query_string": {
        "query": "$INPUT",
        "fields": [
          "fullName^4",
          "ldap^2",
          "managerLdap",
          "skills",
          "courses",
          "teams",
          "title"
        ],
        "default_operator": "and",
        "analyzer": "standard"
      }
    },
    "functions": [
      {
        "filter": {
          "term": {
            "active": true
          }
        },
        "weight": 2
      }
    ],
    "score_mode": "multiply"
  }
}
```
As you can see in the above query template, corp user search is performed across multiple fields, to which the field `courses` has been added.
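
How the `$INPUT` placeholder gets filled in can be pictured with a small sketch. It assumes the template is loaded as a string and the user's input is substituted with a plain string replacement after minimal JSON escaping. The class and method names are hypothetical, the template is trimmed down for brevity, and the actual GMS code may handle this differently.

```java
// Hypothetical helper; an illustration only, not the actual GMS implementation.
class QueryTemplateSketch {

  // Trimmed-down version of the query template, kept inline for the example.
  static final String TEMPLATE =
      "{\"query_string\": {\"query\": \"$INPUT\", "
          + "\"fields\": [\"fullName^4\", \"ldap^2\", \"courses\"], "
          + "\"default_operator\": \"and\"}}";

  // Minimal escaping: backslashes and double quotes only, so that the input
  // cannot terminate the surrounding JSON string early.
  static String escapeJson(String input) {
    return input.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  // String.replace treats "$INPUT" literally (no regex), so the '$' is safe.
  static String buildQuery(String input) {
    return TEMPLATE.replace("$INPUT", escapeJson(input));
  }

  public static void main(String[] args) {
    System.out.println(buildQuery("ai200"));
  }
}
```

In the `fields` list, a suffix such as `^4` boosts matches on that field; `courses` carries no boost, so matches on it are scored with the default weight.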

## 7. Test your changes
Make sure the relevant docker containers are rebuilt before testing the changes.
If this is a new field that has been added to an existing snapshot, you can test by ingesting data that contains the new field. Here is an example of ingesting to the `/corpUsers` endpoint with the new field `courses`.

```
curl 'http://localhost:8080/corpUsers?action=ingest' -X POST -H 'X-RestLi-Protocol-Version:2.0.0' --data '
{
  "snapshot": {
    "aspects": [
      {
        "com.linkedin.identity.CorpUserEditableInfo": {
          "courses": [
            "Docker for Data Scientists",
            "AI100: Introduction to Artificial Intelligence"
          ],
          "skills": [
            
          ],
          "pictureLink": "https://raw.githubusercontent.com/linkedin/datahub/master/datahub-web/packages/data-portal/public/assets/images/default_avatar.png",
          "teams": [
            
          ]
        }
      }
    ],
    "urn": "urn:li:corpuser:datahub"
  }
}'
```

Once the ingestion is done, you can test your changes by issuing search queries. Here is an example query with response.

```
curl "http://localhost:8080/corpUsers?q=search&input=ai100" -H 'X-RestLi-Protocol-Version: 2.0.0' -s | jq

Response:
{
  "metadata": {
    "urns": [
      "urn:li:corpuser:datahub"
    ],
    "searchResultMetadatas": [
      
    ]
  },
  "elements": [
    {
      "editableInfo": {
        "skills": [
          
        ],
        "courses": [
          "Docker for Data Scientists",
          "AI100: Introduction to Artificial Intelligence"
        ],
        "pictureLink": "https://raw.githubusercontent.com/linkedin/datahub/master/datahub-web/packages/data-portal/public/assets/images/default_avatar.png",
        "teams": [
          
        ]
      },
      "username": "datahub",
      "info": {
        "active": true,
        "fullName": "Data Hub",
        "title": "CEO",
        "displayName": "Data Hub",
        "email": "datahub@linkedin.com"
      }
    }
  ],
  "paging": {
    "count": 10,
    "start": 0,
    "total": 1,
    "links": [
      
    ]
  }
}
```