The **search bar** is an important mechanism for discovering data assets in DataHub. From the search bar, you can find Datasets, Columns, Dashboards, Charts, Data Pipelines, and more. Simply type in a term and press 'enter'.
By default, search terms will match against different aspects of a data asset. This includes asset names, descriptions, tags, terms, owners, and even specific attributes like the names of columns in a table.
### Search Operators
The default boolean logic used to interpret text in a query string is `AND`. For example, a query of `information about orders` is interpreted as `information AND about AND orders`.
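As a rough illustration of that default, the sketch below spells out the implicit `AND` between whitespace-separated terms. The helper name `to_explicit_and` is hypothetical and only mirrors the behavior described above; it is not how DataHub parses queries internally.

```python
def to_explicit_and(query: str) -> str:
    """Join whitespace-separated terms with an explicit AND."""
    terms = query.split()
    return " AND ".join(terms)


print(to_explicit_and("information about orders"))
```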
The filters sidebar sits on the left-hand side of search results, and lets users find assets by drilling down. You can quickly filter by Data Platform (e.g. Snowflake), Tags, Glossary Terms, Domain, Owners, and more with a single click.
Currently, Advanced Filters support filtering by Column Name, Container, Domain, Description (entity or column level), Tag (entity or column level), Glossary Term (entity or column level), Owner, Entity Type, Subtype, Environment and soft-deleted status.
To add a new filter, click the add filter menu, choose a filter type, and then fill in the values you want to filter by.
By default, all filters must be matched in order for a result to appear. For example, if you add a tag filter and a platform filter, all results will have the tag and the platform. You can set the results to match any filter instead. Click on `all filters` and select `any filter` from the drop-down menu.
After creating a filter, you can choose whether results should or should not match it. Change this by clicking the operation in the top right of the filter and selecting the negated operation.
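The `all filters` / `any filter` choice and the negated operation can be pictured as ordinary boolean logic. The following is a minimal sketch under stated assumptions: the `Filter` class, the `apply_filters` helper, and the sample assets are all hypothetical and exist only to illustrate the semantics, not DataHub's implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Asset = Dict[str, Any]


@dataclass
class Filter:
    """One filter; `negated` models the 'should not match' operation."""
    predicate: Callable[[Asset], bool]
    negated: bool = False

    def matches(self, asset: Asset) -> bool:
        result = self.predicate(asset)
        return not result if self.negated else result


def apply_filters(assets: List[Asset], filters: List[Filter], match_all: bool = True) -> List[Asset]:
    """Keep assets matching all filters (default) or any filter."""
    combine = all if match_all else any
    return [a for a in assets if combine(f.matches(a) for f in filters)]


# Hypothetical sample data.
assets = [
    {"name": "orders", "platform": "snowflake", "tags": {"pii"}},
    {"name": "clicks", "platform": "kafka", "tags": {"pii"}},
    {"name": "users", "platform": "snowflake", "tags": set()},
]
has_pii = Filter(lambda a: "pii" in a["tags"])
on_snowflake = Filter(lambda a: a["platform"] == "snowflake")

print([a["name"] for a in apply_filters(assets, [has_pii, on_snowflake])])
print([a["name"] for a in apply_filters(assets, [has_pii, on_snowflake], match_all=False)])
```

With `all filters`, only `orders` carries both the tag and the platform; with `any filter`, every sample asset matches at least one condition.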
Search results appear ranked by their relevance. In self-hosted DataHub, ranking is based on how closely the query matches the textual fields of an asset and its metadata. In DataHub Cloud, ranking is based on a combination of textual relevance, usage (queries / views), and change frequency.
With better metadata comes better results. Learn more about ingesting technical metadata in the [metadata ingestion](../../metadata-ingestion/README.md) guide.
### Advanced queries
The search bar supports advanced queries with pattern matching, logical expressions and filtering by specific field matches.
- This will return entities with **mask** in the name. Names tend to be joined to other words by symbols, hence the wildcard symbols before and after the word.
- Dataset Properties are indexed in Elasticsearch in the form `key=value`. Hence, if you know the precise key-value pair, you can search using `"key=value"`. However, if you only know the key, you can use wildcards to replace the value, and that is what is being done here.
- In this example, the query will return any entity which has any value for the **unversioned** structured property with qualified name `io.acryl.private.retentionTime01`.
- Find an entity with a **versioned** structured property
- This query will return results for a **versioned** structured property with qualified name `io.acryl.privacy.retentionTime`, version `20240614080000`, type `number` and value `365`.
- Returns results for a **versioned** structured property with qualified name `io.acryl.privacy.retentionTime`, version `20240614080000` and type `number`.
- `/q editedFieldDescriptions: latitude OR fieldDescriptions: latitude` [Sample results](https://demo.datahub.com/search?page=1&query=%2Fq%20editedFieldDescriptions%3A%20latitude%20OR%20fieldDescriptions%3A%20latitude)
- Datasets have two attributes that contain field descriptions. `fieldDescriptions` comes from the SchemaMetadata aspect, while `editedFieldDescriptions` comes from the EditableSchemaMetadata aspect. EditableSchemaMetadata holds information that comes from UI edits, while SchemaMetadata holds data from ingestion of the dataset.
- BrowsePath is stored as a complete string, for instance `/datasets/prod/hive/SampleKafkaDataset`, hence the need for wildcards on both ends of the term to return a result.
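To see why the wildcards matter for values stored as a single string, here is a small sketch that uses Python's shell-style `fnmatch` as a stand-in for wildcard matching. This is an assumption for illustration only: Elasticsearch uses its own matching engine, and the sample name and the `encoding=utf-8` property are hypothetical.

```python
from fnmatch import fnmatchcase


def wildcard_match(value: str, pattern: str) -> bool:
    """Shell-style wildcard match against the whole stored string."""
    return fnmatchcase(value, pattern)


name = "customer_masked_email"                          # hypothetical asset name
custom_property = "encoding=utf-8"                      # hypothetical key=value property
browse_path = "/datasets/prod/hive/SampleKafkaDataset"

print(wildcard_match(name, "*mask*"))                   # term joined to others by symbols
print(wildcard_match(custom_property, "encoding=*"))    # key known, value unknown
print(wildcard_match(browse_path, "hive"))              # bare term vs. complete string
print(wildcard_match(browse_path, "*hive*"))            # wildcards on both ends
```

The third call is the only one that fails to match, because the bare term is compared against the complete stored path.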
# Example query - search for datasets matching the example_query_text that have the Dimension tag applied to a schema field and are from the data platform looker
For queries that return more than 10k entities we recommend using the [scrollAcrossEntities](https://docs.datahub.com/docs/graphql/queries/#scrollacrossentities) GraphQL API:
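A paging loop over `scrollAcrossEntities` might look like the sketch below. It is written under assumptions: the GraphQL document uses field names (`nextScrollId`, `searchResults`, `scrollId`, `count`) and the input type name `ScrollAcrossEntitiesInput` as recalled from the linked API reference, so verify them against your DataHub version. The `execute` parameter is a hypothetical caller-supplied function (for example, a thin wrapper around an HTTP POST to the GraphQL endpoint) that returns the `data` part of the response.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Assumed query shape; check the GraphQL reference for your version.
SCROLL_QUERY = """
query scroll($input: ScrollAcrossEntitiesInput!) {
  scrollAcrossEntities(input: $input) {
    nextScrollId
    count
    searchResults {
      entity {
        urn
        type
      }
    }
  }
}
"""


def scroll_all(
    execute: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    query: str,
    page_size: int = 100,
) -> Iterator[str]:
    """Yield entity URNs page by page until no scroll id is returned."""
    scroll_id: Optional[str] = None
    while True:
        variables = {"input": {"query": query, "count": page_size, "scrollId": scroll_id}}
        data = execute(SCROLL_QUERY, variables)["scrollAcrossEntities"]
        for result in data["searchResults"]:
            yield result["entity"]["urn"]
        scroll_id = data.get("nextScrollId")
        if not scroll_id:
            break
```

Passing the transport in as a function keeps the loop independent of any particular HTTP client or authentication scheme.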
The location of the configuration file can be on the Java classpath or the local filesystem. A default configuration
file is included with the GMS jar with the name `search_config.yml`.
### Search Configuration
The search configuration yaml contains a simple list of configuration profiles selected using the `queryRegex`. If a
single profile is desired, a catch-all regex of `.*` can be used.
The list of search configurations can be grouped into 4 general sections.
1. `queryRegex` - Responsible for selecting the search customization based on the [regex matching](https://www.w3schools.com/java/java_regex.asp) the search query string.
4. `functionScore` - The Elasticsearch `function score`[[5](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-function-score-query.html#score-functions)] section of the overall query.
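Profile selection by `queryRegex` can be sketched as follows. This is an illustration under assumptions, not the GMS implementation: it assumes the first profile whose regex matches the entire query string wins, it uses Python's `re` module in place of Java regex, and the `name` key on each profile is hypothetical.

```python
import re
from typing import Any, Dict, List, Optional


def select_profile(profiles: List[Dict[str, Any]], query: str) -> Optional[Dict[str, Any]]:
    """Return the first profile whose queryRegex matches the whole query."""
    for profile in profiles:
        if re.fullmatch(profile["queryRegex"], query):
            return profile
    return None


profiles = [
    {"queryRegex": r"urn:li:.*", "name": "urn-lookup"},  # hypothetical specialised profile
    {"queryRegex": r".*", "name": "catch-all"},          # single catch-all profile
]

print(select_profile(profiles, "urn:li:dataset:example")["name"])
print(select_profile(profiles, "orders")["name"])
```

Placing the catch-all `.*` profile last lets more specific patterns take precedence under the first-match assumption.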
Similar to the options provided in the previous section for search configuration, there are autocomplete-specific options
which can be configured.
Note: The scoring functions defined in the previous section are inherited for autocomplete by default, unless
overrides are provided in the autocomplete section.
For the most part, the configuration options are identical to the search customization options in the previous
section; however, they are located under `autocompleteConfigurations` in the yaml configuration file.
1. `queryRegex` - Responsible for selecting the search customization based on the [regex matching](https://www.w3schools.com/java/java_regex.asp) the search query string.
2. The following boolean enables/disables the function score inheritance from the normal search configuration: [`inheritFunctionScore`]
This flag will automatically be set to `false` when the `functionScore` section is provided. If set to `false` with no
`functionScore` provided, the default Elasticsearch `_score` is used.
3. Built-in query booleans - There is one built-in query, the `default autocomplete query`, which can be
   enabled or disabled with the following boolean: [`defaultQuery`]
4. `boolQuery` - The base Elasticsearch `boolean query`[[4](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-bool-query.html)].
   If enabled in #3 above, the built-in query will
   appear in the `should` section of the `boolean query`[[4](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-bool-query.html)].
5. `functionScore` - The Elasticsearch `function score`[[5](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-function-score-query.html#score-functions)] section of the overall query.
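The inheritance rules for `functionScore` described in item 2 can be summarized in a short sketch. The function name `resolve_autocomplete_scoring` and the dictionary representation of the profiles are hypothetical; only the three outcomes (own scoring, inherited scoring, plain `_score`) come from the description above.

```python
from typing import Any, Dict, Optional


def resolve_autocomplete_scoring(
    autocomplete: Dict[str, Any], search: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """Return the function score used for autocomplete, or None for plain `_score`."""
    own = autocomplete.get("functionScore")
    if own is not None:
        # Providing functionScore implicitly sets inheritFunctionScore to false.
        return own
    if autocomplete.get("inheritFunctionScore", True):
        # Default: inherit the scoring functions of the normal search configuration.
        return search.get("functionScore")
    # inheritFunctionScore is false and no override was given.
    return None


search_profile = {"functionScore": {"score_mode": "avg"}}

print(resolve_autocomplete_scoring({}, search_profile))
print(resolve_autocomplete_scoring({"inheritFunctionScore": False}, search_profile))
```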
#### Examples
These examples assume a match-all `queryRegex` of `.*` so that it would impact any search query for simplicity. Also
note that the `queryRegex` is applied individually for `searchConfigurations` and `autocompleteConfigurations` and they
do not have to be identical.
##### Example 1: Exclude `deprecated` entities from autocomplete
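One possible shape for such a profile is sketched below as a Python dictionary mirroring the yaml structure. This is an assumption-laden illustration: the keys `autocompleteConfigurations`, `queryRegex`, `defaultQuery` and `boolQuery` come from the sections above, while the `must_not` / `term` clause and the indexed field name `deprecated` with value `"true"` are assumed and should be checked against your index mapping.

```python
import json

# Assumed structure; in practice this would live in the yaml configuration file.
autocomplete_config = {
    "autocompleteConfigurations": [
        {
            "queryRegex": ".*",      # match-all profile
            "defaultQuery": True,    # keep the built-in autocomplete query
            "boolQuery": {
                # Assumed field name and value for deprecated entities.
                "must_not": {"term": {"deprecated": "true"}},
            },
        }
    ]
}

print(json.dumps(autocomplete_config, indent=2))
```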
The order of the search results is based on the weight that DataHub assigns them using its search algorithm. The current algorithm in OSS DataHub is based on a text-match score from Elasticsearch.
The sample queries here are non-exhaustive. [The link here](https://demo.datahub.com/tag/urn:li:tag:Searchable) shows the current list of indexed fields for each entity inside DataHub. Click on the fields inside each entity and see which field has the tag `Searchable`.
However, it does not tell you the specific attribute name to use for specialized searches. One way to find it is to inspect the Elasticsearch indices, for example:
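Once an index mapping has been retrieved from Elasticsearch, the attribute names can be listed by walking its `properties`. The sketch below assumes the standard mapping layout (`mappings` → `properties`, with multi-fields under `fields`); the index name `datasetindex_v2` and the sample fields are assumptions used only to make the example self-contained.

```python
from typing import Any, Dict, List


def list_mapped_fields(mapping: Dict[str, Any], prefix: str = "") -> List[str]:
    """Flatten an Elasticsearch mapping into sorted, dotted field names."""
    fields: List[str] = []
    for name, spec in mapping.get("properties", {}).items():
        path = f"{prefix}{name}"
        fields.append(path)
        # Object fields nest further definitions under "properties".
        fields.extend(list_mapped_fields(spec, prefix=f"{path}."))
        # Multi-fields (e.g. analysed variants) live under "fields".
        for sub_field in spec.get("fields", {}):
            fields.append(f"{path}.{sub_field}")
    return sorted(fields)


# Hypothetical, heavily trimmed mapping response for one index.
response = {
    "datasetindex_v2": {
        "mappings": {
            "properties": {
                "name": {"type": "keyword", "fields": {"delimited": {"type": "text"}}},
                "fieldDescriptions": {"type": "text"},
            }
        }
    }
}

print(list_mapped_fields(response["datasetindex_v2"]["mappings"]))
```

The resulting names are the candidates to try after `/q` in a field-specific query.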