< a name = "base" > < / a >
# Module base
< a name = "base.BaseQueryClassifier" > < / a >
## BaseQueryClassifier
```python
class BaseQueryClassifier(BaseComponent)
```
Abstract class for Query Classifiers
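A minimal sketch of a custom query classifier built on the same base, assuming the usual `BaseComponent` contract in which `run()` returns a results dict together with the name of the outgoing edge; the import path and the exact contract may differ between Haystack versions:

```python
from haystack.nodes.base import BaseComponent  # assumed import path; may differ per version

class KeywordHeuristicClassifier(BaseComponent):
    """Hypothetical classifier that routes short keyword queries to output_2."""
    outgoing_edges = 2

    def run(self, query: str):
        # Sentence-like queries go to output_1, short keyword queries to output_2.
        edge = "output_1" if len(query.split()) > 3 else "output_2"
        return {}, edge
```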
< a name = "sklearn" > < / a >
# Module sklearn
< a name = "sklearn.SklearnQueryClassifier" > < / a >
## SklearnQueryClassifier
```python
class SklearnQueryClassifier(BaseQueryClassifier)
```
A node to classify an incoming query into one of two categories using a lightweight sklearn model. Depending on the result, the query is routed to a different branch of your pipeline, where further processing can be customized. You define the routing by connecting the downstream nodes to either `output_1` or `output_2` of this node.
**Example**:
```python
pipe = Pipeline()
pipe.add_node(component=SklearnQueryClassifier(), name="QueryClassifier", inputs=["Query"])
pipe.add_node(component=elastic_retriever, name="ElasticRetriever", inputs=["QueryClassifier.output_2"])
pipe.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["QueryClassifier.output_1"])

# Keyword queries will use the ElasticRetriever
pipe.run(query="kubernetes aws")

# Semantic queries (questions, statements, sentences, ...) will leverage the DPR retriever
pipe.run(query="How to manage kubernetes on aws")
```
Models:
Pass your own `Sklearn` binary classification model or use one of the following pretrained ones:
1) Keywords vs. Questions/Statements (Default)
query_classifier can be found [here](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/model.pickle)
query_vectorizer can be found [here](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/vectorizer.pickle)
output_1 => question/statement
output_2 => keyword query
[Readme](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/readme.txt)
2) Questions vs. Statements
query_classifier can be found [here](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier_statements/model.pickle)
query_vectorizer can be found [here](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier_statements/vectorizer.pickle)
output_1 => question
output_2 => statement
[Readme](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier_statements/readme.txt)
See also the [tutorial](https://haystack.deepset.ai/tutorials/pipelines) on pipelines.
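For instance, to use the Questions vs. Statements classifier listed above instead of the default, you would pass its model and vectorizer URLs; a sketch using the keyword arguments documented in `__init__` below:

```python
question_vs_statement_classifier = SklearnQueryClassifier(
    model_name_or_path="https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier_statements/model.pickle",
    vectorizer_name_or_path="https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier_statements/vectorizer.pickle",
)
# output_1 => question, output_2 => statement
```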
< a name = "sklearn.SklearnQueryClassifier.__init__" > < / a >
#### \_\_init\_\_
```python
 | __init__(model_name_or_path: Union[str, Any] = "https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/model.pickle",
 |          vectorizer_name_or_path: Union[str, Any] = "https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/vectorizer.pickle")
```
**Arguments**:
- `model_name_or_path`: Gradient-boosting-based binary classifier that distinguishes keyword queries from statement/question queries, or statements from questions.
- `vectorizer_name_or_path`: An n-gram-based TF-IDF vectorizer for extracting features from the query.
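A sketch of passing your own trained model instead of the default URLs. Since the type hint is `Union[str, Any]`, a local file path should work in place of a URL; the paths below are hypothetical placeholders:

```python
query_classifier = SklearnQueryClassifier(
    model_name_or_path="my_query_classifier/model.pickle",            # hypothetical local path
    vectorizer_name_or_path="my_query_classifier/vectorizer.pickle",  # hypothetical local path
)
```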
< a name = "transformers" > < / a >
# Module transformers
< a name = "transformers.TransformersQueryClassifier" > < / a >
## TransformersQueryClassifier
```python
class TransformersQueryClassifier(BaseQueryClassifier)
```
A node to classify an incoming query into one of two categories using a (small) BERT transformer model. Depending on the result, the query is routed to a different branch of your pipeline, where further processing can be customized. You define the routing by connecting the downstream nodes to either `output_1` or `output_2` of this node.
**Example**:
```python
pipe = Pipeline()
pipe.add_node(component=TransformersQueryClassifier(), name="QueryClassifier", inputs=["Query"])
pipe.add_node(component=elastic_retriever, name="ElasticRetriever", inputs=["QueryClassifier.output_2"])
pipe.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["QueryClassifier.output_1"])

# Keyword queries will use the ElasticRetriever
pipe.run(query="kubernetes aws")

# Semantic queries (questions, statements, sentences, ...) will leverage the DPR retriever
pipe.run(query="How to manage kubernetes on aws")
```
Models:
Pass your own `Transformer` binary classification model from a file or from Hugging Face, or use one of the following
pretrained models hosted on Hugging Face:
1) Keywords vs. Questions/Statements (Default)
model_name_or_path="shahrukhx01/bert-mini-finetune-question-detection"
output_1 => question/statement
output_2 => keyword query
[Readme](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/readme.txt)
2) Questions vs. Statements
model_name_or_path="shahrukhx01/question-vs-statement-classifier"
output_1 => question
output_2 => statement
[Readme](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier_statements/readme.txt)
See also the [tutorial](https://haystack.deepset.ai/tutorials/pipelines) on pipelines.
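For example, to distinguish questions from statements rather than keywords from questions/statements, pass the second pretrained model listed above; a sketch using the documented constructor argument:

```python
question_vs_statement_classifier = TransformersQueryClassifier(
    model_name_or_path="shahrukhx01/question-vs-statement-classifier"
)
# output_1 => question, output_2 => statement
```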
< a name = "transformers.TransformersQueryClassifier.__init__" > < / a >
#### \_\_init\_\_
```python
 | __init__(model_name_or_path: Union[Path, str] = "shahrukhx01/bert-mini-finetune-question-detection", use_gpu: bool = True)
```
**Arguments**:
- `model_name_or_path`: Transformer-based, fine-tuned mini BERT model for query classification.
- `use_gpu`: Whether to use GPU (if available).
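A minimal sketch for a CPU-only environment, using the two documented arguments:

```python
query_classifier = TransformersQueryClassifier(
    model_name_or_path="shahrukhx01/bert-mini-finetune-question-detection",
    use_gpu=False,  # stay on CPU even if a GPU is available
)
```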