[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack/blob/main/tutorials/Tutorial6_Better_Retrieval_via_DPR.ipynb)
The Retriever has a huge impact on the performance of our overall search pipeline.
### Different types of Retrievers
#### Sparse
A family of algorithms based on counting the occurrences of words (bag-of-words), resulting in very sparse vectors whose length equals the vocabulary size.
**Examples**: BM25, TF-IDF
**Pros**: Simple, fast, easy to explain
**Cons**: Relies on exact keyword matches between query and text
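To make the "sparse vectors with length = vocab size" point concrete, here is a small bag-of-words sketch outside of Haystack (using scikit-learn's `TfidfVectorizer`; the example passages are made up):
```python
from sklearn.feature_extraction.text import TfidfVectorizer

passages = [
    "Daenerys Targaryen is the Mother of Dragons.",
    "The Dothraki language was created for the TV series.",
    "Winterfell is the seat of House Stark.",
]

# Bag-of-words: every passage becomes a vector with one dimension per vocabulary term
vectorizer = TfidfVectorizer()
passage_vectors = vectorizer.fit_transform(passages)
print(passage_vectors.shape)   # (3, vocab_size) -> most entries are zero, hence "sparse"

# A query only scores well against passages that share (weighted) keywords with it
query_vector = vectorizer.transform(["Who created the Dothraki language?"])
scores = (passage_vectors @ query_vector.T).toarray().ravel()
print(scores)                  # highest score for the passage with overlapping words
```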
#### Dense
These retrievers use neural network models to create "dense" embedding vectors. Within this family there are two different approaches:
a) Single-encoder: Use a **single model** to embed both query and passage.
b) Dual-encoder: Use **two models**, one to embed the query and one to embed the passage.
Recent work suggests that dual encoders work better, likely because they can deal better with the different nature of query and passage (length, style, syntax ...).
**Examples**: REALM, DPR, Sentence-Transformers
**Pros**: Captures semantic similarity instead of "word matches" (e.g. synonyms, related topics ...)
**Cons**: Computationally heavier; requires an initial training of the model
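To illustrate the dual-encoder idea outside of Haystack, here is a hedged sketch using the pretrained DPR encoders from Hugging Face `transformers`: one model embeds the query, a second model embeds the passages, and similarity is a simple dot product (the example passages are made up):
```python
import torch
from transformers import (
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
    DPRContextEncoder, DPRContextEncoderTokenizer,
)

# Two separate models: one for queries, one for passages
q_tokenizer = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_encoder = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
ctx_tokenizer = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
ctx_encoder = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

query = "Who created the Dothraki vocabulary?"
passages = [
    "David J. Peterson developed the Dothraki language for the TV series.",
    "Winterfell is the seat of House Stark in the North.",
]

with torch.no_grad():
    q_emb = q_encoder(**q_tokenizer(query, return_tensors="pt")).pooler_output                        # shape (1, 768)
    p_emb = ctx_encoder(**ctx_tokenizer(passages, return_tensors="pt", padding=True)).pooler_output   # shape (2, 768)

# Rank passages by dot-product similarity of the dense embeddings
scores = (q_emb @ p_emb.T).squeeze(0)
print(scores)  # the semantically matching passage should score highest
```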
### "Dense Passage Retrieval"
In this tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever (DPR).
_"Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework. When evaluated on a wide range of open-domain QA datasets, our dense retriever outperforms a strong Lucene-BM25 system largely by 9%-19% absolute in terms of top-20 passage retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art on multiple open-domain QA benchmarks."_
*Use this* [link](https://colab.research.google.com/github/deepset-ai/haystack/blob/main/tutorials/Tutorial6_Better_Retrieval_via_DPR.ipynb) *to open the notebook in Google Colab.*
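In Haystack, DPR is available as the `DensePassageRetriever`. A rough sketch of how it is typically initialized is shown below; the exact import path depends on your Haystack version, and a `document_store` (e.g. the FAISS-based one used elsewhere in this tutorial) is assumed to already exist:
```python
from haystack.retriever.dense import DensePassageRetriever

# Dual encoder: separate pretrained models for queries and passages
retriever = DensePassageRetriever(
    document_store=document_store,  # assumes a document store set up earlier in the notebook
    query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
    passage_embedding_model="facebook/dpr-ctx_encoder-single-nq-base",
    use_gpu=True,
)

# Compute and store passage embeddings so the retriever can search over them
document_store.update_embeddings(retriever)
```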
### Pipeline
With a Haystack `Pipeline`, you can stick your building blocks together into a search pipeline.
Under the hood, `Pipelines` are Directed Acyclic Graphs (DAGs) that you can easily customize for your own use cases.
To speed things up, Haystack also comes with a few predefined Pipelines. One of them is the `ExtractiveQAPipeline` that combines a retriever and a reader to answer our questions.
You can learn more about `Pipelines` in the [docs](https://haystack.deepset.ai/docs/latest/pipelinesmd).
```python
from haystack.pipeline import ExtractiveQAPipeline

# Combine the retriever and reader defined earlier into a ready-made extractive QA pipeline
pipe = ExtractiveQAPipeline(reader, retriever)
```
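The same extractive QA pipeline can also be assembled node by node, which makes the underlying DAG explicit. A sketch (the node names are our own choice), which is essentially what `ExtractiveQAPipeline` builds for you:
```python
from haystack.pipeline import Pipeline

# Build the DAG manually: Query -> Retriever -> Reader
custom_pipe = Pipeline()
custom_pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
custom_pipe.add_node(component=reader, name="Reader", inputs=["Retriever"])
```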
## Voilà! Ask a question!
```python
# You can configure how many candidates the reader and retriever shall return
# A higher top_k_retriever generally improves answer quality but slows down the query.
prediction = pipe.run(query="Who created the Dothraki vocabulary?", top_k_retriever=10, top_k_reader=5)
```
```python
from haystack.utils import print_answers

print_answers(prediction, details="minimal")
```
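If you prefer to process the results programmatically instead of printing them, you can inspect the prediction directly. A hedged sketch for the Haystack version this tutorial targets, where each answer is a plain dict (newer releases return `Answer` objects instead, so the field access would differ):
```python
# Each entry holds the extracted answer span and the surrounding context passage
for answer in prediction["answers"]:
    print(answer["answer"], "|", answer["context"])
```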
## About us
This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany.
We bring NLP to the industry via open source!
Our focus: industry-specific language models & large-scale QA systems.
Some of our other work:
- [German BERT](https://deepset.ai/german-bert)
- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad)