mirror of
https://github.com/deepset-ai/haystack.git
synced 2025-12-24 13:38:53 +00:00
Fixed tabs for haystack-website issue (#427)
parent
66a1893f79
commit
6b35e38e12
@ -20,35 +20,55 @@ There are different DocumentStores in Haystack to fit different use cases and te
|
||||
|
||||
Initialising a new Document Store is straightforward.
|
||||
|
||||
<div class="filter">
|
||||
<a href="#elasticsearch">Elasticsearch</a> <a href="#faiss">FAISS</a> <a href="#inmemory">In Memory</a> <a href="#sql">SQL</a>
|
||||
</div>
|
||||
<div class="filter-elasticsearch table-wrapper" markdown="block">
|
||||
<div class="tabs tabsdsinstall">
|
||||
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-1-1" name="tab-group-1" checked>
|
||||
<label class="labelouter" for="tab-1-1">Elasticsearch</label>
|
||||
<div class="tabcontent">
|
||||
|
||||
```python
|
||||
document_store = ElasticsearchDocumentStore()
|
||||
```
|
||||
|
||||
</div>
|
||||
<div class="filter-faiss table-wrapper" markdown="block">
|
||||
</div>
|
||||
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-1-2" name="tab-group-1">
|
||||
<label class="labelouter" for="tab-1-2">FAISS</label>
|
||||
<div class="tabcontent">
|
||||
|
||||
```python
|
||||
document_store = FAISSDocumentStore()
|
||||
```
|
||||
|
||||
</div>
|
||||
<div class="filter-sql table-wrapper" markdown="block">
|
||||
</div>
|
||||
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-1-3" name="tab-group-1">
|
||||
<label class="labelouter" for="tab-1-3">In Memory</label>
|
||||
<div class="tabcontent">
|
||||
|
||||
```python
|
||||
document_store = InMemoryDocumentStore()
|
||||
```
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-1-4" name="tab-group-1">
|
||||
<label class="labelouter" for="tab-1-4">SQL</label>
|
||||
<div class="tabcontent">
|
||||
|
||||
```python
|
||||
document_store = SQLDocumentStore()
|
||||
```
|
||||
|
||||
</div>
|
||||
<div class="filter-inmemory table-wrapper" markdown="block">
|
||||
|
||||
```python
|
||||
document_store = InMemoryDocumentStore()
|
||||
```
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
@ -123,11 +143,12 @@ Having GPU acceleration will significantly speed this up.
|
||||
|
||||
The DocumentStores have different characteristics. You should choose one depending on the maturity of your project, the use case, and the technical environment:
|
||||
|
||||
<div class="tabs tabsdschoose">
|
||||
|
||||
<div class="filter">
|
||||
<a href="#elasticsearch">Elasticsearch</a> <a href="#faiss">FAISS</a> <a href="#inmemory">In Memory</a> <a href="#sql">SQL</a>
|
||||
</div>
|
||||
<div class="filter-elasticsearch table-wrapper" markdown="block">
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-2-1" name="tab-group-2" checked>
|
||||
<label class="labelouter" for="tab-2-1">Elasticsearch</label>
|
||||
<div class="tabcontent">
|
||||
|
||||
**Pros:**
|
||||
- Fast & accurate sparse retrieval
|
||||
@ -139,7 +160,12 @@ The Document stores have different characteristics. You should choose one depend
|
||||
- Slow for dense retrieval with more than ~1 million documents
|
||||
|
||||
</div>
|
||||
<div class="filter-faiss table-wrapper" markdown="block">
|
||||
</div>
|
||||
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-2-2" name="tab-group-2">
|
||||
<label class="labelouter" for="tab-2-2">FAISS</label>
|
||||
<div class="tabcontent">
|
||||
|
||||
**Pros:**
|
||||
- Fast & accurate dense retrieval
|
||||
@ -150,7 +176,12 @@ The Document stores have different characteristics. You should choose one depend
|
||||
- No efficient sparse retrieval
|
||||
|
||||
</div>
|
||||
<div class="filter-sql table-wrapper" markdown="block">
|
||||
</div>
|
||||
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-2-3" name="tab-group-2">
|
||||
<label class="labelouter" for="tab-2-3">In Memory</label>
|
||||
<div class="tabcontent">
|
||||
|
||||
**Pros:**
|
||||
- Simple
|
||||
@ -162,7 +193,12 @@ The Document stores have different characteristics. You should choose one depend
|
||||
- Not recommended for production
|
||||
|
||||
</div>
|
||||
<div class="filter-inmemory table-wrapper" markdown="block">
|
||||
</div>
|
||||
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-2-4" name="tab-group-2">
|
||||
<label class="labelouter" for="tab-2-4">SQL</label>
|
||||
<div class="tabcontent">
|
||||
|
||||
**Pros:**
|
||||
- Simple & fast to test
|
||||
@ -172,6 +208,9 @@ The Document stores have different characteristics. You should choose one depend
|
||||
- Not scalable
|
||||
- Not persisting your data on disk
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
#### Our recommendations
|
||||
|
||||
@ -11,23 +11,28 @@ id: "get_startedmd"
|
||||
|
||||
## Installation
|
||||
|
||||
<div class="filter">
|
||||
<a href="#basic">Basic</a> <a href="#editable">Editable</a>
|
||||
</div>
|
||||
<div class="filter-basic table-wrapper" markdown="block">
|
||||
<div class="tabs tabsgetstarted">
|
||||
|
||||
The most straightforward way to install Haystack is through pip.
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-1" name="tab-group-1" checked>
|
||||
<label class="labelouter" for="tab-1">Basic</label>
|
||||
<div class="tabcontent">
|
||||
The most straightforward way to install Haystack is through pip.<br/><br/>
|
||||
|
||||
```bash
|
||||
$ pip install farm-haystack
|
||||
```
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<div class="filter-editable table-wrapper" markdown="block">
|
||||
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-2" name="tab-group-1">
|
||||
<label class="labelouter" for="tab-2">Editable</label>
|
||||
<div class="tabcontent">
|
||||
If you’d like to run a specific, unreleased version of Haystack, or make edits to the way Haystack runs,
|
||||
you’ll want to install it using `git` and `pip install --editable`.
|
||||
This clones a copy of the repo to a local directory and runs Haystack from there.
|
||||
This clones a copy of the repo to a local directory and runs Haystack from there. <br/><br/>
|
||||
|
||||
```bash
|
||||
$ git clone https://github.com/deepset-ai/haystack.git
|
||||
@ -35,9 +40,9 @@ $ cd haystack
|
||||
$ pip install --editable .
|
||||
```
|
||||
|
||||
By default, this will give you the latest version of the master branch.
|
||||
Use regular git commands to switch between different branches and commits.
|
||||
|
||||
By default, this will give you the latest version of the master branch. Use regular git commands to switch between different branches and commits.
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
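After either install route, it can help to confirm that the package is visible to the Python interpreter you intend to use. The snippet below is a minimal sketch that relies only on the standard library; `installed_version` is a hypothetical helper, and it assumes the distribution is registered under the name `farm-haystack`, as in the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Assumption: the distribution name matches the one used with pip above.
haystack_version = installed_version("farm-haystack")
if haystack_version is None:
    print("farm-haystack is not installed in this environment")
else:
    print(f"farm-haystack {haystack_version} is installed")
```

If the check reports a missing package right after installation, the usual cause is that `pip` and `python` point at different environments.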
|
||||
|
||||
|
||||
@ -25,10 +25,12 @@ Haystack’s Readers are:
|
||||
|
||||
* state-of-the-art in QA tasks like SQuAD and Natural Questions
|
||||
|
||||
<div class="filter">
|
||||
<a href="#farm">FARM</a> <a href="#transformers">Transformers</a>
|
||||
</div>
|
||||
<div class="filter-farm table-wrapper" markdown="block">
|
||||
<div class="tabs tabsreaderreader">
|
||||
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-0-1" name="tab-group-0" checked>
|
||||
<label class="labelouter" for="tab-0-1">FARM</label>
|
||||
<div class="tabcontent">
|
||||
|
||||
```python
|
||||
model = "deepset/roberta-base-squad2"
|
||||
@ -36,8 +38,13 @@ reader = FARMReader(model, use_gpu=True)
|
||||
finder = Finder(reader, retriever)
|
||||
```
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<div class="filter-transformers table-wrapper" markdown="block">
|
||||
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-0-2" name="tab-group-0">
|
||||
<label class="labelouter" for="tab-0-2">Transformers</label>
|
||||
<div class="tabcontent">
|
||||
|
||||
```python
|
||||
model = "deepset/roberta-base-squad2"
|
||||
@ -45,6 +52,9 @@ reader = TransformersReader(model, use_gpu=1)
|
||||
finder = Finder(reader, retriever)
|
||||
```
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
While these models can work on CPU, it is recommended that they are run using GPUs to keep query times low.
|
||||
@ -58,12 +68,19 @@ and you have the option of using the QA pipeline from deepset FARM or HuggingFac
|
||||
Currently, there are a lot of different models out there and it can be rather overwhelming trying to pick the one that fits your use case.
|
||||
To get you started, we have a few recommendations for you to try out.
|
||||
|
||||
**FARM**
|
||||
<div class="tabs tabsreader">
|
||||
|
||||
<div class="filter">
|
||||
<a href="#roberta">RoBERTa (base)</a> <a href="#minilm">MiniLM</a> <a href="#albert">ALBERT (XXL)</a>
|
||||
</div>
|
||||
<div class="filter-roberta table-wrapper" markdown="block">
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-1" name="tab-group-1" checked>
|
||||
<label class="labelouter" for="tab-1">FARM</label>
|
||||
<div class="tabcontent">
|
||||
|
||||
<div class="tabs innertabs">
|
||||
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-1-1" name="tab-group-2" checked>
|
||||
<label class="labelinner" for="tab-1-1">RoBERTa (base)</label>
|
||||
<div class="tabcontentinner">
|
||||
|
||||
**An optimised variant of BERT and a great starting point.**
|
||||
|
||||
@ -71,14 +88,17 @@ To get you started, we have a few recommendations for you to try out.
|
||||
reader = FARMReader("deepset/roberta-base-squad2")
|
||||
```
|
||||
|
||||
|
||||
* **Pro**: Strong all-round model
|
||||
|
||||
|
||||
* **Con**: There are other models that are either faster or more accurate
|
||||
|
||||
</div>
|
||||
<div class="filter-minilm table-wrapper" markdown="block">
|
||||
</div>
|
||||
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-1-2" name="tab-group-2">
|
||||
<label class="labelinner" for="tab-1-2">MiniLM</label>
|
||||
<div class="tabcontentinner">
|
||||
|
||||
**A cleverly distilled model that sacrifices a little accuracy for speed.**
|
||||
|
||||
@ -86,14 +106,17 @@ reader = FARMReader("deepset/roberta-base-squad2")
|
||||
reader = FARMReader("deepset/minilm-uncased-squad2")
|
||||
```
|
||||
|
||||
|
||||
* **Pro**: Inference speed up to 50% faster than BERT base
|
||||
|
||||
|
||||
* **Con**: Still doesn’t match the best base-sized models in accuracy
|
||||
|
||||
</div>
|
||||
<div class="filter-albert table-wrapper" markdown="block">
|
||||
</div>
|
||||
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-1-3" name="tab-group-2">
|
||||
<label class="labelinner" for="tab-1-3">ALBERT (XXL)</label>
|
||||
<div class="tabcontentinner">
|
||||
|
||||
**Large, powerful, SotA model.**
|
||||
|
||||
@ -101,21 +124,29 @@ reader = FARMReader("deepset/minilm-uncased-squad2")
|
||||
reader = FARMReader("ahotrod/albert_xxlargev1_squad2_512")
|
||||
```
|
||||
|
||||
|
||||
* **Pro**: Better accuracy than any other open source model in QA
|
||||
|
||||
|
||||
* **Con**: The computational power needed makes it impractical for most use cases
|
||||
|
||||
</div>
|
||||
|
||||
|
||||
**Transformers**
|
||||
|
||||
<div class="filter">
|
||||
<a href="#roberta_">RoBERTa (base)</a> <a href="#minilm_">MiniLM</a> <a href="#albert_">ALBERT (XXL)</a>
|
||||
</div>
|
||||
<div class="filter-roberta_ table-wrapper" markdown="block">
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-2" name="tab-group-1">
|
||||
<label class="labelouter" for="tab-2">Transformers</label>
|
||||
<div class="tabcontent">
|
||||
|
||||
<div class="tabs innertabs">
|
||||
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-2-1" name="tab-group-3" checked>
|
||||
<label class="labelinner" for="tab-2-1">RoBERTa (base)</label>
|
||||
<div class="tabcontentinner">
|
||||
|
||||
**An optimised variant of BERT and a great starting point.**
|
||||
|
||||
@ -123,14 +154,17 @@ reader = FARMReader("ahotrod/albert_xxlargev1_squad2_512")
|
||||
reader = TransformersReader("deepset/roberta-base-squad2")
|
||||
```
|
||||
|
||||
|
||||
* **Pro**: Strong all-round model
|
||||
|
||||
|
||||
* **Con**: There are other models that are either faster or more accurate
|
||||
|
||||
</div>
|
||||
<div class="filter-minilm_ table-wrapper" markdown="block">
|
||||
</div>
|
||||
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-2-2" name="tab-group-3">
|
||||
<label class="labelinner" for="tab-2-2">MiniLM</label>
|
||||
<div class="tabcontentinner">
|
||||
|
||||
**A cleverly distilled model that sacrifices a little accuracy for speed.**
|
||||
|
||||
@ -138,14 +172,17 @@ reader = TransformersReader("deepset/roberta-base-squad2")
|
||||
reader = TransformersReader("deepset/minilm-uncased-squad2")
|
||||
```
|
||||
|
||||
|
||||
* **Pro**: Inference speed up to 50% faster than BERT base
|
||||
|
||||
|
||||
* **Con**: Still doesn’t match the best base-sized models in accuracy
|
||||
|
||||
</div>
|
||||
<div class="filter-albert_ table-wrapper" markdown="block">
|
||||
</div>
|
||||
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-2-3" name="tab-group-3">
|
||||
<label class="labelinner" for="tab-2-3">ALBERT (XXL)</label>
|
||||
<div class="tabcontentinner">
|
||||
|
||||
**Large, powerful, SotA model.**
|
||||
|
||||
@ -153,12 +190,18 @@ reader = TransformersReader("deepset/minilm-uncased-squad2")
|
||||
reader = TransformersReader("ahotrod/albert_xxlargev1_squad2_512")
|
||||
```
|
||||
|
||||
|
||||
* **Pro**: Better accuracy than any other open source model in QA
|
||||
|
||||
|
||||
* **Con**: The computational power needed makes it impractical for most use cases
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
**All-rounder**: In the class of base-sized models trained on SQuAD, **RoBERTa** has shown better performance than BERT
|
||||
@ -183,59 +226,104 @@ While models are comparatively more performant on English,
|
||||
thanks to a wealth of available English training data,
|
||||
there are a couple of QA models that are directly usable in Haystack.
|
||||
|
||||
**FARM**
|
||||
<div class="tabs tabsreaderlanguage">
|
||||
|
||||
<div class="filter">
|
||||
<a href="#french">French</a> <a href="#italian">Italian</a> <a href="#zeroshot">Zero-shot</a>
|
||||
</div>
|
||||
<div class="filter-french table-wrapper" markdown="block">
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-4-1" name="tab-group-4" checked>
|
||||
<label class="labelouter" for="tab-4-1">FARM</label>
|
||||
<div class="tabcontent">
|
||||
|
||||
<div class="tabs innertabslanguage">
|
||||
|
||||
<div class="tabinner">
|
||||
<input type="radio" id="tab-5-1" name="tab-group-5" checked>
|
||||
<label class="labelinner" for="tab-5-1">French</label>
|
||||
<div class="tabcontentinner">
|
||||
|
||||
```python
|
||||
reader = FARMReader("illuin/camembert-base-fquad")
|
||||
```
|
||||
|
||||
</div>
|
||||
<div class="filter-italian table-wrapper" markdown="block">
|
||||
</div>
|
||||
|
||||
<div class="tabinner">
|
||||
<input type="radio" id="tab-5-2" name="tab-group-5">
|
||||
<label class="labelinner" for="tab-5-2">Italian</label>
|
||||
<div class="tabcontentinner">
|
||||
|
||||
```python
|
||||
reader = FARMReader("mrm8488/bert-italian-finedtuned-squadv1-it-alfa")
|
||||
```
|
||||
|
||||
</div>
|
||||
<div class="filter-zeroshot table-wrapper" markdown="block">
|
||||
</div>
|
||||
|
||||
<div class="tabinner">
|
||||
<input type="radio" id="tab-5-3" name="tab-group-5">
|
||||
<label class="labelinner" for="tab-5-3">Zero-shot</label>
|
||||
<div class="tabcontentinner">
|
||||
|
||||
```python
|
||||
reader = FARMReader("deepset/xlm-roberta-large-squad2")
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
|
||||
**Transformers**
|
||||
|
||||
<div class="filter">
|
||||
<a href="#french_">French</a> <a href="#italian_">Italian</a> <a href="#zeroshot_">Zero-shot</a>
|
||||
</div>
|
||||
<div class="filter-french_ table-wrapper" markdown="block">
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-4-2" name="tab-group-4">
|
||||
<label class="labelouter" for="tab-4-2">Transformers</label>
|
||||
<div class="tabcontent">
|
||||
|
||||
<div class="tabs innertabslanguage">
|
||||
|
||||
<div class="tabinner2">
|
||||
<input type="radio" id="tab-6-1" name="tab-group-6" checked>
|
||||
<label class="labelinner" for="tab-6-1">French</label>
|
||||
<div class="tabcontentinner">
|
||||
|
||||
```python
|
||||
reader = TransformersReader("illuin/camembert-base-fquad")
|
||||
```
|
||||
|
||||
</div>
|
||||
<div class="filter-italian_ table-wrapper" markdown="block">
|
||||
</div>
|
||||
|
||||
<div class="tabinner2">
|
||||
<input type="radio" id="tab-6-2" name="tab-group-6">
|
||||
<label class="labelinner" for="tab-6-2">Italian</label>
|
||||
<div class="tabcontentinner">
|
||||
|
||||
```python
|
||||
reader = TransformersReader("mrm8488/bert-italian-finedtuned-squadv1-it-alfa")
|
||||
```
|
||||
|
||||
</div>
|
||||
<div class="filter-zeroshot_ table-wrapper" markdown="block">
|
||||
</div>
|
||||
|
||||
<div class="tabinner2">
|
||||
<input type="radio" id="tab-6-3" name="tab-group-6">
|
||||
<label class="labelinner" for="tab-6-3">Zero-shot</label>
|
||||
<div class="tabcontentinner">
|
||||
|
||||
```python
|
||||
reader = TransformersReader("deepset/xlm-roberta-large-squad2")
|
||||
```
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
The **French** and **Italian models** are both monolingual language models trained on French and Italian versions of the SQuAD dataset
|
||||
@ -317,22 +405,32 @@ This functions by slicing the document into overlapping passages of (approximate
|
||||
that are each offset by `doc_stride` number of tokens.
|
||||
These can be set when the Reader is initialized.
|
||||
|
||||
<div class="filter">
|
||||
<a href="#farm">FARM</a> <a href="#transformers">Transformers</a>
|
||||
</div>
|
||||
<div class="filter-farm table-wrapper" markdown="block">
|
||||
<div class="tabs tabsreaderdeep">
|
||||
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-7-1" name="tab-group-7" checked>
|
||||
<label class="labelouter" for="tab-7-1">FARM</label>
|
||||
<div class="tabcontent">
|
||||
|
||||
```python
|
||||
reader = FARMReader(... max_seq_len=384, doc_stride=128 ...)
|
||||
```
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<div class="filter-transformers table-wrapper" markdown="block">
|
||||
|
||||
<div class="tab">
|
||||
<input type="radio" id="tab-7-2" name="tab-group-7">
|
||||
<label class="labelouter" for="tab-7-2">Transformers</label>
|
||||
<div class="tabcontent">
|
||||
|
||||
```python
|
||||
reader = TransformersReader(... max_seq_len=384, doc_stride=128 ...)
|
||||
```
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
Predictions are made on each individual passage and the process of aggregation picks the best candidates across all passages.
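The slicing and aggregation described here can be sketched in plain Python. This is a simplified illustration under stated assumptions, not Haystack's implementation: `split_into_passages` and `best_candidates` are hypothetical helpers, the tokens are stand-in strings, and the answer scores are made up. A real Reader also has to fit the question and special tokens inside `max_seq_len`, which this sketch ignores.

```python
from typing import List, Sequence, Tuple


def split_into_passages(
    tokens: Sequence[str], max_seq_len: int, doc_stride: int
) -> List[List[str]]:
    """Slice tokens into windows of at most max_seq_len tokens.

    Each window starts doc_stride tokens after the previous one, so
    consecutive windows overlap whenever doc_stride < max_seq_len.
    """
    if max_seq_len <= 0 or doc_stride <= 0:
        raise ValueError("max_seq_len and doc_stride must be positive")
    if not tokens:
        return []
    passages = []
    start = 0
    while True:
        passages.append(list(tokens[start:start + max_seq_len]))
        # Stop once the current window reaches the end of the document.
        if start + max_seq_len >= len(tokens):
            break
        start += doc_stride
    return passages


def best_candidates(
    per_passage: Sequence[Sequence[Tuple[str, float]]], top_k: int = 1
) -> List[Tuple[str, float]]:
    """Pool (answer, score) pairs from all passages and keep the top_k by score."""
    pooled = [candidate for candidates in per_passage for candidate in candidates]
    return sorted(pooled, key=lambda candidate: candidate[1], reverse=True)[:top_k]


# Toy document of 10 tokens; small window sizes keep the output readable.
tokens = [f"t{i}" for i in range(10)]
passages = split_into_passages(tokens, max_seq_len=4, doc_stride=2)
for passage in passages:
    print(passage)

# Made-up per-passage predictions, one list of candidates per passage.
scored = [[("Berlin", 0.41)], [("Berlin", 0.87), ("Paris", 0.12)], [("Munich", 0.30)]]
print(best_candidates(scored, top_k=2))
```

With `max_seq_len=384` and `doc_stride=128`, as in the Reader examples, neighbouring passages would share 256 tokens, so an answer that straddles one window boundary is likely to appear whole in an adjacent passage.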
|
||||
|
||||