Fixed tabs for haystack-website issue (#427)

Markus Paff 2020-09-24 10:36:18 +02:00 committed by GitHub
parent 66a1893f79
commit 6b35e38e12
3 changed files with 224 additions and 82 deletions


@ -20,35 +20,55 @@ There are different DocumentStores in Haystack to fit different use cases and te
Initialising a new Document Store is straightforward.
<div class="filter">
<a href="#elasticsearch">Elasticsearch</a> <a href="#faiss">FAISS</a> <a href="#inmemory">In Memory</a> <a href="#sql">SQL</a>
</div>
<div class="filter-elasticsearch table-wrapper" markdown="block">
<div class="tabs tabsdsinstall">
<div class="tab">
<input type="radio" id="tab-1-1" name="tab-group-1" checked>
<label class="labelouter" for="tab-1-1">Elasticsearch</label>
<div class="tabcontent">
```python
document_store = ElasticsearchDocumentStore()
```
</div>
<div class="filter-faiss table-wrapper" markdown="block">
</div>
<div class="tab">
<input type="radio" id="tab-1-2" name="tab-group-1">
<label class="labelouter" for="tab-1-2">FAISS</label>
<div class="tabcontent">
```python
document_store = FAISSDocumentStore()
```
</div>
<div class="filter-sql table-wrapper" markdown="block">
</div>
<div class="tab">
<input type="radio" id="tab-1-3" name="tab-group-1">
<label class="labelouter" for="tab-1-3">In Memory</label>
<div class="tabcontent">
```python
document_store = InMemoryDocumentStore()
```
</div>
</div>
<div class="tab">
<input type="radio" id="tab-1-4" name="tab-group-1">
<label class="labelouter" for="tab-1-4">SQL</label>
<div class="tabcontent">
```python
document_store = SQLDocumentStore()
```
</div>
<div class="filter-inmemory table-wrapper" markdown="block">
```python
document_store = InMemoryDocumentStore()
```
</div>
</div>
@ -123,11 +143,12 @@ Having GPU acceleration will significantly speed this up.
The Document stores have different characteristics. You should choose one depending on the maturity of your project, the use case and technical environment:
<div class="tabs tabsdschoose">
<div class="filter">
<a href="#elasticsearch">Elasticsearch</a> <a href="#faiss">FAISS</a> <a href="#inmemory">In Memory</a> <a href="#sql">SQL</a>
</div>
<div class="filter-elasticsearch table-wrapper" markdown="block">
<div class="tab">
<input type="radio" id="tab-2-1" name="tab-group-2" checked>
<label class="labelouter" for="tab-2-1">Elasticsearch</label>
<div class="tabcontent">
**Pros:**
- Fast & accurate sparse retrieval
@ -139,7 +160,12 @@ The Document stores have different characteristics. You should choose one depend
- Slow for dense retrieval with more than ~1 million documents
</div>
<div class="filter-faiss table-wrapper" markdown="block">
</div>
<div class="tab">
<input type="radio" id="tab-2-2" name="tab-group-2">
<label class="labelouter" for="tab-2-2">FAISS</label>
<div class="tabcontent">
**Pros:**
- Fast & accurate dense retrieval
@ -150,7 +176,12 @@ The Document stores have different characteristics. You should choose one depend
- No efficient sparse retrieval
</div>
<div class="filter-sql table-wrapper" markdown="block">
</div>
<div class="tab">
<input type="radio" id="tab-2-3" name="tab-group-2">
<label class="labelouter" for="tab-2-3">In Memory</label>
<div class="tabcontent">
**Pros:**
- Simple
@ -162,7 +193,12 @@ The Document stores have different characteristics. You should choose one depend
- Not recommended for production
</div>
<div class="filter-inmemory table-wrapper" markdown="block">
</div>
<div class="tab">
<input type="radio" id="tab-2-4" name="tab-group-2">
<label class="labelouter" for="tab-2-4">SQL</label>
<div class="tabcontent">
**Pros:**
- Simple & fast to test
@ -172,6 +208,9 @@ The Document stores have different characteristics. You should choose one depend
- Not scalable
- Not persisting your data on disk
</div>
</div>
</div>
#### Our recommendations


@ -11,23 +11,28 @@ id: "get_startedmd"
## Installation
<div class="filter">
<a href="#basic">Basic</a> <a href="#editable">Editable</a>
</div>
<div class="filter-basic table-wrapper" markdown="block">
<div class="tabs tabsgetstarted">
The most straightforward way to install Haystack is through pip.
<div class="tab">
<input type="radio" id="tab-1" name="tab-group-1" checked>
<label class="labelouter" for="tab-1">Basic</label>
<div class="tabcontent">
The most straightforward way to install Haystack is through pip.<br/><br/>
```bash
$ pip install farm-haystack
```
</div>
</div>
<div class="filter-editable table-wrapper" markdown="block">
<div class="tab">
<input type="radio" id="tab-2" name="tab-group-1">
<label class="labelouter" for="tab-2">Editable</label>
<div class="tabcontent">
If you'd like to run a specific, unreleased version of Haystack, or make edits to the way Haystack runs,
you'll want to install it using `git` and `pip --editable`.
This clones a copy of the repo to a local directory and runs Haystack from there.
This clones a copy of the repo to a local directory and runs Haystack from there. <br/><br/>
```bash
$ git clone https://github.com/deepset-ai/haystack.git
@ -35,9 +40,9 @@ $ cd haystack
$ pip install --editable .
```
By default, this will give you the latest version of the master branch.
Use regular git commands to switch between different branches and commits.
By default, this will give you the latest version of the master branch. Use regular git commands to switch between different branches and commits.
</div>
</div>
</div>


@ -25,10 +25,12 @@ Haystacks Readers are:
* state-of-the-art in QA tasks like SQuAD and Natural Questions
<div class="filter">
<a href="#farm">FARM</a> <a href="#transformers">Transformers</a>
</div>
<div class="filter-farm table-wrapper" markdown="block">
<div class="tabs tabsreaderreader">
<div class="tab">
<input type="radio" id="tab-0-1" name="tab-group-0" checked>
<label class="labelouter" for="tab-0-1">FARM</label>
<div class="tabcontent">
```python
model = "deepset/roberta-base-squad2"
@ -36,8 +38,13 @@ reader = FARMReader(model, use_gpu=True)
finder = Finder(reader, retriever)
```
</div>
</div>
<div class="filter-transformers table-wrapper" markdown="block">
<div class="tab">
<input type="radio" id="tab-0-2" name="tab-group-0">
<label class="labelouter" for="tab-0-2">Transformers</label>
<div class="tabcontent">
```python
model = "deepset/roberta-base-squad2"
@ -45,6 +52,9 @@ reader = TransformersReader(model, use_gpu=1)
finder = Finder(reader, retriever)
```
</div>
</div>
</div>
While these models can work on CPU, it is recommended that they are run using GPUs to keep query times low.
@ -58,12 +68,19 @@ and you have the option of using the QA pipeline from deepset FARM or HuggingFac
Currently, there are a lot of different models out there and it can be rather overwhelming trying to pick the one that fits your use case.
To get you started, we have a few recommendations for you to try out.
**FARM**
<div class="tabs tabsreader">
<div class="filter">
<a href="#roberta">RoBERTa (base)</a> <a href="#minilm">MiniLM</a> <a href="#albert">ALBERT (XXL)</a>
</div>
<div class="filter-roberta table-wrapper" markdown="block">
<div class="tab">
<input type="radio" id="tab-1" name="tab-group-1" checked>
<label class="labelouter" for="tab-1">FARM</label>
<div class="tabcontent">
<div class="tabs innertabs">
<div class="tab">
<input type="radio" id="tab-1-1" name="tab-group-2" checked>
<label class="labelinner" for="tab-1-1">RoBERTa (base)</label>
<div class="tabcontentinner">
**An optimised variant of BERT and a great starting point.**
@ -71,14 +88,17 @@ To get you started, we have a few recommendations for you to try out.
reader = FARMReader("deepset/roberta-base-squad2")
```
* **Pro**: Strong all-round model
* **Con**: There are other models that are either faster or more accurate
</div>
<div class="filter-minilm table-wrapper" markdown="block">
</div>
<div class="tab">
<input type="radio" id="tab-1-2" name="tab-group-2">
<label class="labelinner" for="tab-1-2">MiniLM</label>
<div class="tabcontentinner">
**A cleverly distilled model that sacrifices a little accuracy for speed.**
@ -86,14 +106,17 @@ reader = FARMReader("deepset/roberta-base-squad2")
reader = FARMReader("deepset/minilm-uncased-squad2")
```
* **Pro**: Inference speed up to 50% faster than BERT base
* **Con**: Still doesn't match the best base-sized models in accuracy
</div>
<div class="filter-albert table-wrapper" markdown="block">
</div>
<div class="tab">
<input type="radio" id="tab-1-3" name="tab-group-2">
<label class="labelinner" for="tab-1-3">ALBERT (XXL)</label>
<div class="tabcontentinner">
**Large, powerful, SotA model.**
@ -101,21 +124,29 @@ reader = FARMReader("deepset/minilm-uncased-squad2")
reader = FARMReader("ahotrod/albert_xxlargev1_squad2_512")
```
* **Pro**: Better accuracy than any other open source model in QA
* **Con**: The computational power needed makes it impractical for most use cases
</div>
**Transformers**
<div class="filter">
<a href="#roberta_">RoBERTa (base)</a> <a href="#minilm_">MiniLM</a> <a href="#albert_">ALBERT (XXL)</a>
</div>
<div class="filter-roberta_ table-wrapper" markdown="block">
</div>
</div>
</div>
<div class="tab">
<input type="radio" id="tab-2" name="tab-group-1">
<label class="labelouter" for="tab-2">Transformers</label>
<div class="tabcontent">
<div class="tabs innertabs">
<div class="tab">
<input type="radio" id="tab-2-1" name="tab-group-3" checked>
<label class="labelinner" for="tab-2-1">RoBERTa (base)</label>
<div class="tabcontentinner">
**An optimised variant of BERT and a great starting point.**
@ -123,14 +154,17 @@ reader = FARMReader("ahotrod/albert_xxlargev1_squad2_512")
reader = TransformersReader("deepset/roberta-base-squad2")
```
* **Pro**: Strong all-round model
* **Con**: There are other models that are either faster or more accurate
</div>
<div class="filter-minilm_ table-wrapper" markdown="block">
</div>
<div class="tab">
<input type="radio" id="tab-2-2" name="tab-group-3">
<label class="labelinner" for="tab-2-2">MiniLM</label>
<div class="tabcontentinner">
**A cleverly distilled model that sacrifices a little accuracy for speed.**
@ -138,14 +172,17 @@ reader = TransformersReader("deepset/roberta-base-squad2")
reader = TransformersReader("deepset/minilm-uncased-squad2")
```
* **Pro**: Inference speed up to 50% faster than BERT base
* **Con**: Still doesn't match the best base-sized models in accuracy
</div>
<div class="filter-albert_ table-wrapper" markdown="block">
</div>
<div class="tab">
<input type="radio" id="tab-2-3" name="tab-group-3">
<label class="labelinner" for="tab-2-3">ALBERT (XXL)</label>
<div class="tabcontentinner">
**Large, powerful, SotA model.**
@ -153,12 +190,18 @@ reader = TransformersReader("deepset/minilm-uncased-squad2")
reader = TransformersReader("ahotrod/albert_xxlargev1_squad2_512")
```
* **Pro**: Better accuracy than any other open source model in QA
* **Con**: The computational power needed makes it impractical for most use cases
</div>
</div>
</div>
</div>
</div>
</div>
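The trade-offs listed in the tabs above can be sketched as a small lookup. This is only an illustration: `RECOMMENDED_MODELS` and `pick_model` are hypothetical names that are not part of Haystack, and the priority labels are an assumption made for this example. Only the model identifiers come from the documentation.

```python
# Hypothetical helper, not part of Haystack: maps a priority to one of the
# model names recommended in the tabs above.
RECOMMENDED_MODELS = {
    "balanced": "deepset/roberta-base-squad2",          # strong all-round model
    "speed": "deepset/minilm-uncased-squad2",           # distilled, faster inference
    "accuracy": "ahotrod/albert_xxlargev1_squad2_512",  # most accurate, most compute
}


def pick_model(priority: str = "balanced") -> str:
    """Return the recommended model name for the given priority."""
    try:
        return RECOMMENDED_MODELS[priority]
    except KeyError:
        raise ValueError(
            f"Unknown priority {priority!r}; expected one of {sorted(RECOMMENDED_MODELS)}"
        ) from None


# The returned name is a plain string that could then be passed to
# FARMReader(...) or TransformersReader(...).
model_name = pick_model("speed")
```

Keeping the choice in one table means the Reader construction stays the same when the priority changes.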
**All-rounder**: In the class of base sized models trained on SQuAD, **RoBERTa** has shown better performance than BERT
@ -183,59 +226,104 @@ While models are comparatively more performant on English,
thanks to a wealth of available English training data,
there are a couple of QA models that are directly usable in Haystack.
**FARM**
<div class="tabs tabsreaderlanguage">
<div class="filter">
<a href="#french">French</a> <a href="#italian">Italian</a> <a href="#zeroshot">Zero-shot</a>
</div>
<div class="filter-french table-wrapper" markdown="block">
<div class="tab">
<input type="radio" id="tab-4-1" name="tab-group-4" checked>
<label class="labelouter" for="tab-4-1">FARM</label>
<div class="tabcontent">
<div class="tabs innertabslanguage">
<div class="tabinner">
<input type="radio" id="tab-5-1" name="tab-group-5" checked>
<label class="labelinner" for="tab-5-1">French</label>
<div class="tabcontentinner">
```python
reader = FARMReader("illuin/camembert-base-fquad")
```
</div>
<div class="filter-italian table-wrapper" markdown="block">
</div>
<div class="tabinner">
<input type="radio" id="tab-5-2" name="tab-group-5">
<label class="labelinner" for="tab-5-2">Italian</label>
<div class="tabcontentinner">
```python
reader = FARMReader("mrm8488/bert-italian-finedtuned-squadv1-it-alfa")
```
</div>
<div class="filter-zeroshot table-wrapper" markdown="block">
</div>
<div class="tabinner">
<input type="radio" id="tab-5-3" name="tab-group-5">
<label class="labelinner" for="tab-5-3">Zero-shot</label>
<div class="tabcontentinner">
```python
reader = FARMReader("deepset/xlm-roberta-large-squad2")
```
</div>
**Transformers**
<div class="filter">
<a href="#french_">French</a> <a href="#italian_">Italian</a> <a href="#zeroshot_">Zero-shot</a>
</div>
<div class="filter-french_ table-wrapper" markdown="block">
</div>
</div>
</div>
<div class="tab">
<input type="radio" id="tab-4-2" name="tab-group-4">
<label class="labelouter" for="tab-4-2">Transformers</label>
<div class="tabcontent">
<div class="tabs innertabslanguage">
<div class="tabinner2">
<input type="radio" id="tab-6-1" name="tab-group-6" checked>
<label class="labelinner" for="tab-6-1">French</label>
<div class="tabcontentinner">
```python
reader = TransformersReader("illuin/camembert-base-fquad")
```
</div>
<div class="filter-italian_ table-wrapper" markdown="block">
</div>
<div class="tabinner2">
<input type="radio" id="tab-6-2" name="tab-group-6">
<label class="labelinner" for="tab-6-2">Italian</label>
<div class="tabcontentinner">
```python
reader = TransformersReader("mrm8488/bert-italian-finedtuned-squadv1-it-alfa")
```
</div>
<div class="filter-zeroshot_ table-wrapper" markdown="block">
</div>
<div class="tabinner2">
<input type="radio" id="tab-6-3" name="tab-group-6">
<label class="labelinner" for="tab-6-3">Zero-shot</label>
<div class="tabcontentinner">
```python
reader = TransformersReader("deepset/xlm-roberta-large-squad2")
```
</div>
</div>
</div>
</div>
</div>
</div>
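As a rough sketch of how the model names in the tabs above might be selected at runtime, the snippet below looks up a model by language. `LANGUAGE_MODELS`, `ZERO_SHOT_MODEL`, and `model_for_language` are hypothetical names, not part of Haystack, and falling back to the zero-shot model for unlisted languages is an assumption made for this example.

```python
# Hypothetical lookup, not part of Haystack. The model names are the ones
# listed in the tabs above.
LANGUAGE_MODELS = {
    "french": "illuin/camembert-base-fquad",
    "italian": "mrm8488/bert-italian-finedtuned-squadv1-it-alfa",
}
ZERO_SHOT_MODEL = "deepset/xlm-roberta-large-squad2"


def model_for_language(language: str) -> str:
    """Return the listed model for a language, else the zero-shot model."""
    # Assumption: any language without a dedicated entry uses the zero-shot model.
    return LANGUAGE_MODELS.get(language.strip().lower(), ZERO_SHOT_MODEL)


french_model = model_for_language("French")
fallback_model = model_for_language("German")
```

The resulting string could be passed to either `FARMReader` or `TransformersReader`, since both tabs list the same model names.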
The **French** and **Italian models** are both monolingual language models trained on French and Italian versions of the SQuAD dataset
@ -317,22 +405,32 @@ This functions by slicing the document into overlapping passages of (approximate
that are each offset by `doc_stride` number of tokens.
These can be set when the Reader is initialized.
<div class="filter">
<a href="#farm">FARM</a> <a href="#transformers">Transformers</a>
</div>
<div class="filter-farm table-wrapper" markdown="block">
<div class="tabs tabsreaderdeep">
<div class="tab">
<input type="radio" id="tab-7-1" name="tab-group-7" checked>
<label class="labelouter" for="tab-7-1">FARM</label>
<div class="tabcontent">
```python
reader = FARMReader(... max_seq_len=384, doc_stride=128 ...)
```
</div>
</div>
<div class="filter-transformers table-wrapper" markdown="block">
<div class="tab">
<input type="radio" id="tab-7-2" name="tab-group-7">
<label class="labelouter" for="tab-7-2">Transformers</label>
<div class="tabcontent">
```python
reader = TransformersReader(... max_seq_len=384, doc_stride=128 ...)
```
</div>
</div>
</div>
Predictions are made on each individual passage and the process of aggregation picks the best candidates across all passages.
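The slicing and aggregation described above can be sketched in plain Python. This is a simplified illustration, not the Readers' actual implementation: `split_into_passages` and `best_candidates` are hypothetical helpers, the sketch ignores question and special tokens (one reason real passages are only approximately `max_seq_len` long), and the scores in the usage part are made up.

```python
from typing import List, Tuple


def split_into_passages(
    tokens: List[str], max_seq_len: int, doc_stride: int
) -> List[Tuple[int, List[str]]]:
    """Slice tokens into overlapping windows.

    Each window holds at most max_seq_len tokens and starts doc_stride
    tokens after the previous one. Returns (start_offset, window) pairs.
    """
    if max_seq_len <= 0 or doc_stride <= 0:
        raise ValueError("max_seq_len and doc_stride must be positive")
    passages = []
    start = 0
    while True:
        passages.append((start, tokens[start:start + max_seq_len]))
        if start + max_seq_len >= len(tokens):
            break  # this window already reaches the end of the document
        start += doc_stride
    return passages


def best_candidates(
    candidates_per_passage: List[List[Tuple[str, float]]], top_k: int = 1
) -> List[Tuple[str, float]]:
    """Merge (answer, score) candidates from all passages and keep the top_k."""
    merged = [c for candidates in candidates_per_passage for c in candidates]
    return sorted(merged, key=lambda c: c[1], reverse=True)[:top_k]


tokens = [f"t{i}" for i in range(10)]
passages = split_into_passages(tokens, max_seq_len=4, doc_stride=2)
starts = [start for start, _ in passages]

# Made-up candidate scores, one inner list per passage.
scored = [[("Berlin", 0.91), ("Bonn", 0.12)], [("Berlin", 0.88)], [("Munich", 0.30)]]
top = best_candidates(scored, top_k=2)
```

With `doc_stride` smaller than `max_seq_len`, neighbouring passages overlap, so an answer that straddles one window boundary is still fully contained in an adjacent window.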