Improve Docs Readability (#2617)

Signed-off-by: Ryan Russell <git@ryanrussell.org>

commit c1b7948e10 (parent 3c6fcc3e42)

@@ -425,7 +425,7 @@ E.g. you can call execute_eval_run() multiple times with different retrievers in
 
 - `index_pipeline`: The indexing pipeline to use.
 - `query_pipeline`: The query pipeline to evaluate.
-- `evaluation_set_labels`: The labels to evaluate on forming an evalution set.
+- `evaluation_set_labels`: The labels to evaluate on forming an evaluation set.
 - `corpus_file_paths`: The files to be indexed and searched during evaluation forming a corpus.
 - `experiment_name`: The name of the experiment
 - `experiment_run_name`: The name of the experiment run
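As a point of reference, a minimal sketch of how these arguments might be passed to `execute_eval_run()`; the pipeline objects, label list, and file paths below are hypothetical placeholders, not part of this commit:

```python
from haystack import Pipeline

# A sketch, assuming index_pipeline, query_pipeline, and eval_labels were
# built earlier; names and paths are illustrative only.
eval_result = Pipeline.execute_eval_run(
    index_pipeline=index_pipeline,
    query_pipeline=query_pipeline,
    evaluation_set_labels=eval_labels,
    corpus_file_paths=["corpus/article_1.txt", "corpus/article_2.txt"],
    experiment_name="retriever-comparison",
    experiment_run_name="bm25-baseline",
)
```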
@@ -139,7 +139,7 @@ print(docs[:3])
 document_store.write_documents(docs)
 ```
 
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 
 ### Retriever
 
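For context on the indexing step these tutorial hunks keep referencing, a hedged sketch using the Haystack v1 document store API; the connection details are assumptions:

```python
from haystack.document_stores import ElasticsearchDocumentStore

# Assumes a local Elasticsearch instance; host, port, and index are illustrative.
document_store = ElasticsearchDocumentStore(host="localhost", port=9200, index="document")
document_store.write_documents(docs)  # docs as produced by the converter step
print(document_store.get_document_count())
```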
@@ -76,7 +76,7 @@ docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split
 document_store.write_documents(docs)
 ```
 
-### Initalize Retriever and Reader/Generator
+### Initialize Retriever and Reader/Generator
 
 #### Retriever
 
@@ -130,7 +130,7 @@ print(tables[0].content)
 print(tables[0].meta)
 ```
 
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 
 ### Retriever
 
@@ -98,7 +98,7 @@ print(docs[:3])
 document_store.write_documents(docs)
 ```
 
-## Initalize Retriever, Reader & Pipeline
+## Initialize Retriever, Reader & Pipeline
 
 ### Retriever
 
@@ -180,7 +180,7 @@ Here we evaluate retriever and reader in open domain fashion on the full corpus
 correctly retrieved if it contains the gold answer string within it. The reader is evaluated based purely on the
 predicted answer string, regardless of which document this came from and the position of the extracted span.
 
-The generation of predictions is seperated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
+The generation of predictions is separated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
 
 
 
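The two-step flow that sentence describes might look like this in Haystack v1; the label set, node names, and metric keys are assumptions based on the evaluation tutorial:

```python
# Run the computation-heavy predictions once...
eval_result = pipeline.eval(labels=eval_labels, params={"Retriever": {"top_k": 5}})

# ...then compute metrics (or reports) as often as needed without re-predicting.
metrics = eval_result.calculate_metrics()
print(metrics["Retriever"]["recall_single_hit"])
print(metrics["Reader"]["f1"])
```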
@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 
 Original Abstract:
 
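For reference, a minimal sketch of initializing that dual encoder in Haystack v1, using the DPR checkpoints published by Karpukhin et al.; the exact model names are assumptions:

```python
from haystack.nodes import DensePassageRetriever

# Dual encoder: one model embeds queries, a separate model embeds passages.
retriever = DensePassageRetriever(
    document_store=document_store,
    query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
    passage_embedding_model="facebook/dpr-ctx_encoder-single-nq-base",
)
document_store.update_embeddings(retriever)  # compute and store passage embeddings
```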
@@ -147,7 +147,7 @@ docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split
 document_store.write_documents(docs)
 ```
 
-### Initalize Retriever, Reader & Pipeline
+### Initialize Retriever, Reader & Pipeline
 
 #### Retriever
 
@@ -66,7 +66,7 @@ fetch_archive_from_http(url=s3_url, output_dir=doc_dir)
 Haystack's converter classes are designed to help you turn files on your computer into the documents
 that can be processed by the Haystack pipeline.
 There are file converters for txt, pdf, docx files as well as a converter that is powered by Apache Tika.
-The parameter `valid_langugages` does not convert files to the target language, but checks if the conversion worked as expected.
+The parameter `valid_languages` does not convert files to the target language, but checks if the conversion worked as expected.
 
 
 ```python
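A short sketch of the converter behaviour described above; the file path is a placeholder and the exact return shape may vary by Haystack version:

```python
from haystack.nodes import TextConverter

converter = TextConverter(remove_numeric_tables=True, valid_languages=["en"])
# valid_languages only sanity-checks the language of the extracted text;
# it does not translate or convert the file's content.
docs = converter.convert(file_path="data/sample.txt", meta=None)
```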
@@ -142,7 +142,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 
 ### Retriever
 
@@ -73,7 +73,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
 
-### Initalize Retriever and Reader/Generator
+### Initialize Retriever and Reader/Generator
 
 #### Retriever
 
@@ -101,7 +101,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader & Pipeline
+## Initialize Retriever, Reader & Pipeline
 
 ### Retriever
 
@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 
 Original Abstract:
 
@@ -145,7 +145,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
 
-### Initalize Retriever, Reader & Pipeline
+### Initialize Retriever, Reader & Pipeline
 
 #### Retriever
 
@@ -127,7 +127,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 
 ### Retriever
 
@@ -87,7 +87,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 
 ### Retriever
 
@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 
 Original Abstract:
 
@@ -124,7 +124,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
 
-### Initalize Retriever, Reader, & Finder
+### Initialize Retriever, Reader & Finder
 
 #### Retriever
 
@@ -127,7 +127,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 
 ### Retriever
 
@@ -87,7 +87,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 
 ### Retriever
 
@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 
 Original Abstract:
 
@@ -124,7 +124,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
 
-### Initalize Retriever, Reader, & Finder
+### Initialize Retriever, Reader & Finder
 
 #### Retriever
 
@@ -129,7 +129,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 
 ### Retriever
 
@@ -89,7 +89,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 
 ### Retriever
 
@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 
 Original Abstract:
 
@@ -129,7 +129,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
 
-### Initalize Retriever, Reader, & Finder
+### Initialize Retriever, Reader & Finder
 
 #### Retriever
 
@@ -142,7 +142,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 
 ### Retriever
 
@@ -103,7 +103,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 
 ### Retriever
 
@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 
 Original Abstract:
 
@@ -129,7 +129,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
 
-### Initalize Retriever, Reader, & Finder
+### Initialize Retriever, Reader & Finder
 
 #### Retriever
 
@@ -140,7 +140,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 
 ### Retriever
 
@@ -101,7 +101,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 
 ### Retriever
 
@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 
 Original Abstract:
 
@@ -127,7 +127,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
 
-### Initalize Retriever, Reader, & Finder
+### Initialize Retriever, Reader & Finder
 
 #### Retriever
 
@@ -142,7 +142,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 
 ### Retriever
 
@@ -73,7 +73,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
 
-### Initalize Retriever and Reader/Generator
+### Initialize Retriever and Reader/Generator
 
 #### Retriever
 
@@ -102,7 +102,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 
 ### Retriever
 
@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 
 Original Abstract:
 
@@ -146,7 +146,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
 
-### Initalize Retriever, Reader, & Finder
+### Initialize Retriever, Reader & Finder
 
 #### Retriever
 
@@ -280,10 +280,10 @@ The DataFrames have the following schema:
 - context (answers only): the surrounding context of the answer within the document
 - offsets_in_document (answers only): the position or offsets within the document the answer was found
 - gold_answers (answers only): the answers to be given
-- gold_offsets_in_documents (answers only): the positon or offsets of the gold answer within the document
+- gold_offsets_in_documents (answers only): the position or offsets of the gold answer within the document
 - exact_match (answers only): metric depicting if the answer exactly matches the gold label
 - f1 (answers only): metric depicting how well the answer overlaps with the gold label on token basis
-- sas (answers only, optional): metric depciting how well the answer matches the gold label on a semantic basis
+- sas (answers only, optional): metric depicting how well the answer matches the gold label on a semantic basis
 
 **Arguments**:
 
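To make the schema concrete, a sketch of how one of these per-node DataFrames is typically pulled out of an `EvaluationResult`; the node name and filtering are assumptions:

```python
# eval_result maps a node name to a pandas DataFrame with the schema above.
reader_df = eval_result["Reader"]
answers = reader_df[reader_df["type"] == "answer"]
print(answers[["gold_answers", "exact_match", "f1"]].head())
```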
@@ -141,7 +141,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 
 ### Retriever
 
@@ -75,7 +75,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
 
-### Initalize Retriever and Reader/Generator
+### Initialize Retriever and Reader/Generator
 
 #### Retriever
 
@@ -136,7 +136,7 @@ print(tables[0].content)
 print(tables[0].meta)
 ```
 
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 
 ### Retriever
 
@@ -100,7 +100,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader & Pipeline
+## Initialize Retriever, Reader & Pipeline
 
 ### Retriever
 
@@ -176,7 +176,7 @@ Here we evaluate retriever and reader in open domain fashion on the full corpus
 correctly retrieved if it contains the gold answer string within it. The reader is evaluated based purely on the
 predicted answer string, regardless of which document this came from and the position of the extracted span.
 
-The generation of predictions is seperated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
+The generation of predictions is separated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
 
 
 
@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 
 Original Abstract:
 
@@ -145,7 +145,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
 
-### Initalize Retriever, Reader & Pipeline
+### Initialize Retriever, Reader & Pipeline
 
 #### Retriever
 
@@ -65,7 +65,7 @@ fetch_archive_from_http(url=s3_url, output_dir=doc_dir)
 Haystack's converter classes are designed to help you turn files on your computer into the documents
 that can be processed by the Haystack pipeline.
 There are file converters for txt, pdf, docx files as well as a converter that is powered by Apache Tika.
-The parameter `valid_langugages` does not convert files to the target language, but checks if the conversion worked as expected.
+The parameter `valid_languages` does not convert files to the target language, but checks if the conversion worked as expected.
 For converting PDFs, try changing the encoding to UTF-8 if the conversion isn't great.
 
 
@@ -272,7 +272,7 @@ The DataFrames have the following schema:
 - context (answers only): the surrounding context of the answer within the document
 - exact_match (answers only): metric depicting if the answer exactly matches the gold label
 - f1 (answers only): metric depicting how well the answer overlaps with the gold label on token basis
-- sas (answers only, optional): metric depciting how well the answer matches the gold label on a semantic basis
+- sas (answers only, optional): metric depicting how well the answer matches the gold label on a semantic basis
 - gold_document_contents (documents only): the contents of the gold documents
 - content (documents only): the content of the document
 - gold_id_match (documents only): metric depicting whether one of the gold document ids matches the document
@@ -282,7 +282,7 @@ The DataFrames have the following schema:
 - document_id: the id of the document that has been retrieved or that contained the answer
 - gold_document_ids: the documents to be retrieved
 - offsets_in_document (answers only): the position or offsets within the document the answer was found
-- gold_offsets_in_documents (answers only): the positon or offsets of the gold answer within the document
+- gold_offsets_in_documents (answers only): the position or offsets of the gold answer within the document
 - type: 'answer' or 'document'
 - node: the node name
 - eval_mode: evaluation mode depicting whether the evaluation was executed in integrated or isolated mode.
@@ -141,7 +141,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 
 ### Retriever
 
@@ -75,7 +75,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
 
-### Initalize Retriever and Reader/Generator
+### Initialize Retriever and Reader/Generator
 
 #### Retriever
 
@@ -137,7 +137,7 @@ print(tables[0].content)
 print(tables[0].meta)
 ```
 
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 
 ### Retriever
 
@@ -100,7 +100,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader & Pipeline
+## Initialize Retriever, Reader & Pipeline
 
 ### Retriever
 
@@ -176,7 +176,7 @@ Here we evaluate retriever and reader in open domain fashion on the full corpus
 correctly retrieved if it contains the gold answer string within it. The reader is evaluated based purely on the
 predicted answer string, regardless of which document this came from and the position of the extracted span.
 
-The generation of predictions is seperated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
+The generation of predictions is separated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
 
 
 
@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 
 Original Abstract:
 
@@ -145,7 +145,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
 
-### Initalize Retriever, Reader & Pipeline
+### Initialize Retriever, Reader & Pipeline
 
 #### Retriever
 
@@ -65,7 +65,7 @@ fetch_archive_from_http(url=s3_url, output_dir=doc_dir)
 Haystack's converter classes are designed to help you turn files on your computer into the documents
 that can be processed by the Haystack pipeline.
 There are file converters for txt, pdf, docx files as well as a converter that is powered by Apache Tika.
-The parameter `valid_langugages` does not convert files to the target language, but checks if the conversion worked as expected.
+The parameter `valid_languages` does not convert files to the target language, but checks if the conversion worked as expected.
 For converting PDFs, try changing the encoding to UTF-8 if the conversion isn't great.
 
 
@@ -301,7 +301,7 @@ The DataFrames have the following schema:
 - context (answers only): the surrounding context of the answer within the document
 - exact_match (answers only): metric depicting if the answer exactly matches the gold label
 - f1 (answers only): metric depicting how well the answer overlaps with the gold label on token basis
-- sas (answers only, optional): metric depciting how well the answer matches the gold label on a semantic basis
+- sas (answers only, optional): metric depicting how well the answer matches the gold label on a semantic basis
 - gold_document_contents (documents only): the contents of the gold documents
 - content (documents only): the content of the document
 - gold_id_match (documents only): metric depicting whether one of the gold document ids matches the document
@@ -311,7 +311,7 @@ The DataFrames have the following schema:
 - document_id: the id of the document that has been retrieved or that contained the answer
 - gold_document_ids: the documents to be retrieved
 - offsets_in_document (answers only): the position or offsets within the document the answer was found
-- gold_offsets_in_documents (answers only): the positon or offsets of the gold answer within the document
+- gold_offsets_in_documents (answers only): the position or offsets of the gold answer within the document
 - type: 'answer' or 'document'
 - node: the node name
 - eval_mode: evaluation mode depicting whether the evaluation was executed in integrated or isolated mode.
@@ -139,7 +139,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 
 ### Retriever
 
@@ -76,7 +76,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
 
-### Initalize Retriever and Reader/Generator
+### Initialize Retriever and Reader/Generator
 
 #### Retriever
 
@@ -136,7 +136,7 @@ print(tables[0].content)
 print(tables[0].meta)
 ```
 
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 
 ### Retriever
 
@@ -98,7 +98,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader & Pipeline
+## Initialize Retriever, Reader & Pipeline
 
 ### Retriever
 
@@ -180,7 +180,7 @@ Here we evaluate retriever and reader in open domain fashion on the full corpus
 correctly retrieved if it contains the gold answer string within it. The reader is evaluated based purely on the
 predicted answer string, regardless of which document this came from and the position of the extracted span.
 
-The generation of predictions is seperated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
+The generation of predictions is separated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
 
 
 
@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 
 Original Abstract:
 
@@ -143,7 +143,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
 
-### Initalize Retriever, Reader & Pipeline
+### Initialize Retriever, Reader & Pipeline
 
 #### Retriever
 
@@ -61,7 +61,7 @@ fetch_archive_from_http(url=s3_url, output_dir=doc_dir)
 Haystack's converter classes are designed to help you turn files on your computer into the documents
 that can be processed by the Haystack pipeline.
 There are file converters for txt, pdf, docx files as well as a converter that is powered by Apache Tika.
-The parameter `valid_langugages` does not convert files to the target language, but checks if the conversion worked as expected.
+The parameter `valid_languages` does not convert files to the target language, but checks if the conversion worked as expected.
 For converting PDFs, try changing the encoding to UTF-8 if the conversion isn't great.
 
 
@@ -301,7 +301,7 @@ The DataFrames have the following schema:
 - context (answers only): the surrounding context of the answer within the document
 - exact_match (answers only): metric depicting if the answer exactly matches the gold label
 - f1 (answers only): metric depicting how well the answer overlaps with the gold label on token basis
-- sas (answers only, optional): metric depciting how well the answer matches the gold label on a semantic basis
+- sas (answers only, optional): metric depicting how well the answer matches the gold label on a semantic basis
 - gold_document_contents (documents only): the contents of the gold documents
 - content (documents only): the content of the document
 - gold_id_match (documents only): metric depicting whether one of the gold document ids matches the document
@@ -311,7 +311,7 @@ The DataFrames have the following schema:
 - document_id: the id of the document that has been retrieved or that contained the answer
 - gold_document_ids: the documents to be retrieved
 - offsets_in_document (answers only): the position or offsets within the document the answer was found
-- gold_offsets_in_documents (answers only): the positon or offsets of the gold answer within the document
+- gold_offsets_in_documents (answers only): the position or offsets of the gold answer within the document
 - type: 'answer' or 'document'
 - node: the node name
 - eval_mode: evaluation mode depicting whether the evaluation was executed in integrated or isolated mode.
@@ -139,7 +139,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 
 ### Retriever
 
@@ -76,7 +76,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
 
-### Initalize Retriever and Reader/Generator
+### Initialize Retriever and Reader/Generator
 
 #### Retriever
 
@@ -137,7 +137,7 @@ print(tables[0].content)
 print(tables[0].meta)
 ```
 
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 
 ### Retriever
 
@@ -98,7 +98,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
 
-## Initalize Retriever, Reader & Pipeline
+## Initialize Retriever, Reader & Pipeline
 
 ### Retriever
 
@@ -177,7 +177,7 @@ Here we evaluate retriever and reader in open domain fashion on the full corpus
 correctly retrieved if it contains the gold answer string within it. The reader is evaluated based purely on the
 predicted answer string, regardless of which document this came from and the position of the extracted span.
 
-The generation of predictions is seperated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
+The generation of predictions is separated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
 
 
 
@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 
 Original Abstract:
 
@@ -147,7 +147,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
 
-### Initalize Retriever, Reader & Pipeline
+### Initialize Retriever, Reader & Pipeline
 
 #### Retriever
 
@@ -63,7 +63,7 @@ fetch_archive_from_http(url=s3_url, output_dir=doc_dir)
 Haystack's converter classes are designed to help you turn files on your computer into the documents
 that can be processed by the Haystack pipeline.
 There are file converters for txt, pdf, docx files as well as a converter that is powered by Apache Tika.
-The parameter `valid_langugages` does not convert files to the target language, but checks if the conversion worked as expected.
+The parameter `valid_languages` does not convert files to the target language, but checks if the conversion worked as expected.
 For converting PDFs, try changing the encoding to UTF-8 if the conversion isn't great.
 
 
@@ -373,7 +373,7 @@ E.g. you can call execute_eval_run() multiple times with different retrievers in
 
 - `index_pipeline`: The indexing pipeline to use.
 - `query_pipeline`: The query pipeline to evaluate.
-- `evaluation_set_labels`: The labels to evaluate on forming an evalution set.
+- `evaluation_set_labels`: The labels to evaluate on forming an evaluation set.
 - `corpus_file_paths`: The files to be indexed and searched during evaluation forming a corpus.
 - `experiment_name`: The name of the experiment
 - `experiment_run_name`: The name of the experiment run
@@ -301,7 +301,7 @@ The DataFrames have the following schema:
 - context (answers only): the surrounding context of the answer within the document
 - exact_match (answers only): metric depicting if the answer exactly matches the gold label
 - f1 (answers only): metric depicting how well the answer overlaps with the gold label on token basis
-- sas (answers only, optional): metric depciting how well the answer matches the gold label on a semantic basis
+- sas (answers only, optional): metric depicting how well the answer matches the gold label on a semantic basis
 - gold_document_contents (documents only): the contents of the gold documents
 - content (documents only): the content of the document
 - gold_id_match (documents only): metric depicting whether one of the gold document ids matches the document
@@ -311,7 +311,7 @@ The DataFrames have the following schema:
 - document_id: the id of the document that has been retrieved or that contained the answer
 - gold_document_ids: the documents to be retrieved
 - offsets_in_document (answers only): the position or offsets within the document the answer was found
-- gold_offsets_in_documents (answers only): the positon or offsets of the gold answer within the document
+- gold_offsets_in_documents (answers only): the position or offsets of the gold answer within the document
 - type: 'answer' or 'document'
 - node: the node name
 - eval_mode: evaluation mode depicting whether the evaluation was executed in integrated or isolated mode.
@@ -139,7 +139,7 @@ print(docs[:3])
 document_store.write_documents(docs)
 ```
 
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 
 ### Retriever
 
@@ -76,7 +76,7 @@ docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split
 document_store.write_documents(docs)
 ```
 
-### Initalize Retriever and Reader/Generator
+### Initialize Retriever and Reader/Generator
 
 #### Retriever
 
@@ -130,7 +130,7 @@ print(tables[0].content)
 print(tables[0].meta)
 ```
 
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 
 ### Retriever
 
@@ -98,7 +98,7 @@ print(docs[:3])
 document_store.write_documents(docs)
 ```
 
-## Initalize Retriever, Reader & Pipeline
+## Initialize Retriever, Reader & Pipeline
 
 ### Retriever
 
@@ -180,7 +180,7 @@ Here we evaluate retriever and reader in open domain fashion on the full corpus
 correctly retrieved if it contains the gold answer string within it. The reader is evaluated based purely on the
 predicted answer string, regardless of which document this came from and the position of the extracted span.
 
-The generation of predictions is seperated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
+The generation of predictions is separated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
 
 
 
@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 
 Original Abstract:
 
@@ -147,7 +147,7 @@ docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split
 document_store.write_documents(docs)
 ```
 
-### Initalize Retriever, Reader & Pipeline
+### Initialize Retriever, Reader & Pipeline
 
 #### Retriever
 
@@ -66,7 +66,7 @@ fetch_archive_from_http(url=s3_url, output_dir=doc_dir)
 Haystack's converter classes are designed to help you turn files on your computer into the documents
 that can be processed by the Haystack pipeline.
 There are file converters for txt, pdf, docx files as well as a converter that is powered by Apache Tika.
-The parameter `valid_langugages` does not convert files to the target language, but checks if the conversion worked as expected.
+The parameter `valid_languages` does not convert files to the target language, but checks if the conversion worked as expected.
 
 
 ```python
@@ -850,7 +850,7 @@ class Pipeline:
 
 :param index_pipeline: The indexing pipeline to use.
 :param query_pipeline: The query pipeline to evaluate.
-:param evaluation_set_labels: The labels to evaluate on forming an evalution set.
+:param evaluation_set_labels: The labels to evaluate on forming an evaluation set.
 :param corpus_file_paths: The files to be indexed and searched during evaluation forming a corpus.
 :param experiment_name: The name of the experiment
 :param experiment_run_name: The name of the experiment run
@@ -1858,7 +1858,7 @@ def test_DeepsetCloudDocumentStore_fetches_labels_for_evaluation_set(deepset_clo
 
 
 @responses.activate
-def test_DeepsetCloudDocumentStore_fetches_lables_for_evaluation_set_raises_deepsetclouderror_when_nothing_found(
+def test_DeepsetCloudDocumentStore_fetches_labels_for_evaluation_set_raises_deepsetclouderror_when_nothing_found(
     deepset_cloud_document_store,
 ):
     if MOCK_DC:
@@ -142,7 +142,7 @@
 "id": "wgjedxx_A6N6"
 },
 "source": [
-"### Initalize Retriever and Reader/Generator\n",
+"### Initialize Retriever and Reader/Generator\n",
 "\n",
 "#### Retriever\n",
 "\n",
@@ -36,7 +36,7 @@ def tutorial12_lfqa():
     document_store.write_documents(docs)
 
     """
-    Initalize Retriever and Reader/Generator:
+    Initialize Retriever and Reader/Generator:
     We use a `DensePassageRetriever` and we invoke `update_embeddings` to index the embeddings of documents in the `FAISSDocumentStore`
     """
 
@@ -231,7 +231,7 @@
 "id": "hmQC1sDmw3d7"
 },
 "source": [
-"## Initalize Retriever, Reader, & Pipeline\n",
+"## Initialize Retriever, Reader & Pipeline\n",
 "\n",
 "### Retriever\n",
 "\n",
@@ -202,7 +202,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Initalize Retriever, Reader, & Pipeline\n",
+"## Initialize Retriever, Reader & Pipeline\n",
 "\n",
 "### Retriever\n",
 "\n",
@@ -65,7 +65,7 @@ def tutorial1_basic_qa_pipeline():
     # Now, let's write the docs to our DB.
     document_store.write_documents(docs)
 
-    # ## Initalize Retriever & Reader
+    # ## Initialize Retriever & Reader
     #
     # ### Retriever
    #
@@ -166,7 +166,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Initalize Retriever, Reader & Pipeline\n",
+"## Initialize Retriever, Reader & Pipeline\n",
 "\n",
 "### Retriever\n",
 "\n",
@@ -42,7 +42,7 @@ def tutorial3_basic_qa_pipeline_without_elasticsearch():
     # Now, let's write the docs to our DB.
     document_store.write_documents(docs)
 
-    # ## Initalize Retriever, Reader & Pipeline
+    # ## Initialize Retriever, Reader & Pipeline
     #
     # ### Retriever
     #
@@ -389,7 +389,7 @@
 "correctly retrieved if it contains the gold answer string within it. The reader is evaluated based purely on the\n",
 "predicted answer string, regardless of which document this came from and the position of the extracted span.\n",
 "\n",
-"The generation of predictions is seperated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.\n"
+"The generation of predictions is separated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.\n"
 ]
 },
 {
@@ -95,7 +95,7 @@ def tutorial5_evaluation():
     # i.e. a document is considered
     # correctly retrieved if it contains the gold answer string within it. The reader is evaluated based purely on the
     # predicted answer string, regardless of which document this came from and the position of the extracted span.
-    # The generation of predictions is seperated from the calculation of metrics.
+    # The generation of predictions is separated from the calculation of metrics.
     # This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
 
     pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)
@@ -45,7 +45,7 @@
 "### \"Dense Passage Retrieval\"\n",
 "\n",
 "In this Tutorial, we want to highlight one \"Dense Dual-Encoder\" called Dense Passage Retriever. \n",
-"It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906. \n",
+"It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906. \n",
 "\n",
 "Original Abstract: \n",
 "\n",
@@ -262,7 +262,7 @@
 "id": "wgjedxx_A6N6"
 },
 "source": [
-"### Initalize Retriever, Reader & Pipeline\n",
+"### Initialize Retriever, Reader & Pipeline\n",
 "\n",
 "#### Retriever\n",
 "\n",
@@ -150,7 +150,7 @@
 "Haystack's converter classes are designed to help you turn files on your computer into the documents\n",
 "that can be processed by the Haystack pipeline.\n",
 "There are file converters for txt, pdf, docx files as well as a converter that is powered by Apache Tika.\n",
-"The parameter `valid_langugages` does not convert files to the target language, but checks if the conversion worked as expected."
+"The parameter `valid_languages` does not convert files to the target language, but checks if the conversion worked as expected."
 ]
 },
 {
@@ -37,7 +37,7 @@ def tutorial8_preprocessing():
     Haystack's converter classes are designed to help you turn files on your computer into the documents
     that can be processed by the Haystack pipeline.
     There are file converters for txt, pdf, docx files as well as a converter that is powered by Apache Tika.
-    The parameter `valid_langugages` does not convert files to the target language, but checks if the conversion worked as expected.
+    The parameter `valid_languages` does not convert files to the target language, but checks if the conversion worked as expected.
     """
 
     # Here are some examples of how you would use file converters
@@ -34,9 +34,9 @@ The evaluation mode leverages the feedback REST API endpoint of haystack. The us
 
 In order to use the UI in evaluation mode, you need an ElasticSearch instance with pre-indexed files and the Haystack REST API. You can set the environment up via docker images. For ElasticSearch, you can check out our [documentation](https://haystack.deepset.ai/usage/document-store#initialisation) and for setting up the REST API this [link](https://github.com/deepset-ai/haystack/blob/master/README.md#7-rest-api).
 
-To enter the evaluation mode, select the checkbox "Evaluation mode" in the sidebar. The UI will load the predefined questions from the file [`eval_lables_examles`](https://raw.githubusercontent.com/deepset-ai/haystack/master/ui/eval_labels_example.csv). The file needs to be prefilled with your data. This way, the user will get a random question from the set and can give his feedback with the buttons below the questions. To load a new question, click the button "Get random question".
+To enter the evaluation mode, select the checkbox "Evaluation mode" in the sidebar. The UI will load the predefined questions from the file [`eval_labels_examples`](https://raw.githubusercontent.com/deepset-ai/haystack/master/ui/eval_labels_example.csv). The file needs to be prefilled with your data. This way, the user will get a random question from the set and can give his feedback with the buttons below the questions. To load a new question, click the button "Get random question".
 
-The file just needs to have two columns separated by semicolon. You can add more columns but the UI will ignore them. Every line represents a questions answer pair. The columns with the questions needs to be named “Question Text” and the answer column “Answer” so that they can be loaded correctly. Currently, the easiest way to create the file is manully by adding question answer pairs.
+The file just needs to have two columns separated by semicolon. You can add more columns but the UI will ignore them. Every line represents a questions answer pair. The columns with the questions needs to be named “Question Text” and the answer column “Answer” so that they can be loaded correctly. Currently, the easiest way to create the file is manually by adding question answer pairs.
 
 The feedback can be exported with the API endpoint `export-doc-qa-feedback`. To learn more about finetuning a model with user feedback, please check out our [docs](https://haystack.deepset.ai/usage/domain-adaptation#user-feedback).
 
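For illustration, loading such a file with pandas might look like this; the file name comes from the docs above, but the parsing code itself is an assumption, not part of the UI:

```python
import pandas as pd

# Two semicolon-separated columns, named exactly "Question Text" and "Answer".
df = pd.read_csv("eval_labels_example.csv", sep=";")
print(df[["Question Text", "Answer"]].head())
```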