Improve Docs Readability (#2617)

Signed-off-by: Ryan Russell <git@ryanrussell.org>
Author: Ryan Russell 2022-06-03 02:57:40 -05:00, committed by GitHub
Parent: 3c6fcc3e42
Commit: c1b7948e10
GPG Key ID: 4AEE18F83AFDEB23 (no known key found for this signature in database)
87 changed files with 107 additions and 107 deletions

@@ -425,7 +425,7 @@ E.g. you can call execute_eval_run() multiple times with different retrievers in
 - `index_pipeline`: The indexing pipeline to use.
 - `query_pipeline`: The query pipeline to evaluate.
-- `evaluation_set_labels`: The labels to evaluate on forming an evalution set.
+- `evaluation_set_labels`: The labels to evaluate on forming an evaluation set.
 - `corpus_file_paths`: The files to be indexed and searched during evaluation forming a corpus.
 - `experiment_name`: The name of the experiment
 - `experiment_run_name`: The name of the experiment run

@@ -139,7 +139,7 @@ print(docs[:3])
 document_store.write_documents(docs)
 ```
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 ### Retriever

@@ -76,7 +76,7 @@ docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split
 document_store.write_documents(docs)
 ```
-### Initalize Retriever and Reader/Generator
+### Initialize Retriever and Reader/Generator
 #### Retriever

@@ -130,7 +130,7 @@ print(tables[0].content)
 print(tables[0].meta)
 ```
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 ### Retriever

@@ -98,7 +98,7 @@ print(docs[:3])
 document_store.write_documents(docs)
 ```
-## Initalize Retriever, Reader & Pipeline
+## Initialize Retriever, Reader & Pipeline
 ### Retriever

@@ -180,7 +180,7 @@ Here we evaluate retriever and reader in open domain fashion on the full corpus
 correctly retrieved if it contains the gold answer string within it. The reader is evaluated based purely on the
 predicted answer string, regardless of which document this came from and the position of the extracted span.
-The generation of predictions is seperated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
+The generation of predictions is separated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
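The open-domain criterion described in this hunk reduces to plain string containment. A minimal sketch of the idea, purely illustrative (the function and variable names here are hypothetical, not Haystack's API):

```python
def correctly_retrieved(doc_text: str, gold_answer: str) -> bool:
    # Open-domain criterion: a document counts as correctly retrieved
    # if the gold answer string occurs anywhere inside it,
    # regardless of position or which document it is.
    return gold_answer.lower() in doc_text.lower()

docs = ["Paris is the capital of France.", "Berlin is in Germany."]
# The retriever scores a hit if any returned document contains the answer.
retriever_hit = any(correctly_retrieved(d, "Paris") for d in docs)
```

Separating prediction from metric computation, as the changed line explains, means `docs` (and the reader's predicted answers) would be produced once and criteria like this recomputed cheaply afterwards.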

@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 Original Abstract:
@@ -147,7 +147,7 @@ docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split
 document_store.write_documents(docs)
 ```
-### Initalize Retriever, Reader & Pipeline
+### Initialize Retriever, Reader & Pipeline
 #### Retriever

@@ -66,7 +66,7 @@ fetch_archive_from_http(url=s3_url, output_dir=doc_dir)
 Haystack's converter classes are designed to help you turn files on your computer into the documents
 that can be processed by the Haystack pipeline.
 There are file converters for txt, pdf, docx files as well as a converter that is powered by Apache Tika.
-The parameter `valid_langugages` does not convert files to the target language, but checks if the conversion worked as expected.
+The parameter `valid_languages` does not convert files to the target language, but checks if the conversion worked as expected.
 ```python

@@ -142,7 +142,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 ### Retriever

@@ -73,7 +73,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
-### Initalize Retriever and Reader/Generator
+### Initialize Retriever and Reader/Generator
 #### Retriever

@@ -101,7 +101,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader & Pipeline
+## Initialize Retriever, Reader & Pipeline
 ### Retriever

@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 Original Abstract:
@@ -145,7 +145,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
-### Initalize Retriever, Reader & Pipeline
+### Initialize Retriever, Reader & Pipeline
 #### Retriever

@@ -127,7 +127,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 ### Retriever

@@ -87,7 +87,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 ### Retriever

@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 Original Abstract:
@@ -124,7 +124,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
-### Initalize Retriever, Reader, & Finder
+### Initialize Retriever, Reader & Finder
 #### Retriever

@@ -127,7 +127,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 ### Retriever

@@ -87,7 +87,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 ### Retriever

@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 Original Abstract:
@@ -124,7 +124,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
-### Initalize Retriever, Reader, & Finder
+### Initialize Retriever, Reader & Finder
 #### Retriever

@@ -129,7 +129,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 ### Retriever

@@ -89,7 +89,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 ### Retriever

@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 Original Abstract:
@@ -129,7 +129,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
-### Initalize Retriever, Reader, & Finder
+### Initialize Retriever, Reader & Finder
 #### Retriever

@@ -142,7 +142,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 ### Retriever

@@ -103,7 +103,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 ### Retriever

@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 Original Abstract:
@@ -129,7 +129,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
-### Initalize Retriever, Reader, & Finder
+### Initialize Retriever, Reader & Finder
 #### Retriever

@@ -140,7 +140,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 ### Retriever

@@ -101,7 +101,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 ### Retriever

@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 Original Abstract:
@@ -127,7 +127,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
-### Initalize Retriever, Reader, & Finder
+### Initialize Retriever, Reader & Finder
 #### Retriever

@@ -142,7 +142,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 ### Retriever

@@ -73,7 +73,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
-### Initalize Retriever and Reader/Generator
+### Initialize Retriever and Reader/Generator
 #### Retriever

@@ -102,7 +102,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader, & Finder
+## Initialize Retriever, Reader & Finder
 ### Retriever

@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 Original Abstract:
@@ -146,7 +146,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
-### Initalize Retriever, Reader, & Finder
+### Initialize Retriever, Reader & Finder
 #### Retriever

@@ -280,10 +280,10 @@ The DataFrames have the following schema:
 - context (answers only): the surrounding context of the answer within the document
 - offsets_in_document (answers only): the position or offsets within the document the answer was found
 - gold_answers (answers only): the answers to be given
-- gold_offsets_in_documents (answers only): the positon or offsets of the gold answer within the document
+- gold_offsets_in_documents (answers only): the position or offsets of the gold answer within the document
 - exact_match (answers only): metric depicting if the answer exactly matches the gold label
 - f1 (answers only): metric depicting how well the answer overlaps with the gold label on token basis
-- sas (answers only, optional): metric depciting how well the answer matches the gold label on a semantic basis
+- sas (answers only, optional): metric depicting how well the answer matches the gold label on a semantic basis
 **Arguments**:

@@ -141,7 +141,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 ### Retriever

@@ -75,7 +75,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
-### Initalize Retriever and Reader/Generator
+### Initialize Retriever and Reader/Generator
 #### Retriever

@@ -136,7 +136,7 @@ print(tables[0].content)
 print(tables[0].meta)
 ```
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 ### Retriever

@@ -100,7 +100,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader & Pipeline
+## Initialize Retriever, Reader & Pipeline
 ### Retriever

@@ -176,7 +176,7 @@ Here we evaluate retriever and reader in open domain fashion on the full corpus
 correctly retrieved if it contains the gold answer string within it. The reader is evaluated based purely on the
 predicted answer string, regardless of which document this came from and the position of the extracted span.
-The generation of predictions is seperated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
+The generation of predictions is separated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.

@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 Original Abstract:
@@ -145,7 +145,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
-### Initalize Retriever, Reader & Pipeline
+### Initialize Retriever, Reader & Pipeline
 #### Retriever

@@ -65,7 +65,7 @@ fetch_archive_from_http(url=s3_url, output_dir=doc_dir)
 Haystack's converter classes are designed to help you turn files on your computer into the documents
 that can be processed by the Haystack pipeline.
 There are file converters for txt, pdf, docx files as well as a converter that is powered by Apache Tika.
-The parameter `valid_langugages` does not convert files to the target language, but checks if the conversion worked as expected.
+The parameter `valid_languages` does not convert files to the target language, but checks if the conversion worked as expected.
 For converting PDFs, try changing the encoding to UTF-8 if the conversion isn't great.

@@ -272,7 +272,7 @@ The DataFrames have the following schema:
 - context (answers only): the surrounding context of the answer within the document
 - exact_match (answers only): metric depicting if the answer exactly matches the gold label
 - f1 (answers only): metric depicting how well the answer overlaps with the gold label on token basis
-- sas (answers only, optional): metric depciting how well the answer matches the gold label on a semantic basis
+- sas (answers only, optional): metric depicting how well the answer matches the gold label on a semantic basis
 - gold_document_contents (documents only): the contents of the gold documents
 - content (documents only): the content of the document
 - gold_id_match (documents only): metric depicting whether one of the gold document ids matches the document
@@ -282,7 +282,7 @@ The DataFrames have the following schema:
 - document_id: the id of the document that has been retrieved or that contained the answer
 - gold_document_ids: the documents to be retrieved
 - offsets_in_document (answers only): the position or offsets within the document the answer was found
-- gold_offsets_in_documents (answers only): the positon or offsets of the gold answer within the document
+- gold_offsets_in_documents (answers only): the position or offsets of the gold answer within the document
 - type: 'answer' or 'document'
 - node: the node name
 - eval_mode: evaluation mode depicting whether the evaluation was executed in integrated or isolated mode.
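The `exact_match` and `f1` fields in this schema are standard answer-level metrics. A simplified sketch of how they can be computed (illustrative only; real implementations additionally normalize articles, punctuation, and whitespace):

```python
def exact_match(pred: str, gold: str) -> bool:
    # exact_match: the predicted answer string equals the gold label
    # (here after simple lower-casing and stripping).
    return pred.strip().lower() == gold.strip().lower()

def token_f1(pred: str, gold: str) -> float:
    # f1: harmonic mean of precision and recall over shared tokens
    # between the predicted and gold answer strings.
    pred_tokens = pred.lower().split()
    gold_tokens = gold.lower().split()
    common = sum(min(pred_tokens.count(t), gold_tokens.count(t))
                 for t in set(pred_tokens) & set(gold_tokens))
    if common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, "the city of paris" against the gold label "paris" is not an exact match but still earns partial token-level credit (f1 = 0.4), which is why both columns appear side by side in the DataFrame.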

@@ -141,7 +141,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 ### Retriever

@@ -75,7 +75,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
-### Initalize Retriever and Reader/Generator
+### Initialize Retriever and Reader/Generator
 #### Retriever

@@ -137,7 +137,7 @@ print(tables[0].content)
 print(tables[0].meta)
 ```
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 ### Retriever

@@ -100,7 +100,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader & Pipeline
+## Initialize Retriever, Reader & Pipeline
 ### Retriever

@@ -176,7 +176,7 @@ Here we evaluate retriever and reader in open domain fashion on the full corpus
 correctly retrieved if it contains the gold answer string within it. The reader is evaluated based purely on the
 predicted answer string, regardless of which document this came from and the position of the extracted span.
-The generation of predictions is seperated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
+The generation of predictions is separated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.

@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 Original Abstract:
@@ -145,7 +145,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
-### Initalize Retriever, Reader & Pipeline
+### Initialize Retriever, Reader & Pipeline
 #### Retriever

@@ -65,7 +65,7 @@ fetch_archive_from_http(url=s3_url, output_dir=doc_dir)
 Haystack's converter classes are designed to help you turn files on your computer into the documents
 that can be processed by the Haystack pipeline.
 There are file converters for txt, pdf, docx files as well as a converter that is powered by Apache Tika.
-The parameter `valid_langugages` does not convert files to the target language, but checks if the conversion worked as expected.
+The parameter `valid_languages` does not convert files to the target language, but checks if the conversion worked as expected.
 For converting PDFs, try changing the encoding to UTF-8 if the conversion isn't great.

@@ -301,7 +301,7 @@ The DataFrames have the following schema:
 - context (answers only): the surrounding context of the answer within the document
 - exact_match (answers only): metric depicting if the answer exactly matches the gold label
 - f1 (answers only): metric depicting how well the answer overlaps with the gold label on token basis
-- sas (answers only, optional): metric depciting how well the answer matches the gold label on a semantic basis
+- sas (answers only, optional): metric depicting how well the answer matches the gold label on a semantic basis
 - gold_document_contents (documents only): the contents of the gold documents
 - content (documents only): the content of the document
 - gold_id_match (documents only): metric depicting whether one of the gold document ids matches the document
@@ -311,7 +311,7 @@ The DataFrames have the following schema:
 - document_id: the id of the document that has been retrieved or that contained the answer
 - gold_document_ids: the documents to be retrieved
 - offsets_in_document (answers only): the position or offsets within the document the answer was found
-- gold_offsets_in_documents (answers only): the positon or offsets of the gold answer within the document
+- gold_offsets_in_documents (answers only): the position or offsets of the gold answer within the document
 - type: 'answer' or 'document'
 - node: the node name
 - eval_mode: evaluation mode depicting whether the evaluation was executed in integrated or isolated mode.

@@ -139,7 +139,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 ### Retriever

@@ -76,7 +76,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
-### Initalize Retriever and Reader/Generator
+### Initialize Retriever and Reader/Generator
 #### Retriever

@@ -136,7 +136,7 @@ print(tables[0].content)
 print(tables[0].meta)
 ```
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 ### Retriever

@@ -98,7 +98,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader & Pipeline
+## Initialize Retriever, Reader & Pipeline
 ### Retriever

@@ -180,7 +180,7 @@ Here we evaluate retriever and reader in open domain fashion on the full corpus
 correctly retrieved if it contains the gold answer string within it. The reader is evaluated based purely on the
 predicted answer string, regardless of which document this came from and the position of the extracted span.
-The generation of predictions is seperated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
+The generation of predictions is separated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.

@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 Original Abstract:
@@ -143,7 +143,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
-### Initalize Retriever, Reader & Pipeline
+### Initialize Retriever, Reader & Pipeline
 #### Retriever

@@ -61,7 +61,7 @@ fetch_archive_from_http(url=s3_url, output_dir=doc_dir)
 Haystack's converter classes are designed to help you turn files on your computer into the documents
 that can be processed by the Haystack pipeline.
 There are file converters for txt, pdf, docx files as well as a converter that is powered by Apache Tika.
-The parameter `valid_langugages` does not convert files to the target language, but checks if the conversion worked as expected.
+The parameter `valid_languages` does not convert files to the target language, but checks if the conversion worked as expected.
 For converting PDFs, try changing the encoding to UTF-8 if the conversion isn't great.

@@ -301,7 +301,7 @@ The DataFrames have the following schema:
 - context (answers only): the surrounding context of the answer within the document
 - exact_match (answers only): metric depicting if the answer exactly matches the gold label
 - f1 (answers only): metric depicting how well the answer overlaps with the gold label on token basis
-- sas (answers only, optional): metric depciting how well the answer matches the gold label on a semantic basis
+- sas (answers only, optional): metric depicting how well the answer matches the gold label on a semantic basis
 - gold_document_contents (documents only): the contents of the gold documents
 - content (documents only): the content of the document
 - gold_id_match (documents only): metric depicting whether one of the gold document ids matches the document
@@ -311,7 +311,7 @@ The DataFrames have the following schema:
 - document_id: the id of the document that has been retrieved or that contained the answer
 - gold_document_ids: the documents to be retrieved
 - offsets_in_document (answers only): the position or offsets within the document the answer was found
-- gold_offsets_in_documents (answers only): the positon or offsets of the gold answer within the document
+- gold_offsets_in_documents (answers only): the position or offsets of the gold answer within the document
 - type: 'answer' or 'document'
 - node: the node name
 - eval_mode: evaluation mode depicting whether the evaluation was executed in integrated or isolated mode.

@@ -139,7 +139,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 ### Retriever

@@ -76,7 +76,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
-### Initalize Retriever and Reader/Generator
+### Initialize Retriever and Reader/Generator
 #### Retriever

@@ -137,7 +137,7 @@ print(tables[0].content)
 print(tables[0].meta)
 ```
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 ### Retriever

@@ -98,7 +98,7 @@ print(dicts[:3])
 document_store.write_documents(dicts)
 ```
-## Initalize Retriever, Reader & Pipeline
+## Initialize Retriever, Reader & Pipeline
 ### Retriever

@@ -177,7 +177,7 @@ Here we evaluate retriever and reader in open domain fashion on the full corpus
 correctly retrieved if it contains the gold answer string within it. The reader is evaluated based purely on the
 predicted answer string, regardless of which document this came from and the position of the extracted span.
-The generation of predictions is seperated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
+The generation of predictions is separated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.

@@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
 ### "Dense Passage Retrieval"
 In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
-It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
+It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
 Original Abstract:
@@ -147,7 +147,7 @@ dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, spl
 document_store.write_documents(dicts)
 ```
-### Initalize Retriever, Reader & Pipeline
+### Initialize Retriever, Reader & Pipeline
 #### Retriever

@@ -63,7 +63,7 @@ fetch_archive_from_http(url=s3_url, output_dir=doc_dir)
 Haystack's converter classes are designed to help you turn files on your computer into the documents
 that can be processed by the Haystack pipeline.
 There are file converters for txt, pdf, docx files as well as a converter that is powered by Apache Tika.
-The parameter `valid_langugages` does not convert files to the target language, but checks if the conversion worked as expected.
+The parameter `valid_languages` does not convert files to the target language, but checks if the conversion worked as expected.
 For converting PDFs, try changing the encoding to UTF-8 if the conversion isn't great.

@@ -373,7 +373,7 @@ E.g. you can call execute_eval_run() multiple times with different retrievers in
 - `index_pipeline`: The indexing pipeline to use.
 - `query_pipeline`: The query pipeline to evaluate.
-- `evaluation_set_labels`: The labels to evaluate on forming an evalution set.
+- `evaluation_set_labels`: The labels to evaluate on forming an evaluation set.
 - `corpus_file_paths`: The files to be indexed and searched during evaluation forming a corpus.
 - `experiment_name`: The name of the experiment
 - `experiment_run_name`: The name of the experiment run

@@ -301,7 +301,7 @@ The DataFrames have the following schema:
 - context (answers only): the surrounding context of the answer within the document
 - exact_match (answers only): metric depicting if the answer exactly matches the gold label
 - f1 (answers only): metric depicting how well the answer overlaps with the gold label on token basis
-- sas (answers only, optional): metric depciting how well the answer matches the gold label on a semantic basis
+- sas (answers only, optional): metric depicting how well the answer matches the gold label on a semantic basis
 - gold_document_contents (documents only): the contents of the gold documents
 - content (documents only): the content of the document
 - gold_id_match (documents only): metric depicting whether one of the gold document ids matches the document
@@ -311,7 +311,7 @@ The DataFrames have the following schema:
 - document_id: the id of the document that has been retrieved or that contained the answer
 - gold_document_ids: the documents to be retrieved
 - offsets_in_document (answers only): the position or offsets within the document the answer was found
-- gold_offsets_in_documents (answers only): the positon or offsets of the gold answer within the document
+- gold_offsets_in_documents (answers only): the position or offsets of the gold answer within the document
 - type: 'answer' or 'document'
 - node: the node name
 - eval_mode: evaluation mode depicting whether the evaluation was executed in integrated or isolated mode.

@@ -139,7 +139,7 @@ print(docs[:3])
 document_store.write_documents(docs)
 ```
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 ### Retriever

@@ -76,7 +76,7 @@ docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split
 document_store.write_documents(docs)
 ```
-### Initalize Retriever and Reader/Generator
+### Initialize Retriever and Reader/Generator
 #### Retriever

@@ -130,7 +130,7 @@ print(tables[0].content)
 print(tables[0].meta)
 ```
-## Initalize Retriever, Reader, & Pipeline
+## Initialize Retriever, Reader & Pipeline
 ### Retriever

@@ -98,7 +98,7 @@ print(docs[:3])
 document_store.write_documents(docs)
 ```
-## Initalize Retriever, Reader & Pipeline
+## Initialize Retriever, Reader & Pipeline
 ### Retriever

View File

@ -180,7 +180,7 @@ Here we evaluate retriever and reader in open domain fashion on the full corpus
correctly retrieved if it contains the gold answer string within it. The reader is evaluated based purely on the
predicted answer string, regardless of which document this came from and the position of the extracted span.
The generation of predictions is seperated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
The generation of predictions is separated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
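The open-domain retrieval criterion described in this hunk can be sketched in a few lines. This is illustrative only, not Haystack's evaluation implementation; the function name and example strings are invented:

```python
# Open-domain criterion sketched from the description above: a document
# counts as correctly retrieved if it contains any gold answer string.
def is_correctly_retrieved(document_text: str, gold_answers: list) -> bool:
    return any(answer in document_text for answer in gold_answers)
```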

View File

@ -45,7 +45,7 @@ Recent work suggests that dual encoders work better, likely because they can dea
### "Dense Passage Retrieval"
In this Tutorial, we want to highlight one "Dense Dual-Encoder" called Dense Passage Retriever.
It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906.
Original Abstract:
@@ -147,7 +147,7 @@ docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split
document_store.write_documents(docs)
```
### Initalize Retriever, Reader & Pipeline
### Initialize Retriever, Reader & Pipeline
#### Retriever

View File

@@ -66,7 +66,7 @@ fetch_archive_from_http(url=s3_url, output_dir=doc_dir)
Haystack's converter classes are designed to help you turn files on your computer into the documents
that can be processed by the Haystack pipeline.
There are file converters for txt, pdf, docx files as well as a converter that is powered by Apache Tika.
The parameter `valid_langugages` does not convert files to the target language, but checks if the conversion worked as expected.
The parameter `valid_languages` does not convert files to the target language, but checks if the conversion worked as expected.
```python

View File

@@ -850,7 +850,7 @@ class Pipeline:
:param index_pipeline: The indexing pipeline to use.
:param query_pipeline: The query pipeline to evaluate.
:param evaluation_set_labels: The labels to evaluate on forming an evalution set.
:param evaluation_set_labels: The labels to evaluate on forming an evaluation set.
:param corpus_file_paths: The files to be indexed and searched during evaluation forming a corpus.
:param experiment_name: The name of the experiment
:param experiment_run_name: The name of the experiment run

View File

@@ -1858,7 +1858,7 @@ def test_DeepsetCloudDocumentStore_fetches_labels_for_evaluation_set(deepset_clo
@responses.activate
def test_DeepsetCloudDocumentStore_fetches_lables_for_evaluation_set_raises_deepsetclouderror_when_nothing_found(
def test_DeepsetCloudDocumentStore_fetches_labels_for_evaluation_set_raises_deepsetclouderror_when_nothing_found(
deepset_cloud_document_store,
):
if MOCK_DC:

View File

@@ -142,7 +142,7 @@
"id": "wgjedxx_A6N6"
},
"source": [
"### Initalize Retriever and Reader/Generator\n",
"### Initialize Retriever and Reader/Generator\n",
"\n",
"#### Retriever\n",
"\n",

View File

@@ -36,7 +36,7 @@ def tutorial12_lfqa():
document_store.write_documents(docs)
"""
Initalize Retriever and Reader/Generator:
Initialize Retriever and Reader/Generator:
We use a `DensePassageRetriever` and we invoke `update_embeddings` to index the embeddings of documents in the `FAISSDocumentStore`
"""

View File

@@ -231,7 +231,7 @@
"id": "hmQC1sDmw3d7"
},
"source": [
"## Initalize Retriever, Reader, & Pipeline\n",
"## Initialize Retriever, Reader & Pipeline\n",
"\n",
"### Retriever\n",
"\n",

View File

@@ -202,7 +202,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initalize Retriever, Reader, & Pipeline\n",
"## Initialize Retriever, Reader & Pipeline\n",
"\n",
"### Retriever\n",
"\n",

View File

@@ -65,7 +65,7 @@ def tutorial1_basic_qa_pipeline():
# Now, let's write the docs to our DB.
document_store.write_documents(docs)
# ## Initalize Retriever & Reader
# ## Initialize Retriever & Reader
#
# ### Retriever
#

View File

@@ -166,7 +166,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initalize Retriever, Reader & Pipeline\n",
"## Initialize Retriever, Reader & Pipeline\n",
"\n",
"### Retriever\n",
"\n",

View File

@@ -42,7 +42,7 @@ def tutorial3_basic_qa_pipeline_without_elasticsearch():
# Now, let's write the docs to our DB.
document_store.write_documents(docs)
# ## Initalize Retriever, Reader & Pipeline
# ## Initialize Retriever, Reader & Pipeline
#
# ### Retriever
#

View File

@@ -389,7 +389,7 @@
"correctly retrieved if it contains the gold answer string within it. The reader is evaluated based purely on the\n",
"predicted answer string, regardless of which document this came from and the position of the extracted span.\n",
"\n",
"The generation of predictions is seperated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.\n"
"The generation of predictions is separated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.\n"
]
},
{

View File

@@ -95,7 +95,7 @@ def tutorial5_evaluation():
# i.e. a document is considered
# correctly retrieved if it contains the gold answer string within it. The reader is evaluated based purely on the
# predicted answer string, regardless of which document this came from and the position of the extracted span.
# The generation of predictions is seperated from the calculation of metrics.
# The generation of predictions is separated from the calculation of metrics.
# This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)

View File

@@ -45,7 +45,7 @@
"### \"Dense Passage Retrieval\"\n",
"\n",
"In this Tutorial, we want to highlight one \"Dense Dual-Encoder\" called Dense Passage Retriever. \n",
"It was introdoced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906. \n",
"It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906. \n",
"\n",
"Original Abstract: \n",
"\n",
@@ -262,7 +262,7 @@
"id": "wgjedxx_A6N6"
},
"source": [
"### Initalize Retriever, Reader & Pipeline\n",
"### Initialize Retriever, Reader & Pipeline\n",
"\n",
"#### Retriever\n",
"\n",

View File

@@ -150,7 +150,7 @@
"Haystack's converter classes are designed to help you turn files on your computer into the documents\n",
"that can be processed by the Haystack pipeline.\n",
"There are file converters for txt, pdf, docx files as well as a converter that is powered by Apache Tika.\n",
"The parameter `valid_langugages` does not convert files to the target language, but checks if the conversion worked as expected."
"The parameter `valid_languages` does not convert files to the target language, but checks if the conversion worked as expected."
]
},
{

View File

@@ -37,7 +37,7 @@ def tutorial8_preprocessing():
Haystack's converter classes are designed to help you turn files on your computer into the documents
that can be processed by the Haystack pipeline.
There are file converters for txt, pdf, docx files as well as a converter that is powered by Apache Tika.
The parameter `valid_langugages` does not convert files to the target language, but checks if the conversion worked as expected.
The parameter `valid_languages` does not convert files to the target language, but checks if the conversion worked as expected.
"""
# Here are some examples of how you would use file converters

View File

@@ -34,9 +34,9 @@ The evaluation mode leverages the feedback REST API endpoint of haystack. The us
In order to use the UI in evaluation mode, you need an ElasticSearch instance with pre-indexed files and the Haystack REST API. You can set the environment up via docker images. For ElasticSearch, you can check out our [documentation](https://haystack.deepset.ai/usage/document-store#initialisation) and for setting up the REST API this [link](https://github.com/deepset-ai/haystack/blob/master/README.md#7-rest-api).
To enter the evaluation mode, select the checkbox "Evaluation mode" in the sidebar. The UI will load the predefined questions from the file [`eval_lables_examles`](https://raw.githubusercontent.com/deepset-ai/haystack/master/ui/eval_labels_example.csv). The file needs to be prefilled with your data. This way, the user will get a random question from the set and can give his feedback with the buttons below the questions. To load a new question, click the button "Get random question".
To enter the evaluation mode, select the checkbox "Evaluation mode" in the sidebar. The UI will load the predefined questions from the file [`eval_labels_examples`](https://raw.githubusercontent.com/deepset-ai/haystack/master/ui/eval_labels_example.csv). The file needs to be prefilled with your data. This way, the user will get a random question from the set and can give his feedback with the buttons below the questions. To load a new question, click the button "Get random question".
The file just needs to have two columns separated by semicolon. You can add more columns but the UI will ignore them. Every line represents a questions answer pair. The columns with the questions needs to be named “Question Text” and the answer column “Answer” so that they can be loaded correctly. Currently, the easiest way to create the file is manully by adding question answer pairs.
The file just needs to have two columns separated by semicolon. You can add more columns but the UI will ignore them. Every line represents a questions answer pair. The columns with the questions needs to be named “Question Text” and the answer column “Answer” so that they can be loaded correctly. Currently, the easiest way to create the file is manually by adding question answer pairs.
The feedback can be exported with the API endpoint `export-doc-qa-feedback`. To learn more about finetuning a model with user feedback, please check out our [docs](https://haystack.deepset.ai/usage/domain-adaptation#user-feedback).
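The evaluation-set file format described above can be sketched as follows. The two question/answer rows are invented examples; only the column names "Question Text" and "Answer" and the semicolon delimiter come from the description in this hunk:

```python
import csv
import io

# Invented rows; header names and the semicolon delimiter follow the
# format described above for eval_labels_example.csv.
data = (
    "Question Text;Answer\n"
    "Who wrote Faust?;Johann Wolfgang von Goethe\n"
    "What is the capital of France?;Paris\n"
)
rows = list(csv.DictReader(io.StringIO(data), delimiter=";"))
```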