426 Commits

Author SHA1 Message Date
MichelBartels
e8cd5ea943
Add distillation to finetuning tutorial (#2025)
* Add finetuning tutorial

* Add latest docstring and tutorial changes

* fix typo

* Add latest docstring and tutorial changes

* improve distillation explanation in finetuning tutorial

* Add latest docstring and tutorial changes

* allow augment_squad.py to be easier to call from within python

* Update Tutorial2_Finetune_a_model_on_your_data.py

* fix squad augmentation test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-20 12:18:32 +01:00
MichelBartels
0cca2b97cd
distinguish intermediate layer & prediction layer distillation phases with different parameters (#2001)
* add parameters to allow for different hyperparameters in stage 1 and 2 of tinybert distillation

* Add latest docstring and tutorial changes

* improve default parameters

* Add latest docstring and tutorial changes

* split up distillation method

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-14 20:40:38 +01:00
tstadel
f42d2e8ba0
Add nDCG to pipeline.eval()'s document metrics (#2008)
* add ndcg metric

* fix merge

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-14 18:36:41 +01:00
Julian Risch
2c063e960e
Extend Tutorial 5 with Upper Bound Reader Eval Metrics (#1995)
* print report for closed-domain eval

* Add latest docstring and tutorial changes

* rename parameter and rewrite docs

* Add latest docstring and tutorial changes

* print eval report in separate cell

* Add latest docstring and tutorial changes

* explain when to eval individual components

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-14 16:29:18 +01:00
Julian Risch
5695d721aa
update link to annotation tool docu (#2005)
* update link to annotation tool docu

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-14 16:10:59 +01:00
Julian Risch
a3147cae47
Add isolated node eval mode in pipeline eval (#1962)
* run predictions on ground-truth docs in reader

* build dataframe for closed/open domain eval

* fix looping through multilabel

* fix looping through multilabel's list of labels

* simplify collecting relevant docs

* switch closed-domain eval off by default

* Add latest docstring and tutorial changes

* handle edge case params not given

* renaming & generate pipeline eval report

* add test case for closed-domain eval metrics

* Add latest docstring and tutorial changes

* test report of closed-domain eval

* report closed-domain metrics only for answer metrics not doc metrics

* refactoring

* fix mypy & remove comment

* add second for-loop & use answer as method input

* renaming & add separate loop building docs eval df

* Add latest docstring and tutorial changes

* change column order for evaluation dataframe (#1957)

* change column order for evaluation dataframe

* added missing eval column node_input

* generic order for both document and answer returning nodes; ensure no columns get lost

Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>

* fix column reordering after renaming of node_input

* simplify tests & add docu

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ju-gu <87523290+ju-gu@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
2022-01-14 14:37:16 +01:00
Sara Zan
e28bf618d7
Implement proper FK in MetaDocumentORM and MetaLabelORM to work on PostgreSQL (#1990)
* Properly fix MetaDocumentORM and MetaLabelORM with composite foreign key constraints

* update_document_meta() was not using index properly

* Exclude ES and Memory from the cosine_sanity_check test

* move ensure_ids_are_correct_uuids in conftest and move one test back to faiss & milvus suite

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-14 13:48:58 +01:00
MichelBartels
3e4dbbb32c
Align similarity scores across document stores (#1967)
* align document store similarity functions

* remove unnecessary imports

* undone accidental change

* stopped weaviate from pretending to support dot product similarity

* stopped weaviate from pretending to support dot product similarity

* Add latest docstring and tutorial changes

* fix fixture params for document stores

* use cosine similarity for most tests

* fix cosine similarity test

* fix faiss test

* fix weaviate test

* fix accidental deletion

* fix document_store fixture

* test fix; shouldn't be merged

* fix test_normalize_embeddings_diff_shapes

* probably a better fix

* fix for parameter combinations

* revert new pytest_generate_tests functionality

* simplify pytest_generate_tests

* normalize embeddings for test_dpr_embedding

* add to faiss doc that embeddings are normalized

* Add latest docstring and tutorial changes

* remove unnecessary parameters and add comments

* simplify two lines of memory.py into one

* test similarity scores with smaller language model

* fix test_similarity_score


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-12 19:28:20 +01:00
Dmitry Goryunov
79fdda8a7c
Remove hard-coded variables from the Tutorial 15 (#1984)
* Remove hard-coded variables from the Tutorial 15

* Fix missing comma

* Add latest docstring and tutorial changes

* Fix formatting in Tutorial15_TableQA.ipynb

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-11 17:55:20 +01:00
Mathew Kuriakose
a44b6c18c0
Unify vector_dim and embedding_dim parameter in Document Store (#1922)
* Refactored code to unify vector_dim and embedding_dim parameter in DocumentStores

* Unit test cases updated to use `embedding_dim` instead of `vector_dim`

* Unit test case update to use embedding_dim instead of vector_dim

* Add latest docstring and tutorial changes

* Put usage of `vector_dim` param in same if-block as corresponding warning

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-01-10 18:10:32 +01:00
Julian Risch
30ea1d475d
check multiprocessing sharing strategy is available (#1965)
* check multiprocessing sharing strategy is available

* Change default of multiprocessing strategy to None

* Change default sharing strategy to None in retriever

* Add latest docstring and tutorial changes

* Make logging message easier to understand

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-05 18:22:09 +01:00
oryx1729
2910f67718
Use long Commit ID for Docker tags (#1946) 2022-01-04 17:39:49 +01:00
Alon Eirew
7a4fa42fda
Fix #1927 - RuntimeError when loading data using data_silo due to many open file descriptors from multiprocessing (#1928)
* fix #1687

* fix RuntimeError: received 0 items of ancdata

* Add an arg multiprocessing_strategy to DataSilo and DPR.train()

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-04 13:29:26 +01:00
bogdankostic
3e0ef1cc8a
Fix Numba TypingError in normalize_embedding for cosine similarity (#1933)
* Fix Numba TypingError

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-03 17:14:51 +01:00
bogdankostic
45df18c416
Add RCIReader for TableQA (#1909)
* Add RCIReader

* Add latest docstring and tutorial changes

* Add Doc Strings

* Add latest docstring and tutorial changes

* Add Tests

* Add Doc Strings

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-03 16:59:24 +01:00
Kristof Herrmann
6e8e3c68d9
Custom id hashing on documentstore level (#1910)
* adding dynamic id hashing

* Add latest docstring and tutorial changes

* added pr review

* Add latest docstring and tutorial changes

* fixed tests

* fix mypy error

* fix mypy issue

* ignore typing

* fixed correct check

* fixed tests

* try fixing the tests

* set id hash keys only if not none

* dont store id_hash_keys

* fix tests

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-03 16:58:19 +01:00
Julian Risch
a846be99d1
Extend TranslationWrapper to work with QA Generation (#1905)
* draft translationwrapper example

* draft translation of generated qa pairs

* Add latest docstring and tutorial changes

* fixed pass by reference by deepcopy

* delete adapted tutorial 13 (test purposes only)

* adapt method signature and doc string

* Add latest docstring and tutorial changes

* add type ignore

* extend tutorial 13 with TranslationWrapper example

* Add latest docstring and tutorial changes

* removed duplicate code

* indent if statement

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>
2022-01-03 13:30:24 +01:00
tstadel
a94c274134
Support custom headers per request in pipeline (#1861)
* chain headers param down to document_stores

* Add latest docstring and tutorial changes

* fix InMemoryDocumentStore params

* Add latest docstring and tutorial changes

* fix TfidfRetriever params

* Add latest docstring and tutorial changes

* fix missing headers

* Add latest docstring and tutorial changes

* fix sparql client and update docs

* Add latest docstring and tutorial changes

* test for documentstores

* pipeline tests added

* update header param in docstrings

* Add latest docstring and tutorial changes

* refactoring: headers as implicit param

* Add latest docstring and tutorial changes

* remove unnecessary imports

* propagate batch_size correctly

* Add latest docstring and tutorial changes

* revert InMemoryDocumentStore.write_documents signature

* Add latest docstring and tutorial changes

* remove #type: ignore

* Add latest docstring and tutorial changes

* replace MutableMapping by Dict

* Add latest docstring and tutorial changes

* improve docstrings

* Add latest docstring and tutorial changes

* get rid of **kwargs

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-03 11:38:02 +01:00
el2e10
377c20b8b1
Fix grammatical issue in optimization guides (#1941) 2022-01-03 11:06:13 +01:00
bogdankostic
39573cf0a9
Add ParsrConverter (#1931)
* Add ParsrConverter

* Fix typing error + add Parsr to Linux CI

* Fix valid_language for all converters + fix context generation for ParsrConverter

* Remove ParsrConverter test from WindowsCI

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-30 10:15:11 +01:00
MichelBartels
f33c2b987a
Adding distillation loss functions from TinyBERT (#1879)
* initial tinybertdistill commit

* add tinybert distill loss

* remove teacher caching for tinybert

* add tinybert to distil_from method

* Add latest docstring and tutorial changes

* add dim mapping and fix type hints

* fix type hints

* fix dummy input

* fix dim mapping for tinybert loss and add comments/doc strings

* add test for tinybert loss

* Add latest docstring and tutorial changes

* add comment

* fix BERT forward parameters

* add doc string to AdaptiveModel forward method

* remove unnecessary data silo

* fix farm import

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-23 14:54:02 +01:00
Alberto Villa
1bb6244a63
Exchanged minimal with minimum in print_answers function call (#1890) 2021-12-14 15:27:37 +01:00
Alberto Villa
2396f0cd3a
Correct bug with encoding when generating Markdown documentation; linked with issue #1880 (#1881) 2021-12-14 10:50:25 +01:00
tstadel
57a04631df
introduce node_input param (#1854)
* introduce node_input param

* Add latest docstring and tutorial changes

* prediction and label as node_input values

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-14 10:34:35 +01:00
Branden Chan
ea5aab23ec
Update pydoc-markdown-file-classifier.yml (#1856)
* Update pydoc-markdown-file-classifier.yml

* Add latest docstring and tutorial changes

* Prevent wrapping DataParallel in second DataParallel (#1855)

* Prevent wrapping DataParallel in second DataParallel

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Create v1.0 docs (#1862)

* Update pydoc-markdown-file-classifier.yml

* Add latest docstring and tutorial changes

* Rebase and apply change to v1.0

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2021-12-08 18:19:03 +01:00
bogdankostic
cbfe2b4626
Prevent wrapping DataParallel in second DataParallel (#1855)
* Prevent wrapping DataParallel in second DataParallel

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-08 09:56:45 +01:00
Julian Risch
54f776350c
Update evaluation tutorial to cover the new pipeline.eval() (#1765)
* Replace old tutorial 5 with new code based on test cases

* Add latest docstring and tutorial changes

* Use pipeline.eval() in tutorial

* Add latest docstring and tutorial changes

* Restructure notebook

* Add latest docstring and tutorial changes

* Add dataframe example

* Add latest docstring and tutorial changes

* Get eval data from doc store

* Add latest docstring and tutorial changes

* Load data from doc store

* Add latest docstring and tutorial changes

* Clear outputs

* Add latest docstring and tutorial changes

* Change example and add python script

* Add latest docstring and tutorial changes

* Fetch aggregated multilabels from doc store

* Add latest docstring and tutorial changes

* Incorporate review feedback on text comments

* Add latest docstring and tutorial changes

* Add Notebook output

* Remove queries param from pipeline.eval()

* Add latest docstring and tutorial changes

* Add output with all metrics

* Add printing of multiple metrics to script

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-03 11:19:41 +01:00
tstadel
180c05365a
Deprecate old pipeline eval nodes: EvalDocuments and EvalAnswers (#1778)
* log deprecated warning on init

* deprecation warning included into docstrings

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-02 18:09:26 +01:00
tstadel
dc4cd49049
remove queries param from pipeline.eval() (#1836)
* remove queries param from pipeline.eval()

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-02 16:04:01 +01:00
tstadel
c5540d05ed
Calculation of metrics and presentation of eval results (#1760)
* retriever metrics added

* Add latest docstring and tutorial changes

* answer and document level matching metrics implemented

* Add latest docstring and tutorial changes

* answer related metrics for retriever

* basic reader metrics implemented

* handle no_answers

* fix typing

* fix tests

* fix tests without sas

* first draft for simulated top k

* rename sas and f1 columns in dataframe

* refactoring of EvaluationResult

* Add latest docstring and tutorial changes

* more eval tests added

* fix sas expected value precision

* distinction between ir and qa recall

* EvaluationResult.worst_queries() implemented

* print_evaluation_report() added

* eval report for QA Pipeline improved

* dynamic metrics for worst queries calc

* Add latest docstring and tutorial changes

* method names adjusted

* simple test for print_eval_report() added

* improved documentation

* Add latest docstring and tutorial changes

* minor formatting

* Add latest docstring and tutorial changes

* fix no_answer cases

* adjust one docstring

* Add latest docstring and tutorial changes

* fix no_answer cases for sas

* batchmode for sas implemented

* fix for retriever metrics if there are only no_answers

* fix multilabel tests

* improve documentation for pipeline.eval()

* streamline multilabel aggregates and docs

* Add latest docstring and tutorial changes

* fix multilabel tests

* unify document_id

* add dataframe schema description to EvaluationResult

* Add latest docstring and tutorial changes

* rename worst_queries to wrong_examples

* Add latest docstring and tutorial changes

* make query digesting standard pipelines work with pipeline.eval()

* Add latest docstring and tutorial changes

* tests for multi retriever pipelines added

* remove unnecessary import

* print_eval_report(): support all pipelines without junctions

* Add latest docstring and tutorial changes

* fix typos

* Add latest docstring and tutorial changes

* fix minor simulated_top_k bug and use memory documentstore throughout tests

* sas model param description improved

* Add latest docstring and tutorial changes

* rename recall metrics

* Add latest docstring and tutorial changes

* fix mean average precision link

* Add latest docstring and tutorial changes

* adjust sas description docstring

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-11-30 19:26:34 +01:00
AhmedIdr
56e4e8486f
Added max_seq_length and batch_size params to embeddingretriever (#1817)
* Added max_seq_length and batch_size params, added progress_bar to faiss writing_documents

* Add latest docstring and tutorial changes

* fixed typos

* Update dense.py

Changed default batch_size and max_seq_len in EmbeddingRetriever

* Add latest docstring and tutorial changes

* Update faiss.py

Change import tqdm.auto to tqdm

* Update faiss.py

Changing tqdm back to tqdm.auto

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-29 19:49:51 +01:00
bogdankostic
eb5f7bb4c0
Add AzureConverter to support table parsing from documents (#1813)
* Add FormRecognizerConverter

* Change signature of convert method + change return type of all converters

* Adapt preprocessing util to new return type of converters

* Parametrize number of lines used for surrounding context of table

* Change name from FormRecognizerConverter to AzureConverter

* Set version of azure-ai-formrecognizer package

* Change tutorial 8 based on new return type of converters

* Add tests

* Add latest docstring and tutorial changes

* Fix typo

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-11-29 18:44:20 +01:00
MichelBartels
84147edcca
Model Distillation (#1758)
* initial commit

* Add latest docstring and tutorial changes

* added comments and fixed bug

* fixed bugs, added benchmark and added documentation

* Add latest docstring and tutorial changes

* fix type: ignore comment

* fix logging in benchmark

* fixed distillation config

* Add latest docstring and tutorial changes

* added type annotations

* fixed distillation loss calculation

* added type annotations

* fixed distillation mse loss

* improved model distillation benchmark config loading

* added temperature for model distillation

* removed unnecessary imports, added comments, added named parameter calls

* Add latest docstring and tutorial changes

* added some more comments

* added distillation test

* fixed distillation test

* removed unnecessary import

* fix softmax dimension

* add grid search

* improved model distillation benchmark config

* fixed model distillation hyperparameter search

* added doc strings and type hints for model distillation

* Add latest docstring and tutorial changes

* fixed type hints

* fixed type hints

* fixed type hints

* wrote out params instead of kwargs in DistillationDataSilo initializer

* fixed type hints

* fixed typo

* fixed typo

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-26 18:49:30 +01:00
Julian Risch
3b8e2e7b6c
Fix link to colab notebook in tutorial 16 (#1802)
* Fix link to colab notebook in tutorial 16

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-24 13:19:20 +01:00
Sowmiya Jaganathan
04d93ec247
Introduced an arg to add synonyms - Elasticsearch (#1625)
* Introduced an arg to add synonyms to Elasticsearch

* Added the test code, removed the whitespace formatting changes, and overwrote the relevant parts from the already existing mapping instead of creating new mapping.

* Added the test code

* Remove whitespace change

* Added the doc_string with examples and link

* Removed unnecessary spaces

* Add latest docstring and tutorial changes

* fix text_field -> content_field

Co-authored-by: sowmiya-emplay <sowmiya.j@emplay.net>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-23 19:10:34 +01:00
MichelBartels
e80771f839
Adding yaml functionality to standard pipelines (save/load...) (#1735)
* adding yaml functionality to BaseStandardPipeline

fixes #1681

* Add latest docstring and tutorial changes

* Update API Reference Pages for v1.0 (#1729)

* Create new API pages and update existing ones

* Create query classifier page

* Remove Objects suffix

* Change answer aggregation key to doc_id, query instead of label_id, query (#1726)

* Add debugging example to tutorial (#1731)

* Add debugging example to tutorial

* Add latest docstring and tutorial changes

* Remove Objects suffix

* Add latest docstring and tutorial changes

* Revert "Remove Objects suffix"

This reverts commit 6681cb06510b080775994effe6a50bae42254be4.

* Revert unintentional commit

* Add third debugging option

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Fix another self.device/s typo (#1734)

* Fix yet another self.device(s) typo

* Add typing to 'initialize_device_settings' to try prevent future issues

* Fix bug in Tutorial5

* Fix the same bug in the notebook

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* added test for saving and loading prebuilt pipelines

* fixed typo, changed variable name and added comments

* Add latest docstring and tutorial changes

* Fix a few details of some tutorials (#1733)

* Make Tutorial10 use print instead of logs and fix a typo in Tutorial15

* Add a type check in 'print_answers'

* Add same checks to print_documents and print_questions

* Make RAGenerator return Answers instead of dictionaries

* Fix RAGenerator tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Fix `print_answers` (#1743)

* Fix a specific path of print_answers that was assuming answers are dictionaries

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Split pipeline tests into three suites (#1755)

* Split pipeline tests into three suites

* Will this trigger the CI?

* Rename duplicate test into test_most_similar_documents_pipeline

* Fixing a bug that was probably never noticed

* Capitalize starting letter in params (#1750)

* Capitalize starting letter in params

Capitalized the starting letter in code examples for params in keeping with the latest names for nodes where first letter is capitalized. 
Refer: https://github.com/deepset-ai/haystack/issues/1748

* Update standard_pipelines.py

Capitalized some starting letters in the docstrings in keeping with the updated node names for standard pipelines

* Multi query eval (#1746)

* add eval() to pipeline

* Add latest docstring and tutorial changes

* support multiple queries in eval()

* Add latest docstring and tutorial changes

* keep single query test

* fix EvaluationResult node_results default

* adjust docstrings

* Add latest docstring and tutorial changes

* minor improvements from comments

* Add latest docstring and tutorial changes

* move EvaluationResult and calculate_metrics to schema

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Split summarizer tests in order to make windows CI work again (#1757)

* separate testfile for summarizer with translation

* Add latest docstring and tutorial changes

* import SPLIT_DOCS from test_summarizer

* add workflow_dispatch to windows_ci

* add workflow_dispatch to linux_ci

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix import of EvaluationResult in test case

* exclude test_summarizer_translation.py for windows_ci (#1759)

* Pipelines now tolerate custom _debug content (#1756)

* Pipelines now tolerate custom _debug content

* Support Tables in all DocumentStores (#1744)

* Add support for tables in SQLDocumentStore, FAISSDocumentStore and MilvusDocumentStore

* Add support for WeaviateDocumentStore

* Make sure that embedded meta fields are strings + add embedding_dim to WeaviateDocStore in test config

* Add latest docstring and tutorial changes

* Represent tables in WeaviateDocumentStore as nested lists

* Fix mypy

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Allow TableReader models without aggregation classifier (#1772)

* Fix usage of filters in `/query` endpoint in REST API (#1774)

* WIP filter refactoring

* fix filter formatting

* remove inplace modification of filters

* Public demo (#1747)

* Queries now run only when pressing RUN. File upload hidden. Question is not sent if the textbox is empty.

* Add latest docstring and tutorial changes

* Tidy up: remove needless state, add comments, fix minor bugs

* Had to add results to the status to avoid some bugs in eval mode

* Added 'credits'

* Add footers, update requirements, some random questions for the evaluation

* Add requested changes

* Temporary rollback the UI to the old GoT dataset

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Facilitate concurrent query / indexing in Elasticsearch with dense retrievers (new `skip_missing_embeddings` param) (#1762)

* Filtering records not having embeddings

* Added support for skip_missing_embeddings flag. Default behavior is to throw an error when embeddings are missing. If skip_missing_embeddings=True, documents without embeddings are ignored for vector similarity

* Fix for below error:
haystack/document_stores/elasticsearch.py:852: error: Need type annotation for "script_score_query"

* docstring for skip_missing_embeddings parameter

* Raise exception where no documents with embeddings are found for Embedding retriever.

* Default skip_missing_embeddings to True

* Explicitly check if embeddings are present if no results are returned by EmbeddingRetriever for Elasticsearch

* Added test case based on Julian's input

* Added test case based on Julian's input. Fix pytest error on the testcase

* Added test case based on Julian's input. Fix pytest error on the testcase

* Added test case based on Julian's input. Fix pytest error on the testcase

* Simplify code by using get_embed_count

* Adjust docstring & error msg slightly

* Revert error msg

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>

* Huggingface private model support via API tokens (FARMReader) (#1775)

* passed kwargs to model loading

* Pass Auth token explicitly

* add use_auth_token to get_language_model_class

* added use_auth_token parameter at FARMReader

* Add latest docstring and tutorial changes

* added docs for parameter `use_auth_token`

* Add latest docstring and tutorial changes

* adding docs link

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* private hugging face models for retrievers (#1785)

* private dpr

* Add latest docstring and tutorial changes

* added parameters to child functions

* Add latest docstring and tutorial changes

* added tableextractor

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* ignore empty filters parameter (#1783)

* ignore empty filters parameter

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* initialize doc store with doc and label index in tutorial 5 (#1730)

* initialize doc store with doc and label index

* change ipynb according to py for tutorial 5

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Small fixes to the public demo (#1781)

* Make Streamlit tolerant to haystack not knowing its version, and adding special error for docstore issues

* Add workaround for a Streamlit bug

* Make default filters value an empty dict

* Return more context for each answer in the rest api

* Make the hs_version call not-blocking by adding a very quick timeout

* Add disclaimer on low confidence answer

* Use the no-answer feature of the reader to highlight questions with no good answer

* Upgrade torch to v1.10.0 (#1789)

* Upgrade torch to v1.10.0

* Adapt torch version for torch-scatter in TableQA tutorial

* Add latest docstring and tutorial changes

* Make torch version more flexible

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* adding yaml functionality to BaseStandardPipeline

fixes #1681

* Add latest docstring and tutorial changes

* added test for saving and loading prebuilt pipelines

* fixed typo, changed variable name and added comments

* Add latest docstring and tutorial changes

* fix code rendering for example

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Branden Chan <33759007+brandenchan@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: nishanthcgit <5066268+nishanthcgit@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: C V Goudar <cvgoudar@users.noreply.github.com>
Co-authored-by: Kristof Herrmann <37148029+ArzelaAscoIi@users.noreply.github.com>
2021-11-23 17:01:39 +01:00
bogdankostic
c00b32cf67
Fix Tutorial 11 on Google Colab (#1795)
* Remove installation of latest release

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-23 15:35:23 +01:00
bogdankostic
a19a9f548b
Upgrade torch to v1.10.0 (#1789)
* Upgrade torch to v1.10.0

* Adapt torch version for torch-scatter in TableQA tutorial

* Add latest docstring and tutorial changes

* Make torch version more flexible

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-23 11:49:46 +01:00
Julian Risch
9211c4c64d
initialize doc store with doc and label index in tutorial 5 (#1730)
* initialize doc store with doc and label index

* change ipynb according to py for tutorial 5

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-22 15:18:02 +01:00
Kristof Herrmann
a8c2cdc565
private hugging face models for retrievers (#1785)
* private dpr

* Add latest docstring and tutorial changes

* added parameters to child functions

* Add latest docstring and tutorial changes

* added tableextractor

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-22 09:24:02 +01:00
Kristof Herrmann
8aa4ca29c2
Huggingface private model support via API tokens (FARMReader) (#1775)
* passed kwargs to model loading

* Pass Auth token explicitly

* add use_auth_token to get_language_model_class

* added use_auth_token parameter at FARMReader

* Add latest docstring and tutorial changes

* added docs for parameter `use_auth_token`

* Add latest docstring and tutorial changes

* adding docs link

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-19 16:48:31 +01:00
tstadel
956d5bba43
Split summarizer tests in order to make windows CI work again (#1757)
* separate testfile for summarizer with translation

* Add latest docstring and tutorial changes

* import SPLIT_DOCS from test_summarizer

* add workflow_dispatch to windows_ci

* add workflow_dispatch to linux_ci

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-15 18:49:49 +01:00
tstadel
59e04cba05
Multi query eval (#1746)
* add eval() to pipeline

* Add latest docstring and tutorial changes

* support multiple queries in eval()

* Add latest docstring and tutorial changes

* keep single query test

* fix EvaluationResult node_results default

* adjust docstrings

* Add latest docstring and tutorial changes

* minor improvements from comments

* Add latest docstring and tutorial changes

* move EvaluationResult and calculate_metrics to schema

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-15 14:51:11 +01:00
nishanthcgit
cf603042b2
Capitalize starting letter in params (#1750)
* Capitalize starting letter in params

Capitalized the starting letter in code examples for params in keeping with the latest names for nodes where first letter is capitalized. 
Refer: https://github.com/deepset-ai/haystack/issues/1748

* Update standard_pipelines.py

Capitalized some starting letters in the docstrings in keeping with the updated node names for standard pipelines
2021-11-15 12:38:13 +01:00
Sara Zan
09a462d756
Fix print_answers (#1743)
* Fix a specific path of print_answers that was assuming answers are dictionaries

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-15 09:50:09 +01:00
Sara Zan
ea3abd305b
Fix a few details of some tutorials (#1733)
* Make Tutorial10 use print instead of logs and fix a typo in Tutorial15

* Add a type check in 'print_answers'

* Add same checks to print_documents and print_questions

* Make RAGenerator return Answers instead of dictionaries

* Fix RAGenerator tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-12 16:44:28 +01:00
Sara Zan
85a08d671a
Fix another self.device/s typo (#1734)
* Fix yet another self.device(s) typo

* Add typing to 'initialize_device_settings' to try to prevent future issues

* Fix bug in Tutorial5

* Fix the same bug in the notebook

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-11 17:18:06 +01:00
Branden Chan
8082549663
Add debugging example to tutorial (#1731)
* Add debugging example to tutorial

* Add latest docstring and tutorial changes

* Remove Objects suffix

* Add latest docstring and tutorial changes

* Revert "Remove Objects suffix"

This reverts commit 6681cb06510b080775994effe6a50bae42254be4.

* Revert unintentional commit

* Add third debugging option

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-11 14:45:06 +01:00
Branden Chan
81f82b1b95
Update API Reference Pages for v1.0 (#1729)
* Create new API pages and update existing ones

* Create query classifier page

* Remove Objects suffix
2021-11-11 12:44:29 +01:00
tstadel
158460504b
Make FAISSDocumentStore work with yaml (#1727)
* add faiss_index_path and faiss_config_path

* Add latest docstring and tutorial changes

* remove duplicate cleaning stuff

* refactoring + test for invalid param combination

* adjust type hints

* Add latest docstring and tutorial changes

* add documentation to @preload_index

* Add latest docstring and tutorial changes

* recursive __init__ instead of decorator

* Add latest docstring and tutorial changes

* validate instead of check

* combine ifs

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-11 11:02:22 +01:00