1461 Commits

Author SHA1 Message Date
Sara Zan
3539e6b041
Fix circular import in the REST API (#1556)
* Fix circular import in the REST API

* remove unneeded import in test

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-10-04 21:18:23 +02:00
Sara Zan
af4a44fcbd
WIP Add rest api endpoint to delete documents by filter (#1546)
* Add rest api endpoint to delete documents by filter.

* Remove parametrization of rest api tests

* Make the paths in rest_api/config.py absolute

* Fix path to pipelines.yaml

* Restructuring test_rest_api.py to be able to test only my endpoint (and to make the suite more structured)

* Convert DELETE /documents into POST /documents/delete_by_filters

Co-authored-by: sarthakj2109 <54064348+sarthakj2109@users.noreply.github.com>
2021-10-04 11:21:00 +02:00
Julian Risch
7e063b77d2
Format doc classifier usage example (#1550)
* Format doc classifier usage example

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-01 15:01:19 +02:00
Julian Risch
24483d7bad
TransformersDocumentClassifier replacing FARMClassifier (#1540)
* Initial draft of TransformersClassifier

* Add transformers classifier implementation

* Add test for SentenceTransformersClassifier

* Add truncation and corresponding test case to Classifier

* Add zero-shot classification and test

* Add document classifier documentation

* Add latest docstring and tutorial changes

* print meta data with print_documents()

* Add latest docstring and tutorial changes

* Remove top_k param from Classifier usage example

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-01 11:22:56 +02:00
bogdankostic
a20eec3098
Remove double mentions from requirements (#1545)
* Remove one mention of sentence-transformers from requirements

* Remove one mention of sklearn from requirements
2021-09-30 16:21:24 +02:00
Julian Risch
9ed726923c
Remove NER and text classification from model conversion (#1536) 2021-09-29 13:35:59 +02:00
Julian Risch
0e7338f0c6
Remove mentions of FARM from Ranker comments (#1535)
* Remove mentions of FARM from Ranker comments

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-29 11:57:30 +02:00
Sara Zan
a30a826c6c
Standardize delete_documents(filter=...) across all document stores (#1509)
* Make InMemoryDocumentStore accept and apply filters in delete_documents()

* Modify test_document_store.py to test the filtered deletion in memory, sql and milvus too

* Make FAISSDocumentStore accept and properly apply filters in delete_documents()

* Add latest docstring and tutorial changes

* Remove accidentally duplicated test

* Remove unnecessary decorators from test/test_document_store.py::test_delete_documents_with_filters

* Add embeddings count test for FAISS and Milvus; Milvus fails it.

* Fixed a bug that prevented Milvus from deleting embeddings

* Remove batch size parametrization in tests & update the docstrings of all document stores with a filter example

* Add latest docstring and tutorial changes

Co-authored-by: prafgup <prafulgupta6@gmail.com>
2021-09-29 09:27:06 +02:00
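
The filtered deletion described in #1509 can be pictured with a minimal sketch. It is not the Haystack implementation: the plain-dict store, the standalone `delete_documents` function, and the filter shape (metadata field mapped to a list of accepted values) are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def delete_documents(
    store: Dict[str, dict],
    filters: Optional[Dict[str, List[str]]] = None,
) -> None:
    """Delete documents from a hypothetical in-memory store.

    Assumption: `filters` maps a metadata field to the values that qualify
    a document for deletion. Without filters, every document is removed.
    """
    if not filters:
        store.clear()
        return
    # Collect the ids first so the dict is not mutated while iterating.
    doomed = [
        doc_id
        for doc_id, doc in store.items()
        if all(
            doc.get("meta", {}).get(field) in allowed
            for field, allowed in filters.items()
        )
    ]
    for doc_id in doomed:
        del store[doc_id]


store = {
    "1": {"text": "first", "meta": {"name": "a.txt"}},
    "2": {"text": "second", "meta": {"name": "b.txt"}},
    "3": {"text": "third", "meta": {"name": "c.txt"}},
}
delete_documents(store, filters={"name": ["a.txt", "b.txt"]})
remaining_after_filter = sorted(store)

delete_documents(store)  # no filters: remove everything
remaining_after_clear = sorted(store)
```

In this sketch a document is deleted only if it matches every filtered field, which is one plausible reading of the commit; the real stores may combine conditions differently.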
Malte Pietsch
39d324ed17
Fix typo 2021-09-28 16:39:58 +02:00
Malte Pietsch
2df1aa8713
Fix document_store_type flag for tests with multiple fixtures that get parametrized. (#1526) 2021-09-28 16:38:21 +02:00
Julian Risch
f9d2f786ca
Replace FARM import statements; add dependencies (#1492)
* Replace FARM import statements; add dependencies

* Add InferenceProc., TextCl.Proc., TextPairCl.Proc.

* Remove FARMRanker, add type annotations, rename max_sample

* Add sample_to_features_text for InferenceProc.

* Fix type annotations: model_name_or_path is str not Path

* Fix mypy errors: implement _create_dataset in TextCl.Proc.

* Add task_type "embeddings" in Inferencer

* Allow loading AdaptiveModel for embedding task

* Add SQuAD eval metrics; enable InferenceProc for embedding task

* Add baskets as param to log_samples and handle empty basket list in log_samples

* Remove unused dependencies

* Remove FARMClassifier (doc classifier) due to ref to TextClassificationHead

* Remove FARMRanker and Classifier from doc generation scripts

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-28 16:34:24 +02:00
Sara Zan
2de5385ac2
Add "API is loading" message in the UI (#1493)
* Create the /initialized endpoint

* Now showing an error message if the connection fails, and a 'Haystack is loading' message while workers are starting up

* Improve the appearance of the various messages

* Newline at the end of file
2021-09-27 16:40:25 +02:00
Sara Zan
1cd17022af
Fix bug when loading FAISS from supplied config file path (#1506)
* Fix the bug found in issue 135

* Add a test for the custom path
2021-09-27 11:25:05 +02:00
Malte Pietsch
183fd5ae5a
Simplify tests & allow running on individual doc stores (#1487)
* simplify tests for individual doc stores

* WIP refactoring markers of tests

* test alternative approach for tests with existing parametrization

* fix skip logic of already parametrized tests

* fix weaviate behaviour in tests - not parametrizing it in our general test cases.

* Add latest docstring and tutorial changes

* fix some tests

* remove sql from document_store_types

* fix markers for generator and pipeline test

* remove inmemory marker

* remove unneeded elasticsearch markers

* update readme and contributing.md

* update contributing

* adjust example

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-27 10:52:07 +02:00
Markus Paff
d3fd888a76
Release Docs 0.10.0 (#1460)
* updated tutorials and docstrings and new version

* update to correct directory structure
2021-09-23 16:22:14 +02:00
bogdankostic
6118d202e1
Add newline between paragraphs in DocxToTextConverter (#1500) 2021-09-23 15:45:31 +02:00
Lalit Pagaria
98f7610d89
Download archive from url without temp file (#1470) 2021-09-23 15:35:56 +02:00
bogdankostic
c644e2b4d0
Add comment to tutorial notebooks about restarting runtime in colab (#1486)
* Add comment to tutorial notebooks about restarting runtime in colab

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-23 14:36:20 +02:00
Sara Zan
34a8879f60
Remove 'restart=always' from 'haystack-api' in both docker-compose files (#1498)
* Remove 'restart=always' from 'haystack-api'

* Remove 'restart=always' from docker-compose-gpu.yml as well
2021-09-23 11:38:08 +02:00
Julian Risch
60471cecdf
Add inferencer for QA only (#1484)
* Add inferencer for QA only

* Add latest docstring and tutorial changes

* Add QA inferencer tests

* Add type annotations for inferencer

* Fix type annotations, move util functions

* Fix type annotations

* Move fixtures to the top of the file

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-22 16:56:51 +02:00
Julian Risch
d569e66bc7
Update Tutorial1_Basic_QA_Pipeline.ipynb (#1489)
* Update Tutorial1_Basic_QA_Pipeline.ipynb

passing params to pipeline as dict

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-22 16:35:20 +02:00
Malte Pietsch
ff1adb64c2
Update README.md 2021-09-21 17:56:40 +02:00
Branden Chan
bddee2def4
Define SAS model in notebook (#1485)
* Define SAS model in notebook

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-21 17:05:16 +02:00
Branden Chan
2c4baa7f4e
Regenerate API and Tutorial md files (#1480)
* Change punctuation

* Add latest docstring and tutorial changes

* Change punctuation

* Add documentation for Docs2Answer

* Add latest docstring and tutorial changes

* Generate new API docs

* Replace Finder with Pipeline

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-21 14:42:18 +02:00
ju-gu
05da7f71dd
changed delete_all_documents to delete_documents (#1477) 2021-09-20 14:29:33 +02:00
Markus Paff
e0ad6b64bf fixed workflow conflict with introducing new one (#1472) 2021-09-20 12:21:20 +02:00
Malte Pietsch
ab7d5853f2
Bump Version 2021-09-20 08:40:38 +02:00
Sara Zan
21513532e5
Improve save/load of FAISS document store by saving its configuration alongside the index (#1459)
* Saves the FAISSDocumentStore init params to JSON at save() and loads them at load() if they're found. First draft, to be tested.

* Fixing issue with string/Path objects in a few string operations, thanks mypy

* Leverage self.set_config instead of saving the parameters in a separate attribute

* Modify test_faiss_and_milvus:test_faiss_index_save_and_load to test that init params are preserved

* Add assert to verify that the SQL doc count and FAISS vector count are equal. The name of the SQL db must always be specified for this to work

* Simplified the implementation a bit, add better comments

* Forgot a return at the end of the file

* Fixing some of the suggestions from the review

* Add a try-catch in the load method and fix the tests

* Typo
2021-09-20 08:32:14 +02:00
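
The idea behind #1459 (write the init params to JSON at save time, read them back at load time if the file exists) can be sketched as follows. `TinyStore`, its parameters, and the file naming are hypothetical stand-ins, not the FAISSDocumentStore API.

```python
import json
import tempfile
from pathlib import Path


class TinyStore:
    """Hypothetical stand-in for a document store; not the Haystack class."""

    def __init__(self, vector_dim: int = 768, similarity: str = "dot_product"):
        # Keep the init params so they can be persisted next to the index.
        self.config = {"vector_dim": vector_dim, "similarity": similarity}
        self.vectors = []

    def save(self, index_path) -> None:
        index_path = Path(index_path)
        index_path.write_text(json.dumps(self.vectors))
        # Assumption: the config file sits beside the index with a .json suffix.
        index_path.with_suffix(".json").write_text(json.dumps(self.config))

    @classmethod
    def load(cls, index_path) -> "TinyStore":
        index_path = Path(index_path)
        try:
            init_params = json.loads(index_path.with_suffix(".json").read_text())
        except FileNotFoundError:
            # Older saves have no config file; fall back to the defaults.
            init_params = {}
        store = cls(**init_params)
        store.vectors = json.loads(index_path.read_text())
        return store


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "store.index"
    original = TinyStore(vector_dim=2, similarity="cosine")
    original.vectors.append([0.1, 0.2])
    original.save(path)
    restored = TinyStore.load(path)
```

Persisting the parameters this way means a caller does not have to repeat them when reloading, which appears to be the motivation stated in the commit title.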
mathislucka
9c4e67d9b6
Enable cosine similarity metric in FAISSDocumentStore (#1352)
* feat: normalize embeddings for cosine sim

* WIP add test case for faiss cosine

* input to faiss normalize needs to be an array of vectors

* fix: test should compare correct result embedding to original embedding

* add sanity check for cosine sim

* fix typo

* normalize cosine score

* Update docstring

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-09-20 07:54:26 +02:00
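
The "normalize embeddings for cosine sim" step in #1352 relies on a standard identity: after L2-normalizing two vectors, their inner product equals their cosine similarity. A minimal pure-Python sketch of that identity, with helper names chosen here for illustration and no FAISS calls involved:

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


query = [3.0, 4.0]
doc = [4.0, 3.0]

# An inner-product index fed with normalized vectors ranks by cosine similarity.
ip_after_norm = inner_product(l2_normalize(query), l2_normalize(doc))
reference = cosine_similarity(query, doc)
```

Both values come out at 0.96 for these inputs, up to floating-point rounding.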
Markus Paff
5b1b875374
fixed workflow conflict with introducing new one (#1472) 2021-09-17 23:44:45 +02:00
Markus Paff
39845c0624
Automate updates docstrings tutorials (#1461)
* remove unneeded GitHub actions and reactivate docstring and tutorial generation

* test workflow

* update pydoc version

* update python version

* update watchdog

* move to latest version pydoc-markdown

* remove version check

* Add latest docstring and tutorial changes

* remove test workflow

* test for param docstrings

* pin pydoc-markdown version

* add test workflow

* pin watchdog version

* Add latest docstring and tutorial changes

* update original workflow and delete test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-17 13:44:31 +02:00
Timo Moeller
172de1c05f
Merge pull request #1422 from deepset-ai/farm_merging_base
Farm merging base
2021-09-16 11:32:41 +02:00
Malte Pietsch
30dc010171 Bump Haystack version to 0.10.0 (tag: v0.10.0) 2021-09-16 06:49:44 +02:00
oryx1729
9a1e3fec86
Update DocumentStore env in docker-compose (#1450) 2021-09-14 12:28:30 +02:00
Timo Moeller
d804861fb2 Fix tests 2021-09-13 20:00:22 +02:00
Timo Moeller
ba7178be7f satisfy mypy 2021-09-13 19:29:20 +02:00
Timo Moeller
537204e8c9
Fix tests and adjust folder structure
* Add type annotations in QuestionAnsweringHead

* Fix test by increasing max_seq_len

* Add SampleBasket type annotation

* Remove prediction head param from adaptive model init

* Add type ignore for AdaptiveModel init

* Fix and rename tests

* Adjust folder structure

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2021-09-13 18:38:14 +02:00
Priyam Mehta
389f6b68fb
Added functionality for Google Colab usecase in Crawler Module (#1436)
* Added functionality for Google Colab usecase

* Corrected typo in installation guide of driver

* Corrected typo in installation guide of driver

* Corrected the copy command
2021-09-13 14:58:36 +02:00
Malte Pietsch
b53ad7af53
quality of life function to access certain nodes in pipeline (#1441) 2021-09-13 13:03:38 +02:00
Ikram Ali
f186d6327d
Add MostSimilarDocumentsPipeline (#1413)
* [pipeline] MostSimilarDocumentsPipeline added

* [pipeline] mypy bug fixed.

* [pipeline] mypy bug fixed.

* [pipeline] test cases added.

* [pipeline] test cases added.

* [pipeline] set return_embedding back to false.

* [pipeline] return a list of Documents

* [pipeline] define the ids

* [pipeline] code refactor.

* [pipeline] code refactor.

* [pipeline] test case improved.

* Update docstring

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-09-13 12:43:45 +02:00
oryx1729
3deff26b60
Fix Search REST API when filters are None (#1431) 2021-09-10 14:47:34 +02:00
MichelBartels
da2e8da561
Adding multi gpu support for DPR inference (#1414)
* Added support for Multi-GPU inference to DPR including benchmark

* fixed multi gpu

* added batch size to benchmark to better reflect multi gpu capabilities

* remove unnecessary entry in config.json

* fixed typos

* fixed config name

* update benchmark to use DEVICES constant

* changed multi gpu parameters and updated docstring

* adds silent fallback on cpu

* update doc string, warning and config

Co-authored-by: Michel Bartels <kontakt@michelbartels.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-09-10 13:25:02 +02:00
oryx1729
1f859694f1
Add support for Dense Retrievers in REST API Indexing Pipeline (#1430) 2021-09-10 11:53:32 +02:00
oryx1729
9dd7c74f4f
Refactor communication between Pipeline Components (#1321) 2021-09-10 11:41:16 +02:00
Timo Moeller
e8a6427b9e Remove farm mentions from code and docs, reformat code 2021-09-09 15:48:11 +02:00
Julian Risch
4a64c50c7e Merge branch 'farm_merging_base' of github.com:deepset-ai/haystack into farm_merging_base 2021-09-09 13:03:38 +02:00
Julian Risch
ba1fe0ec61 Add fixture distilbert_squad 2021-09-09 13:02:35 +02:00
bogdankostic
2626388961
Fix DPR tests + add Tokenizer tests (#1429)
* Fix DPR tests

* Add Tokenizer tests
2021-09-09 12:56:44 +02:00
oryx1729
3e6def7e03
Add type ignore to resolve mypy errors (#1427) 2021-09-09 12:29:01 +02:00
Julian Risch
23338f1b74 Add tests: prediction head, processor load/save, qa from FARM 2021-09-09 11:54:47 +02:00