976 Commits

Author SHA1 Message Date
Timo Moeller
ba7178be7f satisfy mypy 2021-09-13 19:29:20 +02:00
Timo Moeller
537204e8c9
Fix tests and adjust folder structure
* Add type annotations in QuestionAnsweringHead

* Fix test by increasing max_seq_len

* Add SampleBasket type annotation

* Remove prediction head param from adaptive model init

* Add type ignore for AdaptiveModel init

* Fix and rename tests

* Adjust folder structure

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2021-09-13 18:38:14 +02:00
Priyam Mehta
389f6b68fb
Added functionality for Google Colab use case in Crawler Module (#1436)
* Added functionality for Google Colab use case

* Corrected typo in installation guide of driver

* Corrected typo in installation guide of driver

* Corrected the copy command
2021-09-13 14:58:36 +02:00
Malte Pietsch
b53ad7af53
quality of life function to access certain nodes in pipeline (#1441) 2021-09-13 13:03:38 +02:00
Ikram Ali
f186d6327d
Add MostSimilarDocumentsPipeline (#1413)
* [pipeline] MostSimilarDocumentsPipeline added

* [pipeline] mypy bug fixed.

* [pipeline] mypy bug fixed.

* [pipeline] test cases added.

* [pipeline] test cases added.

* [pipeline] set return_embedding back to false.

* [pipeline] return a list of Documents

* [pipeline] define the ids

* [pipeline] code refactor.

* [pipeline] code refactor.

* [pipeline] test case improved.

* Update docstring

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-09-13 12:43:45 +02:00
oryx1729
3deff26b60
Fix Search REST API when filters are None (#1431) 2021-09-10 14:47:34 +02:00
MichelBartels
da2e8da561
Adding multi gpu support for DPR inference (#1414)
* Added support for Multi-GPU inference to DPR including benchmark

* fixed multi gpu

* added batch size to benchmark to better reflect multi gpu capabilities

* remove unnecessary entry in config.json

* fixed typos

* fixed config name

* update benchmark to use DEVICES constant

* changed multi gpu parameters and updated docstring

* adds silent fallback on cpu

* update doc string, warning and config

Co-authored-by: Michel Bartels <kontakt@michelbartels.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-09-10 13:25:02 +02:00
oryx1729
1f859694f1
Add support for Dense Retrievers in REST API Indexing Pipeline (#1430) 2021-09-10 11:53:32 +02:00
oryx1729
9dd7c74f4f
Refactor communication between Pipeline Components (#1321) 2021-09-10 11:41:16 +02:00
Timo Moeller
e8a6427b9e Remove farm mentions from code and docs, reformat code 2021-09-09 15:48:11 +02:00
Julian Risch
4a64c50c7e Merge branch 'farm_merging_base' of github.com:deepset-ai/haystack into farm_merging_base 2021-09-09 13:03:38 +02:00
Julian Risch
ba1fe0ec61 Add fixture distilbert_squad 2021-09-09 13:02:35 +02:00
bogdankostic
2626388961
Fix DPR tests + add Tokenizer tests (#1429)
* Fix DPR tests

* Add Tokenizer tests
2021-09-09 12:56:44 +02:00
oryx1729
3e6def7e03
Add type ignore to resolve mypy errors (#1427) 2021-09-09 12:29:01 +02:00
Julian Risch
23338f1b74 Add tests: prediction head, processor load/save, qa from FARM 2021-09-09 11:54:47 +02:00
bogdankostic
9c409e0012 Remove StreamingDataSilo and fix mypy errors from FARM (#1426)
* Add AdaptiveModel

* Add BiAdaptiveModel

* Add DataSilo

* Remove StreamingDataSilo

* Fix mypy errors
2021-09-09 10:12:35 +02:00
dependabot[bot]
a92f1860f6
Bump pillow from 8.2.0 to 8.3.2 (#1423)
Bumps [pillow](https://github.com/python-pillow/Pillow) from 8.2.0 to 8.3.2.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst)
- [Commits](https://github.com/python-pillow/Pillow/compare/8.2.0...8.3.2)

---
updated-dependencies:
- dependency-name: pillow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-08 17:51:18 +02:00
Timo Moeller
b4fd08a296
Add testdata, add tests for qa processor, add dpr tests (some failing) 2021-09-08 12:02:08 +02:00
Timo Moeller
024b9e0bf8 Merge previous solutions: fix imports, add needed helper functions or remove unused ones 2021-09-08 11:51:41 +02:00
Timo Moeller
a945b43a57
Farm merging base bogdan (#1424)
* Add AdaptiveModel

* Add BiAdaptiveModel

* Add DataSilo

Co-authored-by: Bogdan Kostić <bogdankostic@web.de>
2021-09-08 10:38:28 +02:00
Timo Moeller
c5999c3c8f Add LM and tokenization 2021-09-07 13:37:36 +02:00
Julian Risch
55a8031aeb
Adding prediction head, trainer, evaluator from FARM (#1419) 2021-09-07 13:33:17 +02:00
Timo Moeller
5bc5665c0b Add processor and processing related scripts 2021-09-07 12:33:33 +02:00
Bob van Luijt
c0cc8bc80f
Bump Weaviate version to 1.7.0 (#1412)
* Bump Weaviate

* Bump Weaviate

* Bump Weaviate client

* Bump Weaviate

* Revert client version

There is a change in the client API that needs to be addressed before bumping its version
2021-09-05 09:28:55 +02:00
Malte Pietsch
f3e7074c13
Remove stale bot 2021-09-03 17:39:24 +02:00
Malte Pietsch
f3d1df1664
Enable docker-compose for GPUs & Add public UI image (#1406)
* add docker-compose-gpu file

* Update README.md

* Update docker-compose.yml

* Update docker-compose-gpu.yml

* Update docker-compose.yml

* Update docker-compose-gpu.yml
2021-09-02 17:39:21 +02:00
Malte Pietsch
bb9ec90d3c
Fix tesseract installation in Dockerfile (#1405)
* Fix Dockerfile

* Update Dockerfile-GPU
2021-09-02 11:09:30 +02:00
bogdankostic
38128c6734
Ensure num_hard_negatives is 0 when embedding passages (#1402) 2021-09-02 10:46:02 +02:00
Julian Risch
b552bf9b4d
Add sentence-transformers as mandatory dependency and remove from dev… (#1387)
* Add sentence-transformers as mandatory dependency and remove from dev dependency

* Pin sentence-transformers version
2021-09-02 09:54:13 +02:00
Branden Chan
980d88a0f2
Update faq model (#1401) 2021-09-01 18:39:06 +02:00
Malte Pietsch
e4c3c3d423
Fix CI (introduced by OCR PR #1349) (#1399)
* satisfy mypy

* add import
2021-09-01 17:16:05 +02:00
Malte Pietsch
6093bf9ff6
Fix Github action 2021-09-01 16:50:29 +02:00
Shahrukh Khan
4822536886
Add ImageToTextConverter and PDFToTextOCRConverter that utilize OCR (#1349)
* add image.py converter

* add PDFtoImageConverter

* add init to PDFtoImageConverter and classes to __init__

* update imagetotext pipeline

* update imagetotext pipeline

* update imagetotext pipeline

* update imagetotext pipeline

* update imagetotext pipeline

* update imagetotext pipeline

* update imagetotext pipeline

* revert change in base.py in file_conv

* Update base.py

* Update pdf.py

* add ocr file_converter testcase & update dockerfile

* fix tesseract exception message typo

* fix _image_to_text docstring

* add tesseract installation to CI

* add tesseract installation to CI

* add content test for PDF OCR converter

* update PDFToTextOCRConverter constructor docstring

* replace image files with tmp paths for image.py convert

* replace image files with tmp paths for image.py convert

* Update README.md

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-09-01 16:42:25 +02:00
oryx1729
1d2252e96d
docker-compose always pull REST API Image (#1385) 2021-09-01 16:28:25 +02:00
Ikram Ali
3fc7f3f695
[docs] crawler api docs updated. (#1388) 2021-09-01 12:07:32 +02:00
Branden Chan
4021eb838e
Add weaviate to init (#1379) 2021-08-31 15:23:06 +02:00
Branden Chan
1938fb001b
Add support for no Docker envs in Tutorial 13 (#1365)
* Add support for no docker envs e.g. colab

* Generate md
2021-08-31 15:22:51 +02:00
oryx1729
a71180a2ca
Refactor replicas config for Ray Pipelines (#1378) 2021-08-31 10:14:55 +02:00
Ikram Ali
da5ed43734
Catch Elastic's search_phase_execution and raise with descriptive message. (#1371)
* [document_store] Catch Elastic's search_phase_execution_exception (dense retrieval if not all documents have an embedding) closes #1135

* change error msg

* remove unused import

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-08-30 19:38:07 +02:00
Jeff Hammerbacher
1c8a03aaa2
Rag tutorial fixes (#1375)
* Update Tutorial7_RAG_Generator.ipynb

`delete_all_documents` --> `delete_documents` (cf. #1045)

* Update Tutorial7_RAG_Generator.py

`delete_all_documents` --> `delete_documents` (cf. #1045)
2021-08-30 15:27:18 +02:00
cambiumproject
4ca97dd5be
Fix behavior of delete_documents() with filters for Milvus (#1354)
* Fix behavior of delete_documents()

Delete filtered set of vectors rather than the whole collection

* Update milvus.py

* Update milvus.py
2021-08-30 15:22:53 +02:00
ramgarg102
51f0a56e5d
delete_all_documents() replaced by delete_documents() (#1377)
* [UPDT] delete_all_documents() replaced by delete_documents()

* [UPDT] warning logs to be fixed

* [UPDT] delete_all_documents() renamed and the same method added

Co-authored-by: Ram Garg <ramgarg102@gmai.com>
2021-08-30 15:18:28 +02:00
Markus Paff
be8d305190
Editing docs read.me for new docs website workflow (#1372)
* editing docs read.me for new docs website workflow

* added new links to docs
2021-08-30 14:59:40 +02:00
Shahrukh Khan
c3d8aa0643
Add query classifier usage docs (#1348)
* Create query_classifier.md

* Update query_classifier.md

* Update query_classifier.md

* Update query_classifier.md

* Update query_classifier.md

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-08-24 15:56:11 +02:00
Ikram Ali
ead96730d3
Add Crawler support for indexing pipeline (#1360) 2021-08-24 14:25:22 +02:00
Markus Paff
cac15310bd
adding tutorial 13 and 14 (#1364) 2021-08-23 11:37:06 +02:00
Malte Pietsch
2a226daac4
Add simple docs2answer node to allow FAQ style QA / Doc search in API (#1361)
* minimal docs2answer node

* enable logs again
2021-08-20 17:01:55 +02:00
Markus Paff
ff2049cd45
updated tutorials (#1359) 2021-08-19 21:16:56 +02:00
annagruendler
a3c746abf5
Update test documentation in readme (#1355) 2021-08-19 10:36:21 +02:00
Ikram Ali
ef27f0d386
Add tests for Crawler (#1339) 2021-08-18 14:05:44 +02:00