70 Commits

Author SHA1 Message Date
Lalit Pagaria
63c12371b9
Change arg "model" to "model_name_or_path" in TransformersReader (#510)
* Consistent parameter naming for TransformersReader; also remove unused imports.

* Addressing review comments
2020-10-21 17:15:35 +02:00
Lalit Pagaria
2e9f3c1512
Fix update_embeddings function in FAISSDocumentStore and add retriever fixture in tests (#481)
* 1. Prevent the update_embeddings function in FAISSDocumentStore from setting faiss_index to None when the document store does not contain any docs.

2. Clean up tests by adding a fixture for the retriever.

* TfidfRetriever needs a document store with documents during initialization because it calls fit() in its constructor; fix this by checking whether self.paragraphs is None

* Fix naming of retriever's fixture (embedded to embedding and tfid to tfidf)
2020-10-14 16:15:04 +02:00
Malte Pietsch
8edeb844f7
Remove phi normalization from FAISS, support more index types, 3x speedup (#467)
* remove phi normalization

* add special case for hnsw

* rename vector_size to vector_dim

* fix loading. fix extra dim in tests

* switch to new ES syntax for vector similarity

* 3x sql speed up. cascade deletes. add train_index()

* add docstrings. remove vector_dim from load()

* delete docs from faiss and sql

* fix delete of docs in test

* relax type hint for faiss index

* rename metric to metric_type

Co-authored-by: lalitpagaria <19303690+lalitpagaria@users.noreply.github.com>
2020-10-06 16:09:56 +02:00
Malte Pietsch
9727829cc6
Rename and restructure modules (database, indexing, schemas) (#379)
* rename database to documentstore

* move document, label, multilabel to haystack/schema.py

* rename documentstore -> document_store

* split indexing modules -> file_converter + preprocessor

* fix order of imports

* Update tutorial notebooks

* fix torch version in tutorial 4
2020-09-16 18:33:23 +02:00
Lalit P
de5ad42e46
Adjust tests for MacOS (#374)
2020-09-15 15:04:46 +02:00
Tanay Soni
01ff66dfd6
Remove redundant test fixture
2020-08-17 14:19:38 +02:00
Dany
403318b1f5
Add Tika Converter (#314)
2020-08-17 11:21:09 +02:00
Tanay Soni
1637ce1184
Revert "Add Tika Converter (#314)"
This reverts commit 5ef59b1901da6d51bfa085683321a243228d4fc9.
2020-08-17 11:13:52 +02:00
Tanay Soni
5ef59b1901
Add Tika Converter (#314)
2020-08-14 14:13:59 +02:00
Tanay Soni
9d0df60aad
Add FAISS Document Store (#253)
2020-08-07 14:25:08 +02:00
Timo Moeller
d9e8b522a1
Add "no answer" aggregation to TransformersReader (#259)
* Add no answer aggregation

* Change to covariant type annotation

* Remove n_best_per_passage from TransformersReader
2020-08-06 17:32:55 +02:00
Tanay Soni
5937f9cf16
Deprecate Tags for Document Stores (#286)
2020-08-04 14:24:12 +02:00
Malte Pietsch
29a15c0d59
Add eval for Dense Passage Retriever & Refactor handling of labels/feedback (#243)
2020-07-31 11:34:06 +02:00
Malte Pietsch
99a6a34047
Upgrade to new FARM / Transformers / PyTorch versions (#212)
2020-07-14 18:53:15 +02:00
Tanay Soni
b886e054a3
Move document_name attribute to meta (#217)
2020-07-14 09:53:31 +02:00
Malte Pietsch
d2b26a99ff
Add more tests (#213)
2020-07-10 10:54:56 +02:00
Tanay Soni
180dc8cbd6
Start Elasticsearch with a GitHub Action (#142)
2020-06-09 12:46:15 +02:00
Tanay Soni
160345f3d5
Update build workflow
2020-06-09 11:45:25 +02:00
Tanay Soni
ef9e4f4467
Add PDF text extraction (#109)
2020-06-08 11:07:19 +02:00
Tanay Soni
37e0ff70f7
Add test for Elasticsearch document store (#88)
2020-05-04 18:00:07 +02:00