3803 Commits

Author SHA1 Message Date
Malte Pietsch
50815421b0 bump haystack version v0.7.0 2021-01-21 16:02:33 +01:00
Tanay Soni
337376c81d Add batch_size and generators to document stores. (#733)
* Add batch update of embeddings in document stores

* Resolve merge conflict

* Remove document ordering dependency in tests

* Adjust index buffer size for tests

* Adjust ES Scroll Slice

* Use generator for document store pagination

* Add pagination for InMemoryDocumentStore

* Fix missing index parameter in FAISS update_embeddings()

* Fix FAISS update_embeddings()

* Update FAISS tests

* Update eval tests

* Revert code formatting change

* Fix document count in FAISS update embeddings

* Fix vector_ids reset in SQLDocumentStore

* Update docstrings

* Update docstring
2021-01-21 16:00:08 +01:00
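The batching idea behind "Use generator for document store pagination" can be illustrated with a minimal, generic sketch. It does not use Haystack's own methods: the helper `iter_in_batches` and the stand-in source `fake_document_source` are hypothetical names, and the point is only that a generator yields fixed-size batches without loading every document into memory at once.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_in_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `batch_size` items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    iterator = iter(items)
    while True:
        # Pull at most batch_size items from the underlying iterator.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def fake_document_source(n: int) -> Iterator[dict]:
    """Stand-in for a store that streams documents lazily, one at a time."""
    for i in range(n):
        yield {"id": str(i), "text": f"document {i}"}


# Process 10 documents in batches of 4 without materialising them all up front.
batches = list(iter_in_batches(fake_document_source(10), batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```

Because the source is itself a generator, only one batch is held in memory at a time when the batches are consumed in a loop rather than collected with `list(...)`.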
Markus Paff
0b583b8972
Generate docstrings and deploy branches to Staging (Website) (#731)
* test pre commit hook

* test status

* test on this branch

* push generated docstrings and tutorials to branch

* fixed syntax error

* Add latest docstring and tutorial changes

* add files before commit

* catch commit error

* separate generation from deployment

* add deployment process for staging

* add current branch to payload

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-01-21 11:01:09 +01:00
Markus Paff
0f62e0b2ee
Script for releasing docs (#736)
* script for releasing docs

* fix formatting
2021-01-21 10:58:54 +01:00
Timo Moeller
7522d2d1b0
Increase FARM to Version 0.6.2 (#755)
* Increase farm version

* Fix test
2021-01-21 10:15:41 +01:00
Branden Chan
725c03220f
Reduce memory consumption of fetch_archive_from_http (#737) 2021-01-21 09:57:55 +01:00
Timo Moeller
4803da009a
Using PreProcessor functions on eval data (#751)
* Add eval data splitting

* Adjust for split by passage, add test and test data, adjust docstrings, add max_docs to higher-level function
2021-01-20 14:40:10 +01:00
Tanay Soni
aa8a3666c3
Support filters for DensePassageRetriever + InMemoryDocumentStore (#754) 2021-01-20 12:52:52 +01:00
Rob192
35dcf23a4b
Use Path class in add_eval_data of haystack.document_store.base.py (#745)
* use Path class in method add_eval_data of haystack.document_store.base.py

* change type of jsonl_filename to str, as squad_json_to_jsonl and add_eval_data expect a string
2021-01-19 12:08:49 +01:00
Andrey A
7a0b65a079
Add links to slack, twitter etc (#746)
* Update README.md
2021-01-19 11:30:26 +01:00
Branden Chan
8d47a71b00
Fix Tutorial 9 (#734)
* Add package download

* Change dev to train file
2021-01-14 10:56:58 +01:00
Julian Risch
3331608e03
Adding a guard that prevents the tutorial code from being executed in every subprocess when using multiprocessing on Windows (#729) 2021-01-13 18:17:54 +01:00
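The kind of guard described in #729 is the standard `if __name__ == "__main__":` idiom. Below is a minimal sketch, assuming a tutorial whose top-level code starts a `multiprocessing` pool; `tutorial()` and `square()` are hypothetical names, not the tutorial's actual functions. With the "spawn" start method (the default on Windows), each worker re-imports the main module, so unguarded top-level code would run again in every subprocess.

```python
import multiprocessing as mp


def square(x: int) -> int:
    # Executed inside a worker process.
    return x * x


def tutorial() -> list:
    # Stand-in for the tutorial's top-level code (hypothetical).
    with mp.Pool(processes=2) as pool:
        return pool.map(square, range(5))


if __name__ == "__main__":
    # Workers re-import this module under a different __name__,
    # so the guard keeps them from calling tutorial() again.
    print(tutorial())  # [0, 1, 4, 9, 16]
```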
Branden Chan
a3a12bc95b Remove broken link 2021-01-13 17:32:10 +01:00
brandenchan
01fd9940d8 Fix tutorial link 2021-01-13 15:29:25 +01:00
Branden Chan
7376185b65
Create DPR training tutorial (#708)
* WIP: Start DPR training tutorial

* Create basics of DPR Train tutorial

* Update documentation

* Allow DPR to be initialized without document store

* WIP: Add param descriptions to DPR notebook

* Clean tutorial

* Improve loading

* Make doc store optional when loading DPR

* Satisfy mypy type check

* Add links

* Add tutorial header

* Add colab badge

* Clear outputs

* Incorporate reviewer feedback

* Add readme links

* Regenerate tutorials

* Add excitement

* Fix typo

* Fix hard negatives comment

* Wrap tutorial for windows users

* Fix mypy issue
2021-01-13 10:33:55 +01:00
bogdankostic
7709b6cee0
Make batchwise adding of evaluation data possible (#717)
* Make batchwise adding of evaluation data possible

* Fix typos in docstrings

* Merge add_eval_data and add_eval_data_batchwise

* Improve import statements

* Move add_eval_data to BaseDocumentStore

* Add batch_size param to write_documents and write_labels in EsDocStore

* Adjust docstring

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-01-12 17:54:43 +01:00
Antonio Lanza
1f00599d2e
Change signature and docstring for ca_certs parameter for SSL connection (#730) 2021-01-12 17:30:09 +01:00
Malte Pietsch
e9b5439b00
Rename label id field for elastic & add UPDATE_EXISTING_DOCUMENTS to API config (#728)
* rename label id field for elastic

* add UPDATE_EXISTING_DOCUMENTS param to API config
2021-01-12 13:00:56 +01:00
Malte Pietsch
b6e64ca42d
Add ID to label schema (#727) 2021-01-12 10:02:40 +01:00
Markus Paff
3af3ee1a12
Automate docstring and tutorial generation with every push to master (#718)
* automate docstring and tutorial generation with every push to master

* test CI for current branch

* fixed yaml syntax

* add setuptools to install process

* checkout repo

* fixed command for shell script

* install wheel as it is needed for CI

* install mkdocs

* test without shell script

* use package from github actions

* test other configuration

* back to right config

* cleaning script
2021-01-11 16:25:43 +01:00
Tanay Soni
281f9ff970
Fix SQLite errors in tests (#723) 2021-01-11 13:24:38 +01:00
Malte Pietsch
fcc052b554
Pass custom label index name in api config (#724) 2021-01-11 12:24:09 +01:00
Lalit Pagaria
88b5cbe736
Correcting pypi download badge (#722) 2021-01-10 06:26:17 +01:00
Lalit Pagaria
75d0ebd076
Add Summarizer (standalone + node in custom pipelines + SearchSummarizationPipeline) (#698)
* Integration of SummarizationQAPipeline with Haystack.

* Moving summarizer tests because of OOM issue

* Fixing typo

* Splitting summarizer test in separate ci step

* Removing sysctl configuration as we are already running Elasticsearch in a Docker container

* fixing mypy issue

* update parameter names and docstrings

* update parameter names in BaseSummarizer

* rename pipeline

* change return type of summarizer from answer to document

* change scope of doc store fixture

* revert scope

* temp. disable test_faiss_index_save_and_load()

* fix mypy. change order for mypy in CI

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-01-08 14:29:46 +01:00
Lalit Pagaria
3a9a756810
Using column names instead of ORM objects to get all documents (#620)
* Using column names instead of ORM objects for the get-all-documents call

* Separating meta search from documents. This saves memory by not duplicating document.text

* Fixing mypy issue

* SQLite has a limit on the number of host variables, hence using batching to fetch meta information

* Query meta only if meta field is not Null in DocOrm

* Add batch_size to other functions except label

* meta can be None, so fix that issue

* Dummy commit to trigger CI

* Using chunked dictionary

* Upgrading faiss

* reverting change related to faiss upgrade

* Changing DB name in test_faiss_retrieving test as it might interfere with existing files by corrupting the DB file

* Updating doc string related to batch_size

* Update docstring for batch_size

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-01-06 15:56:19 +01:00
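The batching motivated in #620 (SQLite limits the number of host parameters per statement; 999 by default before SQLite 3.32.0) can be sketched with the standard `sqlite3` module. This is not Haystack's SQLDocumentStore code, which goes through SQLAlchemy; the `meta` table, its columns, and the helper `fetch_meta_in_batches` are hypothetical. Each query binds at most `batch_size` IDs in its `IN (...)` clause.

```python
import sqlite3
from typing import Dict, List, Tuple


def fetch_meta_in_batches(
    conn: sqlite3.Connection, doc_ids: List[str], batch_size: int = 500
) -> Dict[str, List[Tuple[str, str]]]:
    """Fetch (name, value) meta rows per document, chunking the IN clause (hypothetical schema)."""
    meta: Dict[str, List[Tuple[str, str]]] = {doc_id: [] for doc_id in doc_ids}
    for start in range(0, len(doc_ids), batch_size):
        chunk = doc_ids[start:start + batch_size]
        # Only "?" placeholders are interpolated; the IDs themselves are bound parameters.
        placeholders = ",".join("?" for _ in chunk)
        rows = conn.execute(
            f"SELECT document_id, name, value FROM meta WHERE document_id IN ({placeholders})",
            chunk,
        )
        for document_id, name, value in rows:
            meta[document_id].append((name, value))
    return meta


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meta (document_id TEXT, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO meta VALUES (?, ?, ?)",
    [(str(i), "source", f"file_{i}.txt") for i in range(2000)],
)
result = fetch_meta_in_batches(conn, [str(i) for i in range(2000)], batch_size=500)
print(len(result), result["1999"])  # 2000 [('source', 'file_1999.txt')]
```

Querying meta separately from the document rows, as the commit describes, also avoids repeating each document's text once per meta entry in a joined result.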
Branden Chan
bb8aba18e0
Create Preprocessing Tutorial (#706)
* WIP: First version of preprocessing tutorial

* stride renamed to overlap, ipynb and py files created

* rename split_stride in test

* Update preprocessor api documentation

* define order for markdown files

* define order of modules in api docs

* Add colab links

* Incorporate review feedback

Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
2021-01-06 15:54:05 +01:00
Malte Pietsch
5db73d4107
Update stale bot 2021-01-05 08:29:24 +01:00
Malte Pietsch
74b0868d28
Fix GPU docker build (#703) 2020-12-31 15:04:13 +01:00
Malte Pietsch
a284af3ae5
Remove sourcerer.io widget (#702)
Fix #699
2020-12-30 09:57:02 +01:00
Tanmay Laud
7cd9e09491
Add basic demo UI via streamlit (#671)
* Added starter code for frontend demo

* worked on comments

* Added Docker config for frontend

* update docker file. restructure folder structure. minimal renamings and defaults

* add screenshot to readme

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2020-12-27 13:36:09 +01:00
Lalit Pagaria
fc521fe293
Haystack logo is not visible on GitHub mobile. Adding download badge. (#697) 2020-12-27 12:58:48 +01:00
Malte Pietsch
737a47e2fe
Update readme 2020-12-26 12:14:52 +01:00
Malte Pietsch
a2e5e6b09e
Update pipeline documentation and readme (#693)
* Update README.md

* Update pipelines.md

* Update pipelines.md

* Update README.md
2020-12-22 13:34:28 +01:00
Malte Pietsch
94b7345505
Make use_gpu=True the default in tutorials (#692)
* enable gpu args in tutorials

* add info box for gpu runtime on colab
2020-12-22 07:58:12 +01:00
Markus Paff
b752da1cd5
Add docs v0.6.0 (#689)
* new docs version

* updated directory structure

* Add pipelines page

* Add Finder deprecation suggestion

* header for pipelines file

* Document MySQL support

* Mention DPR train tutorial coming soon

* Mention open distro ES

* Update doc strings regarding similarity fn

* Add link to API docs

* Wrap pipelines docs in box

* add api reference for pipelines

* copied latest version to v0.6.0

* Remove space

* Remove space

* Copy to v0.6.0

Co-authored-by: brandenchan <brandenchan@icloud.com>
2020-12-18 12:47:27 +01:00
Tanay Soni
0e4eec9499
Add tests for custom embedding field (#640) 2020-12-17 09:18:57 +01:00
Malte Pietsch
5b817387c2 Bump version to 0.6.0 v0.6.0 2020-12-17 06:31:22 +01:00
bogdankostic
a9bcabc42d
Fix saving tokenizers in DPR training + unify save and load dirs (#682) 2020-12-16 17:09:47 +01:00
Tanay Soni
4c2804e38e
Add support for aggregating scores in JoinDocuments node (#683) 2020-12-16 15:54:58 +01:00
demSd
143da4cb3f
Fix a typo in DPR args, num_negatives -> num_positives (#681)
* fix a typo, num_negatives -> num_positives

* default value for num_positives

* Update dense.py
2020-12-15 10:10:41 +01:00
Tanay Soni
369e237fd4
Add DocumentStore for Open Distro Elasticsearch (#676) 2020-12-15 09:28:40 +01:00
Tanay Soni
33fe597949
Cleanup Pytest Fixtures (#639) 2020-12-14 18:15:44 +01:00
Branden Chan
d8154939fc
Scale dot product into probabilities (#667)
* scale dot product

* Add tip in documentation

* Add recommendation boxes

* WIP: Use similarity attribute in all doc stores

* Implement similarity for InMemoryDS

* Add FAISS support

* Clean printout

* Update documentation

* Implement document field map
2020-12-11 12:10:24 +01:00
demSd
a0e146dde6
add GPU support for RAG (#669)
* add GPU support for RAG

* Update transformers.py
2020-12-11 12:08:01 +01:00
Malte Pietsch
149d98a0fd
Add latest benchmark run (#652)
* add latest benchmark run

* update templates and fix small json errors

* Change scale

Co-authored-by: brandenchan <brandenchan@icloud.com>
2020-12-10 16:25:51 +01:00
Timo Moeller
efc754b166
Redone: Fix concatenation of sentences in PreProcessor. Add stride for word-based splits with sentence boundaries (#641)
* Update preprocessor.py

Concatenation of sentences done correctly. Stride functionality enabled for splitting by words while respecting sentence boundaries.

* Simplify code, add test

Co-authored-by: Krak91 <45461739+Krak91@users.noreply.github.com>
2020-12-09 16:12:36 +01:00
Branden Chan
8c904d79d6
Fix links (#663) 2020-12-08 10:28:31 +01:00
Tanay Soni
c4a5de59aa
Add set_node() for Pipeline (#659) 2020-12-07 19:16:35 +01:00
Tanay Soni
4152ad8426
Enable dynamic parameter updates for the FARMReader (#650) 2020-12-07 14:07:20 +01:00
Malte Pietsch
e6ada08d0e
Update query arg in Tutorial 7 (#656) 2020-12-04 08:42:09 +01:00