Malte Pietsch
abf2d63c92
Upgrade FAISS to 1.7.0 ( #834 )
2021-02-17 10:00:33 +01:00
Branden Chan
a6a3b74199
Fix image in README
2021-02-16 17:05:15 +01:00
Andrey A
e0be5639ef
Update README.md
2021-02-16 18:47:14 +03:00
Andrey A
ab89fac76a
Update README.md
2021-02-16 18:45:20 +03:00
Andrey A
5c9f7d493c
Fix link to Quick Demo in ToC. ( #831 )
2021-02-16 16:38:04 +01:00
Tanay Soni
07907f9eac
Add support for indexing pipelines ( #816 )
2021-02-16 16:24:28 +01:00
Branden Chan
7030c94325
Revamp Readme ( #820 )
...
* Text changes
* Add new images
* First improvements
* Next iteration
* Resize gif
* Add bold
* Update key concepts diagram
* Center image
* Initial import of a more detailed README.md
* Slight changes to ToC, requirements and across the text.
* Grammar and Streamlit UI png.
* Unfix size of gif for mobile
* Remove requirements, add formatting to numbered lists.
* Formatting, remove img size options.
* Another iteration of phrasing the note about open ports.
* Rephrase the note about the docker ports.
Co-authored-by: Andrey A <56412611+aantti@users.noreply.github.com>
2021-02-16 15:32:43 +01:00
Malte Pietsch
47aae14efa
relax assert precision of arrays
2021-02-15 14:52:13 +01:00
Malte Pietsch
9b1924a54a
Revert TOP_K_PER_CANDIDATE value to 3
2021-02-15 14:30:04 +01:00
Malte Pietsch
0eaae3c0dd
Fix UI when API returns fewer answers than expected ( #828 )
...
* fix ui for few answers from api. add top_k_per_sample env
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-15 14:27:17 +01:00
brandenchan
fe47e3a45e
Fix link in documentation
2021-02-15 11:15:54 +01:00
Malte Pietsch
6798192d40
Add API endpoint to export accuracy metrics from user feedback + created_at timestamp ( #803 )
...
* WIP feedback metrics
* fix filters and zero division
* add created_at and model_name fields to labels
* add created_at value
* remove debug log level
* fix attribute init
* move timestamp creation down to docstore / db level
* fix import
2021-02-15 10:48:59 +01:00
brandenchan
03cda26d85
Fix link in Tutorial 8
2021-02-15 10:45:27 +01:00
Lalit Pagaria
5bd94ac5f7
Adding Translator (standalone component & wrapper for pipelines) ( #782 )
...
* Adding translator with many generic input parameter support
* Making dict_key as generic
* Fixing mypy issue
* Adding pipeline and using opus models
* Add latest docstring and tutorial changes
* Adding test cases for end-to-end translation for generator, summarizer, etc.
* raise error join and merge nodes
* Fix test failure
* add docstrings. add usage documentation. rm skip_special_tokens param
* Add latest docstring and tutorial changes
* fix code snippets in md
* Adding few extra configuration parameters and fixing tests
* Fixing mypy issue and updating usage document
* fix for mypy issue in pipeline.py
* reverting renaming of pytest_collection_modifyitems method
* Addressing review comments
* setting skip_special_tokens to True
* removing model_max_length argument as None type is not supported by many models
* Removing padding parameter. Better to leave it as default; otherwise it causes a tensor size mismatch error. If this option is required by users, it can be added later.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-02-12 15:58:26 +01:00
oryx1729
4059805d89
Fix ElasticsearchDocumentStore.query_by_embedding() ( #823 )
2021-02-12 14:57:06 +01:00
Pavel Soriano
8adf5b4737
Allow non-standard Tokenizers (e.g. CamemBERT) for DPR via new arg ( #811 )
...
* added parameter to infer DPR tokenizers class
* Add latest docstring and tutorial changes
* Update docstring. fix mypy
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-02-12 14:17:55 +01:00
oryx1729
c4607cbd98
Revamp CI ( #825 )
2021-02-12 13:38:54 +01:00
Branden Chan
c807f0d050
Add key concepts diagram
2021-02-12 12:49:22 +01:00
Tanay Soni
8b0031bfc1
Remove conditional import of FAISS for Windows ( #819 )
2021-02-12 12:15:23 +01:00
Branden Chan
a1983ad84e
Add new images
2021-02-11 15:10:00 +01:00
Branden Chan
db0364c728
Fix uvloop version to maintain Python<3.7 support
...
uvloop released v0.15, which requires Python >=3.7. This commit pins the uvloop version so that Haystack can be installed directly in Colab using pip.
2021-02-10 19:16:53 +01:00
Tanay Soni
fd5c5dd23c
Introduce incremental updates for embeddings in document stores ( #812 )
2021-02-09 21:25:01 +01:00
Malte Pietsch
e91518ee00
Update tutorials (torch versions, ES version, replace Finder with Pipeline) ( #814 )
...
* remove manual torch install on colab
* update elasticsearch version everywhere to 7.9.2
* fix FAQPipeline
* update tutorials with new pipelines
* Add latest docstring and tutorial changes
* revert faqpipeline change. fix field names in tutorial 4
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-09 14:56:54 +01:00
Malte Pietsch
ac9f92466f
Allow custom encoding for pdftotext (Russian characters, German umlauts etc). Fix version in download instructions ( #813 )
...
* fix encoding of pdftotext. fix version in download instructions
* fix test
* Add latest docstring and tutorial changes
* make latin-1 default encoding again
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-09 13:42:43 +01:00
Tanay Soni
f95b70df38
Fix file upload API ( #808 )
2021-02-05 12:17:38 +01:00
Tanay Soni
7b18e324f2
Fix building Pipeline with YAML ( #800 )
2021-02-04 11:53:51 +01:00
Branden Chan
f3a3b73d9b
Choose correct similarity fns during benchmark runs & re-run benchmarks ( #773 )
...
* Adapt to new dataset_from_dicts return signature
* rename fn
* Align similarity fn in benchmark doc store
* Better choice of similarity fn
* Increase postgres wait time
* Add more expected returned variables
* update benchmark results
* Fix typo
* update all benchmark runs
* multiply stats by 100
* Specify similarity fns for website
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-02-03 11:45:18 +01:00
Tanay Soni
8a5dc8f826
Load Pipeline with YAML config file ( #785 )
2021-02-02 17:32:17 +01:00
Malte Pietsch
1318b55eec
Make tqdm progress bars optional (less verbose prod logs) ( #796 )
...
* make dpr queries less verbose
* add progress bar flag to more components
* Add latest docstring and tutorial changes
* add type
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-01 20:51:55 +01:00
Timo Moeller
f3ccd59045
Improve preprocessing and adding of eval data ( #780 )
...
* Remove empty document when splitting text
* Move error message of problematic ids to a higher level
2021-02-01 17:08:27 +01:00
Tanay Soni
b87dd244c1
Get metadata values for a key from Elasticsearch ( #776 )
2021-02-01 16:13:26 +01:00
brandenchan
5665d55ab4
Remove duplicate file
2021-02-01 15:43:53 +01:00
Pavel Soriano
16b8291091
SQuAD to DPR dataset converter ( #765 )
...
* Create squad_to_dpr.py
First commit of the squad2dpr script.
* adding review corrections/improvements
* Merge master 5bf351e
* Move script, add docstring
* Add type hints
Co-authored-by: brandenchan <brandenchan@icloud.com>
2021-02-01 15:40:43 +01:00
Tanay Soni
5bf351ea7b
Fix refresh behaviour for Elasticsearch delete ( #794 )
2021-02-01 14:07:55 +01:00
Tanay Soni
d62355ca88
Fix mypy typing ( #792 )
2021-02-01 12:15:36 +01:00
Branden Chan
1dc74c7067
Add model versioning support ( #784 )
...
* Add model versioning support
* Add latest docstring and tutorial changes
* Support DPR versioning
* Add RAG versioning support
* Add latest docstring and tutorial changes
* Add summarizer support
* Add Embedding Retriever support
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-01 11:42:36 +01:00
Malte Pietsch
2b05e801c3
Fix pdftotext dependency in CI ( #788 )
...
* Fix pdftotext dependency in CI
* update xpdf version
* Fix version
2021-01-29 16:07:37 +01:00
Lalit Pagaria
9f7f95221f
Milvus integration ( #771 )
...
* Initial commit for Milvus integration
* Add latest docstring and tutorial changes
* Updating implementation of Milvus document store
* Add latest docstring and tutorial changes
* Adding tests and updating doc string
* Add latest docstring and tutorial changes
* Fixing issue caught by tests
* Addressing review comments
* Fixing mypy detected issue
* Fixing issue caught in test about sorting of vector ids
* fixing test
* Fixing generator test failure
* update docstrings
* Addressing review comments about multiple network calls while fetching embeddings from Milvus server
* Add latest docstring and tutorial changes
* Ignoring mypy issue while converting vector_id to int
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-01-29 13:29:12 +01:00
brandenchan
6efa4f06c1
Add Streamlit UI Image
2021-01-27 17:01:29 +01:00
Timo Moeller
f94bd96ddf
Remove RAG todos after transformers update ( #781 )
2021-01-27 16:50:02 +01:00
Tanay Soni
d9f011da9a
Add flag for use of window queries in SQLDocumentStore ( #768 )
2021-01-25 12:54:34 +01:00
Tanay Soni
46307d1571
Remove quotes around placeholders in Elasticsearch custom query ( #762 )
2021-01-25 12:46:43 +01:00
Tanay Soni
f0aa879a1c
Fix delete_all_documents for the SQLDocumentStore ( #761 )
2021-01-22 14:39:24 +01:00
Markus Paff
aee90c5df9
Docs v0.7.0 ( #757 )
...
* new docs version
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-01-22 10:28:33 +01:00
Malte Pietsch
50815421b0
bump haystack version
v0.7.0
2021-01-21 16:02:33 +01:00
Tanay Soni
337376c81d
Add batch_size and generators to document stores. ( #733 )
...
* Add batch update of embeddings in document stores
* Resolve merge conflict
* Remove document ordering dependency in tests
* Adjust index buffer size for tests
* Adjust ES Scroll Slice
* Use generator for document store pagination
* Add pagination for InMemoryDocumentStore
* Fix missing index parameter in FAISS update_embeddings()
* Fix FAISS update_embeddings()
* Update FAISS tests
* Update eval tests
* Revert code formatting change
* Fix document count in FAISS update embeddings
* Fix vector_ids reset in SQLDocumentStore
* Update docstrings
* Update docstring
2021-01-21 16:00:08 +01:00
Markus Paff
0b583b8972
Generate docstrings and deploy to branches to Staging (Website) ( #731 )
...
* test pre commit hook
* test status
* test on this branch
* push generated docstrings and tutorials to branch
* fixed syntax error
* Add latest docstring and tutorial changes
* add files before commit
* catch commit error
* separate generation from deployment
* add deployment process for staging
* add current branch to payload
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-01-21 11:01:09 +01:00
Markus Paff
0f62e0b2ee
Script for releasing docs ( #736 )
...
* script for releasing docs
* fix formatting
2021-01-21 10:58:54 +01:00
Timo Moeller
7522d2d1b0
Increase FARM to Version 0.6.2 ( #755 )
...
* Increase farm version
* Fix test
2021-01-21 10:15:41 +01:00
Branden Chan
725c03220f
Reduce memory consumption of fetch_archive_from_http ( #737 )
2021-01-21 09:57:55 +01:00