Julian Risch
eeb29b5686
test: Re-activate end-to-end tests workflow ( #5343 )
* Install haystack with required extras
* remove whitespaces
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Add sleep
* Add s for seconds
* Move container initialization in workflow
* Update e2e.yml
add nightly run
* use new folder for initial e2e test
* use file hash for caching and trigger on push to branch
* remove \n from model names read from file
* remove trigger on push to branch
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-07-20 11:48:51 +02:00
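The entry above mentions stripping `\n` from model names read from a file and using a file hash for caching. The following is a minimal Python sketch of both ideas. The function names, the cache-key prefix and the model names in the test are hypothetical, and the actual workflow may implement this differently (for example directly in GitHub Actions).

```python
import hashlib
from typing import List


def parse_model_names(text: str) -> List[str]:
    # splitlines() drops the trailing "\n" that iterating over a file keeps;
    # strip() also removes stray spaces and "\r", and blank lines are skipped
    return [line.strip() for line in text.splitlines() if line.strip()]


def cache_key(file_content: bytes, prefix: str = "e2e-models") -> str:
    # The key changes only when the file content changes, so cached model
    # downloads can be reused as long as the list of models stays the same
    return f"{prefix}-{hashlib.sha256(file_content).hexdigest()}"
```

Usage (hypothetical file name): `parse_model_names(Path("models.txt").read_text())` after `from pathlib import Path`.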
Sebastian Husch Lee
f7642e83ea
feat: Add embed_meta_fields to Ranker nodes ( #5361 )
* Adding embed_meta_fields to ranker nodes
* Fix tests by adding case where embed_meta_fields=None
* Adding unit test for _add_meta_fields_to_docs
* Fix pylint
* Add unit test
* Added another unit test. Caught a bug.
* Adding more unit tests
* Add unit test
* Updating some older tests into unit tests using mocking
* Convert another test to unit test
* Test run method
* One last unit test
2023-07-18 09:11:51 +02:00
elundaeva
e0cf1421c6
proposal: Add RecentnessRanker component ( #5289 )
proposal for adding Recentness Ranker to Haystack
2023-07-17 16:33:47 +02:00
Fanli Lin
09a1d3c0dc
remove duplicate ( #5368 )
2023-07-17 16:24:02 +02:00
ZanSara
8f3fe85878
feat: extend pipeline.add_component to support stores ( #5261 )
* add protocol and adapt pipeline
* change API in pipeline.add_component
* adapt pipeline tests
* adapt memoryretriever
* additional checks
* separate protocol and mixin
* review feedback & update tests
* pylint
* Update haystack/preview/document_stores/protocols.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update haystack/preview/document_stores/memory/document_store.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* docstring of Store
* adapt memorydocumentstore
* fix tests
* remove direct inheritance
* pylint
* Update haystack/preview/document_stores/mixins.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update test/preview/components/retrievers/test_memory_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update test/preview/components/retrievers/test_memory_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update test/preview/components/retrievers/test_memory_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update test/preview/components/retrievers/test_memory_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update test/preview/components/retrievers/test_memory_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* test names
* revert suggestion
* private self._stores
* move asserts out
* remove protocols
* review feedback
* review feedback
* fix tests
* mypy
* review feedback
* fix tests & other details
* naming
* mypy
* fix tests
* typing
* partial review feedback
* move .store to input dataclass
* Revert "move .store to input dataclass"
This reverts commit 53f624b99f3414c89d5134711725b31bd94ef77a.
* disable reusing components with stores
* disable sharing components with docstores
* Update mixins.py
* black
* upgrade canals & fix tests
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-07-17 15:06:19 +02:00
Vladimir Blagojevic
adfabdd648
Improve token limit tests for OpenAI PromptNode layer ( #5351 )
2023-07-17 14:03:03 +02:00
Ikko Eltociear Ashimine
35b2c99f43
chore: fix typo in base.py ( #5356 )
paramters -> parameters
2023-07-13 18:40:21 +02:00
Fanli Lin
9891bfeddd
fix: a small bug in StopWordsCriteria ( #5316 )
2023-07-13 15:58:06 +02:00
bogdankostic
237d67dbfd
feat: Check version of Elasticsearch server and add support for Elasticsearch <= 7.5 ( #5320 )
* Check ES server version + add support for ES <= 7.5
* Adapt comment
* PR feedback
2023-07-13 14:50:43 +02:00
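The entry above adds a check of the Elasticsearch server version. A minimal sketch of such a check, assuming the cluster info response carries the version string under `["version"]["number"]` (as returned by `client.info()` in the official Python client). The helper names are hypothetical, and the sketch does not show what Haystack does differently for servers <= 7.5.

```python
from typing import Any, Dict, Tuple


def server_version(info: Dict[str, Any]) -> Tuple[int, ...]:
    # e.g. info == {"version": {"number": "7.5.2"}, ...}
    number = info["version"]["number"]
    # Drop pre-release suffixes such as "8.0.0-SNAPSHOT" before parsing
    return tuple(int(part) for part in number.split("-")[0].split("."))


def is_at_most(version: Tuple[int, ...], major: int, minor: int) -> bool:
    # Compare only major.minor so that 7.5.2 still counts as "<= 7.5"
    return version[:2] <= (major, minor)
```

Comparing only the first two components matters: as plain tuples, `(7, 5, 2) <= (7, 5)` is `False`.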
Daria Fokina
63fd63ff23
fix: update WebRetriever docstrings and default mode ( #5352 )
* update WebRetriever docstrings
* snippets as default mode
2023-07-13 13:57:52 +02:00
Vladimir Blagojevic
f21005f8ea
refactor: Extract link retrieval from WebRetriever, introduce LinkContentRetriever ( #5227 )
* Extract link retrieval from WebRetriever, introduce LinkContentRetriever
* Add example
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
2023-07-13 12:54:40 +02:00
MichelBartels
fd350bbb8f
fix: Run HFLocalInvocationLayer.supports even if inference packages are not installed ( #5308 )
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-07-13 12:52:56 +02:00
ZanSara
7848f00d01
feat: upgrade canals in preview ( #5344 )
* upgrade nodes
* linting
2023-07-13 12:30:49 +02:00
Stefano Fiorucci
8750d92763
pin scikit-learn>=1.3.0 ( #5322 )
2023-07-13 11:11:28 +02:00
Daria Fokina
a152a812da
create invocation-layers API reference page ( #5262 )
* create invocation-layers.yml
* Update invocation-layers.yml
2023-07-12 17:48:27 +02:00
Sebastian Husch Lee
b5aef24a7e
feat: Add support for meta fields that are lists when using embed_meta_fields ( #5307 )
* Add support for meta fields that are lists when using embed_meta_fields
* Make sure unit test doesn't download model
* Adding more unit tests
2023-07-11 17:32:33 +02:00
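The entry above adds support for list-valued meta fields in `embed_meta_fields`. A minimal sketch of how the text to embed could be assembled, assuming each meta value (and each item of a list value) becomes its own newline-separated entry placed before the document content. The function name and the joining scheme are assumptions, not taken from the PR.

```python
from typing import Any, Dict, List


def text_to_embed(content: str, meta: Dict[str, Any], embed_meta_fields: List[str]) -> str:
    parts: List[str] = []
    for field in embed_meta_fields:
        value = meta.get(field)
        if not value:
            continue  # skip missing or empty meta fields
        if isinstance(value, list):
            # Flatten lists: one entry per item instead of the list's repr
            parts.extend(str(item) for item in value)
        else:
            parts.append(str(value))
    return "\n".join(parts + [content])
```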
Stefano Fiorucci
6632505540
chore: deprecate SklearnQueryClassifier ( #5324 )
* pin scikit-learn, deprecate SklearnQueryClassifier
* rm scikit-learn pin
2023-07-11 17:07:23 +02:00
Sebastian Husch Lee
22750d342c
test: Refactor some retriever tests into unit tests ( #5306 )
* Modify and reactivate two unit tests
* Refactor openai embedding tests into unit tests
* Update test_retriever.py
* Changing tests
2023-07-11 13:36:23 +02:00
Fanli Lin
514f93a6eb
fix: num_return_sequences should be less than num_beams, not top_k ( #5280 )
* formatting
* remove top_k variable
* add pytest
* add numbers
* string formatting
* fix formatting
* revert
* extend tests with assertions for num_return_sequences
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-07-11 12:20:21 +02:00
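The fix above ties `num_return_sequences` to `num_beams` rather than `top_k`: in Hugging Face transformers, beam search cannot return more sequences than it has beams. A hypothetical guard enforcing this before calling `generate()` might look like the following. The function name, the warn-and-clamp behavior and the `do_sample` exemption are assumptions, not taken from the PR.

```python
import warnings
from typing import Any, Dict


def check_generation_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical guard: keep num_return_sequences within num_beams."""
    checked = dict(kwargs)  # copy so the caller's dict is left untouched
    num_beams = checked.get("num_beams", 1)
    num_return_sequences = checked.get("num_return_sequences", 1)
    # With do_sample=True, transformers can return more sequences than beams,
    # so the limit is only applied to non-sampling (greedy/beam) generation
    if not checked.get("do_sample", False) and num_return_sequences > num_beams:
        warnings.warn(
            f"num_return_sequences ({num_return_sequences}) exceeds num_beams ({num_beams}); "
            f"using {num_beams} instead."
        )
        checked["num_return_sequences"] = num_beams
    return checked
```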
bogdankostic
41668f26d6
ci: Update labeler.yml to account for Elasticsearch changes ( #5318 )
2023-07-11 11:51:49 +02:00
Sebastian Husch Lee
2703c2d483
docs: Small documentation updates to dense.py ( #5305 )
* Small documentation updates
* Update doc strings
2023-07-10 18:16:49 +02:00
bogdankostic
b7f683bfa4
ci: Add unit test for Elasticsearch8 ( #5300 )
* Add job for ES8 integration tests
* Add unit test for Elasticsearch 8
* Add tests.yml
* Adapt tests.yml
* Remove added white space
* Adapt tests.yml
* Adapt tests.yml
* Add dependencies to unit test name
* Adapt unit test matrix
* Adapt unit test matrix
* Adapt unit test matrix
* Adapt unit test matrix
* Update tests.yml
* Create separate tests where necessary
* Fix skip
* Adapt tests
2023-07-10 16:03:50 +02:00
bogdankostic
048fc7f640
ci: Add job for ES8 integration tests ( #5297 )
* Add job for ES8 integration tests
* Remove whitespace
* Fix filename
* Add tests.yml
* Revert "Add tests.yml"
This reverts commit ec12654d4e146b5ef6cba04ad82f5973935d8520.
2023-07-10 10:43:05 +02:00
bogdankostic
206b21816c
chore: Adapt import message for Elasticsearch7 ( #5295 )
* Adapt import message for es7.ElasticsearchDocumentStore
* Move import statement
2023-07-10 10:21:26 +02:00
bogdankostic
86d1fb5e1c
build: Add elasticsearch7 and elasticsearch8 extra ( #5296 )
2023-07-10 09:59:51 +02:00
Silvano Cerza
d6f855cbc5
chore: Add support for hierarchical docs ( #5278 )
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-07-07 17:00:29 +02:00
tstadel
9acb275680
fix: avoid conflicts with opensearch / elasticsearch magic attributes during bulk requests ( #5113 )
* use _source on opensearch bulk requests
* fix label bulk requests
* add tests
* fix test
* apply feedback
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-07-07 15:12:50 +02:00
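The fix above uses `_source` in bulk requests. The Python bulk helpers treat certain top-level keys of an action (such as `_id` or `_index`, plus further metadata keys depending on the client version) as request metadata; nesting the document body under `_source` keeps user fields with clashing names out of that lookup. A minimal sketch of building such an action; the function name and field names are hypothetical.

```python
from typing import Any, Dict


def to_bulk_action(index: str, doc_id: str, fields: Dict[str, Any]) -> Dict[str, Any]:
    # Only these keys are bulk metadata; every document field lives under
    # "_source", so a meta field named like a metadata key is sent as data
    return {
        "_op_type": "index",
        "_index": index,
        "_id": doc_id,
        "_source": dict(fields),
    }
```

A list of such actions can be passed to `elasticsearch.helpers.bulk(client, actions)`.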
ZanSara
13bed30504
feat: batch mode for MemoryRetriever (v2) ( #5287 )
* memoryretriever batch mode
* typing of output
2023-07-07 12:10:35 +02:00
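The entry above adds a batch mode to MemoryRetriever. A toy sketch of the batch shape, assuming batch mode means "a list of queries in, one ranked list of documents per query out". The class, the `Doc` type and the term-overlap scoring are hypothetical stand-ins, not the retriever's real API or ranking.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    content: str


class ToyMemoryRetriever:
    def __init__(self, docs: List[Doc], top_k: int = 2):
        self.docs = docs
        self.top_k = top_k

    def _score(self, query: str, doc: Doc) -> int:
        # Naive relevance: number of shared lowercase terms
        return len(set(query.lower().split()) & set(doc.content.lower().split()))

    def run(self, queries: List[str]) -> List[List[Doc]]:
        # Batch mode: one list of top_k documents per input query
        results = []
        for query in queries:
            ranked = sorted(self.docs, key=lambda d: self._score(query, d), reverse=True)
            results.append(ranked[: self.top_k])
        return results
```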
ZanSara
f49bd3a12f
feat: introduce Store protocol (v2) ( #5259 )
* add protocol and adapt pipeline
* review feedback & update tests
* pylint
* Update haystack/preview/document_stores/protocols.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update haystack/preview/document_stores/memory/document_store.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* docstring of Store
* adapt memorydocumentstore
* fix tests
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-07-07 12:10:08 +02:00
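The entry above introduces a `Store` protocol. A minimal sketch of the idea with `typing.Protocol`: any class with matching methods satisfies the protocol without inheriting from it. The method names and signatures here are assumptions for illustration and may not match the protocol in `haystack/preview/document_stores/protocols.py`.

```python
from typing import Any, Dict, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class Store(Protocol):
    def count_documents(self) -> int: ...
    def write_documents(self, documents: List[Dict[str, Any]]) -> None: ...
    def filter_documents(self, filters: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]: ...


class ListStore:
    # No inheritance from Store: structural typing is enough
    def __init__(self) -> None:
        self._docs: List[Dict[str, Any]] = []

    def count_documents(self) -> int:
        return len(self._docs)

    def write_documents(self, documents: List[Dict[str, Any]]) -> None:
        self._docs.extend(documents)

    def filter_documents(self, filters: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
        if not filters:
            return list(self._docs)
        # Keep documents whose fields equal every filter value
        return [d for d in self._docs if all(d.get(k) == v for k, v in filters.items())]
```

Note that `isinstance` against a `runtime_checkable` protocol only checks that the methods exist, not their signatures.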
Stefano Fiorucci
90ff3817e7
feat: support OpenAI-Organization for authentication ( #5292 )
* add openai_organization to invocation layer, generator and retriever
* added tests
2023-07-07 12:02:21 +02:00
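The entry above adds an `openai_organization` setting. In the OpenAI HTTP API, the organization is sent in the `OpenAI-Organization` request header alongside the bearer token. A minimal sketch of building the headers; the function name is hypothetical and not Haystack's implementation.

```python
from typing import Dict, Optional


def openai_headers(api_key: str, organization: Optional[str] = None) -> Dict[str, str]:
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # The organization header is optional; only send it when one is configured
    if organization:
        headers["OpenAI-Organization"] = organization
    return headers
```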
bogdankostic
0697f5c63e
fix: Support isolated node eval in run_batch in Generators ( #5291 )
* Add isolated node eval to BaseGenerator's run_batch
* Add unit tests
2023-07-07 10:32:43 +02:00
Vladimir Blagojevic
395854d823
Add cpu-remote-inference Docker image ( #5225 )
* Add cpu-remote-inference Docker image
* Add web lfqa pipeline as an example for cpu-remote-inference Docker image
* WebRetriever must have document_store attribute
* Add cpu-remote-inference-latest
* Add image testing in CI
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-07-07 10:23:14 +02:00
MichelBartels
08f1865ddd
fix: Improve robustness of get_task HF pipeline invocations ( #5284 )
* replace get_task method and change invocation layer order
* add test for invocation layer order
* add test documentation
* make invocation layer test more robust
* fix type annotation
* change hf timeout
* simplify timeout mock and add get_task exception cause
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-07-06 16:33:44 +02:00
Vladimir Blagojevic
ac412193cc
refactor: Simplify selection of Azure vs OpenAI invocation layers ( #5271 )
2023-07-06 13:23:13 +02:00
Silvano Cerza
a1a390056a
Remove requests_cache in tests ( #5285 )
2023-07-06 13:22:52 +02:00
Vladimir Blagojevic
ad6072728d
Add dependencies to build lxml successfully ( #5288 )
2023-07-06 12:53:28 +02:00
bogdankostic
fd25106c88
test: Adapt batch size in retriever-reader benchmarks ( #5281 )
2023-07-06 10:42:34 +02:00
Sebastian Husch Lee
da2c9b4799
test: Update test/others/test_utils.py ( #5270 )
* Add unit test mark for appropriate tests
* Remove deepset Cloud specific tests
* Create pytest fixtures
* Reduce number of checks run for test_match_context_multi_process and test_match_context_single_process
* Increase speed of test_match_contexts_multi_process
* Revert "Remove deepset Cloud specific tests"
This reverts commit b65173665f3e873f17f3613c5fd4fa3174a6d71b.
* Continuing revert commit
* Remove unnecessary comment
* Break down bigger test into smaller tests
2023-07-05 12:00:32 +02:00
Sebastian Husch Lee
12f319b4c9
Remove deprecated return_table_cell from conftest.py ( #5264 )
2023-07-05 09:37:41 +02:00
Massimiliano Pippi
00efa514ca
refactor: remove Elasticsearch client version 8 deprecation warnings ( #5245 )
* remove deprecation warnings
* remove leftover
2023-07-04 14:17:34 +02:00
Sebastian Husch Lee
87281b2e10
Fix to_dict and from_dict of MultiLabel such that to_dict outputs a JSON-serializable object (using Label.to_dict()) ( #5257 )
2023-07-04 12:44:11 +02:00
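The entry above makes `MultiLabel.to_dict` JSON-serializable by delegating to `Label.to_dict()`. A toy sketch of that pattern, where the container serializes each child through the child's own `to_dict()` so the result holds only plain dicts, lists and strings. The `ToyLabel`/`ToyMultiLabel` classes and their fields are hypothetical, not Haystack's schema.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyLabel:
    query: str
    answer: str

    def to_dict(self) -> Dict[str, Any]:
        return {"query": self.query, "answer": self.answer}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyLabel":
        return cls(**data)


@dataclass
class ToyMultiLabel:
    labels: List[ToyLabel] = field(default_factory=list)

    def to_dict(self) -> Dict[str, Any]:
        # Delegate to each child's to_dict() so json.dumps never sees objects
        return {"labels": [label.to_dict() for label in self.labels]}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyMultiLabel":
        return cls(labels=[ToyLabel.from_dict(d) for d in data["labels"]])
```

Usage: `ToyMultiLabel.from_dict(json.loads(json.dumps(ml.to_dict())))` round-trips to an equal object.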
Malte Reimann
195077eca9
fix: import_utils fetch_archive_from_http - improve url parsing for fetching archive from http ( #5199 )
* Use urlparse to get file extension for urls that contain text after the file extension such as query parameters
* Run pre-commit to fix format
* Reformat import_utils
* Document get_filename_extension_from_url
* Formatting
* Formatting
* Update haystack/utils/import_utils.py
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* Update haystack/utils/import_utils.py
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-07-04 10:20:58 +02:00
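The fix above uses `urlparse` to get the file extension when a URL has text after it, such as query parameters. A minimal sketch of the idea; the function name is hypothetical, and the documented `get_filename_extension_from_url` may have a different signature and return value. A naive `url.split(".")[-1]` on the first test URL would return `"zip?token=abc&x=1"`.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def extension_from_url(url: str) -> str:
    # urlparse separates the path from the query string and fragment,
    # so ".../archive.zip?token=abc" yields the path ".../archive.zip"
    path = urlparse(url).path
    # .suffix is the last extension only, e.g. ".gz" for "archive.tar.gz"
    return PurePosixPath(path).suffix.lstrip(".")
```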
ZanSara
4b380d8fb0
fix: install inference in REST API tests ( #5252 )
* install inference in restapi tests
* add workflow dispatch to test the REST API CI in PR
* trigger ci
* tablecell
* tablecell
* revert ci trigger
* mypy
2023-07-03 15:10:14 +02:00
Vladimir Blagojevic
1066e959a2
bug: fix for Pinecone not working for per-document updates ( #5110 )
2023-07-03 14:07:52 +02:00
Stefano Fiorucci
1be39367ac
Fix: FAISSDocumentStore - make write_documents properly work in combination with update_embeddings ( #5221 )
* Update VERSION.txt
* first draft
* simplify method and test
* rm unnecessary pb.close
* integrate feedback
2023-07-03 10:07:36 +02:00
Massimiliano Pippi
aee862833e
Update pyproject.toml ( #5244 )
2023-06-30 19:44:08 +02:00
Massimiliano Pippi
6c1d0fbf04
refactor: isolate Elasticsearch client calls ( #5241 )
* isolate client code
* pass headers
* pass headers
* more adjustments
* revert
* revert
* leftover
* fix opensearch
2023-06-30 18:29:01 +02:00
Massimiliano Pippi
cb638af0ff
refactor: fix method type and add comments ( #5235 )
* fix method type and add comments
* fix tests
2023-06-30 11:55:52 +02:00
Massimiliano Pippi
037e4f24ce
refactor: add a new Document Store supporting Elasticsearch 8 ( #5231 )
* introduce es8
* prepare tests
* fix unit tests
* adjust tests
* install elastic_transport package
* make mypy happy
* fix opensearch tests
2023-06-29 16:40:10 +02:00
Massimiliano Pippi
d5c13aa71d
refactor: introduce a base class for ElasticsearchDocumentStore ( #5228 )
* introduce a base class
* forgot
* fix linting
* try
* try
* schema generation doesn't support aliasing, use the same name
2023-06-29 13:28:49 +02:00