haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-08-09 00:59:08 +00:00

Author	SHA1	Message	Date
Fanli Lin	f6b50cfdf9	fix: StopWordsCriteria doesn't compare the stop word token ids with the input ids in a continuous and sequential order (#5503 ) * bug fix * add release note * add unit test * refactor --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>	2023-08-08 08:35:10 +02:00
Massimiliano Pippi	ac4e762422	Fix datadog client init (#5524 )	2023-08-07 12:18:46 +02:00
Massimiliano Pippi	c079576a87	chore: move base test class into haystack core (#5509 ) * move base test class into haystack core * fix linter * do not compute coverage of testing code	2023-08-04 12:42:13 +02:00
Vladimir Blagojevic	d96c963bc4	test: Convert two HFLocalInvocationLayer integration to unit tests (#5446 ) * Convert two HFLocalInvocationLayer integration to unit tests * Simplify unit test * Improve HFLocalInvocationLayer unit tests	2023-08-03 17:41:32 +02:00
bogdankostic	56cea8cbbd	test: Add scripts to send benchmark results to datadog (#5432 ) * Add config files * log benchmarks to stdout * Add top-k and batch size to configs * Add batch size to configs * fix: don't download files if they already exist * Add batch size to configs * refine script * Remove configs using 1m docs * update run script * update run script * update run script * datadog integration * remove out folder * gitignore benchmarks output * test: send benchmarks to datadog * remove uncommented lines in script * feat: take branch/tag argument for benchmark setup script * fix: run.sh should ignore errors * Remove changes unrelated to datadog * Apply black * Update test/benchmarks/utils.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * PR feedback * Account for reader benchmarks not doing indexing * Change key of reader metrics * Apply PR feedback * Remove whitespace --------- Co-authored-by: rjanjua <rohan.janjua@gmail.com> Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2023-08-03 10:09:00 +02:00
Vladimir Blagojevic	1876c41f07	feat: Add LostInTheMiddleRanker (#5457 ) * Add lost in the middle ranker * Add release note * Julian's feedback: more precise version of truncate * Better comments for the litm algorithm * Sebastian PR feedback * Add check for invalid values of word_count_threshold * Remove _truncate as it is not needed any more --------- Co-authored-by: Darja Fokina <daria.f93@gmail.com>	2023-08-02 17:05:13 +02:00
Vladimir Blagojevic	0efe0ee7b3	feat: Add `top_k` parameter to `DiversityRanker` init method (#5494 ) * Add top_k * Add release note	2023-08-02 17:04:04 +02:00
Fanli Lin	8d04f28e11	fix: hf agent outputs the prompt text while the openai agent not (#5461 ) * add skil prompt * fix formatting * add release note * add release note * Update releasenotes/notes/add-skip-prompt-for-hf-model-agent-89aef2838edb907c.yaml Co-authored-by: Daria Fokina <daria.f93@gmail.com> * Update haystack/nodes/prompt/invocation_layer/handlers.py Co-authored-by: bogdankostic <bogdankostic@web.de> * Update haystack/nodes/prompt/invocation_layer/handlers.py Co-authored-by: bogdankostic <bogdankostic@web.de> * Update haystack/nodes/prompt/invocation_layer/hugging_face.py Co-authored-by: bogdankostic <bogdankostic@web.de> * add a unit test * add a unit test2 * add skil prompt * Revert "add skil prompt" This reverts commit b1ba938c94b67a4fd636d321945990aabd2c5b2a. * add unit test --------- Co-authored-by: Daria Fokina <daria.f93@gmail.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2023-08-02 16:34:33 +02:00
Fanli Lin	73fa796735	fix: enable passing `max_length` for text2text-generation task (#5420 ) * bug fix * add unit test * reformatting * add release note * add release note * Update releasenotes/notes/enable-set-max-length-during-runtime-097d65e537bf800b.yaml Co-authored-by: bogdankostic <bogdankostic@web.de> * Update test/prompt/invocation_layer/test_hugging_face.py Co-authored-by: bogdankostic <bogdankostic@web.de> * Update test/prompt/invocation_layer/test_hugging_face.py Co-authored-by: bogdankostic <bogdankostic@web.de> * Update test/prompt/invocation_layer/test_hugging_face.py Co-authored-by: bogdankostic <bogdankostic@web.de> * Update test/prompt/invocation_layer/test_hugging_face.py Co-authored-by: bogdankostic <bogdankostic@web.de> * bug fix --------- Co-authored-by: bogdankostic <bogdankostic@web.de>	2023-08-02 14:13:30 +02:00
Vladimir Blagojevic	40a2e9b56a	refactor: Update WebRetriever to use LinkContentFetcher (#5229 ) * Refactor WebRetriever to use LinkContentFetcher * PR feedback --------- Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>	2023-08-02 12:45:03 +02:00
Fanli Lin	f7fd5eeb4f	feat: enable loading tokenizer for models that are not supported by the transformers library (#5314 ) * add tokenizer load * change import order * move imports * refactor code * import lib * remove pretrainedmodel * fix linting * update patch * fix order * remove tokenizer class * use tokenizer class * no copy * add case for model is an instance * fix optional * add ut * set default to None * change models * Update haystack/nodes/prompt/invocation_layer/hugging_face.py Co-authored-by: bogdankostic <bogdankostic@web.de> * Update haystack/nodes/prompt/invocation_layer/hugging_face.py Co-authored-by: bogdankostic <bogdankostic@web.de> * add unit tests * add unit tests * remove lib * formatting * formatting * formatting * add release note * Update releasenotes/notes/load-tokenizer-if-not-load-by-transformers-5841cdc9ff69bcc2.yaml Co-authored-by: bogdankostic <bogdankostic@web.de> --------- Co-authored-by: bogdankostic <bogdankostic@web.de>	2023-08-02 11:42:23 +02:00
Vladimir Blagojevic	540d0fad97	feat: Add DiversityRanker (#5398 ) * Introduce DiversityRanker * improve most_diverse_order speed * Compute mean for numerical stability * Add release note * Add cosine similarity * Test both dot product and cosine similarity * Add pydocs hook --------- Co-authored-by: Michel Bartels <login@michelbartels.com>	2023-08-01 12:48:34 +02:00
bogdankostic	a51ca19fe4	feat: Add `TextFileToDocument` component (v2) (#5467 ) * Add TextfileToDocument component * Add docstrings * Add unit tests * Add release note file * Make use of progress bar * Add TextfileToDocument to __init__.py * Use lazy % formatting in logging functions * Remove f from non-f-string * Add TextfileToDocument to __init__.py * Use correct dependency extra * Compare file path against path object * PR feedback * PR feedback * Update haystack/preview/components/file_converters/txt.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * Update docstrings * Add error handling * Add unit test * Reintroduce falsely removed caplog --------- Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>	2023-08-01 11:34:52 +02:00
Stefano Fiorucci	6f534873a5	fix: restrict `supports` method in the OpenAI invocation layer and a similar method in the `EmbeddingRetriever` (#5458 ) * restrict OpenAI supports method * better note * Update releasenotes/notes/restrict-openai-supports-method-fb126583e4beb057.yaml Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> --------- Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>	2023-07-31 13:14:22 +02:00
Vladimir Blagojevic	409e3471cb	feat: Enable Support for Meta LLama-2 Models in Amazon Sagemaker (#5437 ) * Enable Support for Meta LLama-2 Models in Amazon Sagemaker * Improve unit test for invocation layers positioning * Small adjustment, add more unit tests * mypy fixes * Improve unit tests * Update test/prompt/invocation_layer/test_sagemaker_meta.py Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com> * PR feedback * Add pydocs for newly extracted methods * simplify is_proper_chat_* --------- Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com> Co-authored-by: anakin87 <stefanofiorucci@gmail.com>	2023-07-26 15:26:39 +02:00
Silvano Cerza	7940ec0482	Add @store decorator (#5438 )	2023-07-26 09:32:23 +02:00
Julian Risch	5bb0a1f57a	Revert "fix: num_return_sequences should be less than num_beams, not top_k (#5280 )" (#5434 ) This reverts commit 514f93a6eb575d376b21d22e32080fac62cf785f.	2023-07-25 13:27:41 +02:00
Sebastian Husch Lee	2bc7fe1a08	test: reactivate unit tests in `test_eval.py` (#5255 ) * Activate tests that follow unit test and integration test rules * Adding more integration labels * Change name to better reflect complexity of test * Remove mark integration tags, move test to doc store test for add_eval_data * Removing incorrect integration label * Deactivated document store test b/c it fails for Weaviate and pinecone * Remove unit label since test needs to be refactored to be considered a unit test * Undo changes * Undo change * Check every field in the load evaluation result * Add back label and add skip reason * Use pytest skip instead of TODO	2023-07-24 17:07:45 +02:00
Vladimir Blagojevic	597df1414c	feat: Update Anthropic Claude support with the latest models, new streaming API, context window sizes (#5406 ) * Update Claude support with the latest models, new streaming API, context window sizes * Use Github Anthropic SDK link for tokenizer, revert _init_tokenizer * Change example key name to ANTHROPIC_API_KEY	2023-07-21 13:33:07 +02:00
elundaeva	612c6779fb	feat: RecentnessRanker (#5301 ) * recency reranker code * removed * readd * edited code * edit * mypy test fix * adding warnings for score method * fix * fix * adding paper link * comments implementation * change to predict and predict_batch * change to predict and predict_batch 2 * adding unit test * fixes * small fixes * fix for unit test * table driven test * small fixes * small fixes2 * adding predict_batch tests * add recentness_ranker to api reference docs * implementing feedback * implementing feedback2 * implementing feedback3 * implementing feedback4 * implementing feedback5 * remove document_map, remove final check if score is not None * add final check if doc score is not None for mypy --------- Co-authored-by: Darja Fokina <daria.f93@gmail.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2023-07-20 16:20:45 +02:00
Sebastian Husch Lee	f7642e83ea	feat: Add embed_meta_fields to Ranker nodes (#5361 ) * Adding embed_meta_fields to ranker nodes * Fix tests by adding case where embed_meta_fields=None * Adding unit test for _add_meta_fields_to_docs * Fix pylint * Add unit test * Added another unit test. Caught a bug. * Adding more unit tests * Add unit test * Updating some older tests into unit tests using mocking * Convert another test to unit test * Test run method * One last unit test	2023-07-18 09:11:51 +02:00
ZanSara	8f3fe85878	feat: extend `pipeline.add_component` to support stores (#5261 ) * add protocol and adapt pipeline * change API in pipeline.add_component * adapt pipeline tests * adapt memoryretriever * additional checks * separate protocol and mixin * review feedback & update tests * pylint * Update haystack/preview/document_stores/protocols.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * Update haystack/preview/document_stores/memory/document_store.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * docstring of Store * adapt memorydocumentstore * fix tests * remove direct inheritance * pylint * Update haystack/preview/document_stores/mixins.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * Update test/preview/components/retrievers/test_memory_retriever.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * Update test/preview/components/retrievers/test_memory_retriever.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * Update test/preview/components/retrievers/test_memory_retriever.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * Update test/preview/components/retrievers/test_memory_retriever.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * Update test/preview/components/retrievers/test_memory_retriever.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * test names * revert suggestion * private self._stores * move asserts out * remove protocols * review feedback * review feedback * fix tests * mypy * review feedback * fix tests & other details * naming * mypy * fix tests * typing * partial review feedback * move .store to input dataclass * Revert "move .store to input dataclass" This reverts commit 53f624b99f3414c89d5134711725b31bd94ef77a. * disable reusing components with stores * disable sharing components with docstores * Update mixins.py * black * upgrade canals & fix tests --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2023-07-17 15:06:19 +02:00
Vladimir Blagojevic	adfabdd648	Improve token limit tests for OpenAI PromptNode layer (#5351 )	2023-07-17 14:03:03 +02:00
Fanli Lin	9891bfeddd	fix: a small bug in StopWordsCriteria (#5316 )	2023-07-13 15:58:06 +02:00
bogdankostic	237d67dbfd	feat: Check version of Elasticsearch server and add support for Elasticsearch <= 7.5 (#5320 ) * Check ES server version + add support for ES <= 7.5 * Adapt comment * PR feedback	2023-07-13 14:50:43 +02:00
Vladimir Blagojevic	f21005f8ea	refactor: Extract link retrieval from WebRetriever, introduce LinkContentRetriever (#5227 ) * Extract link retrieval from WebRetriever, introduce LinkContentRetriever * Add example --------- Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com> Co-authored-by: Daria Fokina <daria.f93@gmail.com>	2023-07-13 12:54:40 +02:00
MichelBartels	fd350bbb8f	fix: Run HFLocalInvocationLayer.supports even if inference packages are not installed (#5308 ) --------- Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>	2023-07-13 12:52:56 +02:00
ZanSara	7848f00d01	feat: upgrade `canals` in preview (#5344 ) * upgrade nodes * linting	2023-07-13 12:30:49 +02:00
Sebastian Husch Lee	b5aef24a7e	feat: Add support for meta fields that are lists when using embed_meta_fields (#5307 ) * Add support for meta fields that are lists when using embed_meta_fields * Make sure unit test doesn't download model * Adding more unit tests	2023-07-11 17:32:33 +02:00
Stefano Fiorucci	6632505540	chore: deprecate `SklearnQueryClassifier` (#5324 ) * pin scikit-learn, deprecate SklearnQueryClassifier * rm scikit-learn pin	2023-07-11 17:07:23 +02:00
Sebastian Husch Lee	22750d342c	test: Refactor some retriever tests into unit tests (#5306 ) * Modify and reactivate two unit tests * Refactor openai embedding tests into unit tests * Update test_retriever.py * Changing tests	2023-07-11 13:36:23 +02:00
Fanli Lin	514f93a6eb	fix: num_return_sequences should be less than num_beams, not top_k (#5280 ) * formatting * remove top_k variable * add pytest * add numbers * string formatting * fix formatting * revert * extend tests with assertions for num_return_sequences --------- Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2023-07-11 12:20:21 +02:00
bogdankostic	b7f683bfa4	ci: Add unit test for Elasticsearch8 (#5300 ) * Add job for ES8 integration tests * Add unit test for Elasticsearch 8 * Add tests.yml * Adapt tests.yml * Remove added white space * Adapt tests.yml * Adapt tests.yml * Add dependencies to unit test name * Adapt unit test matrix * Adapt unit test matrix * Adapt unit test matrix * Adapt unit test matrix * Update tests.yml * Create separate tests where necessary * Fix skip * Adapt tests	2023-07-10 16:03:50 +02:00
tstadel	9acb275680	fix: avoid conflicts with opensearch / elasticsearch magic attributes during bulk requests (#5113 ) * use _source on opensearch bulk requests * fix label bulk requests * add tests * fix test * apply feedback --------- Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>	2023-07-07 15:12:50 +02:00
ZanSara	13bed30504	feat: batch mode for `MemoryRetriever` (v2) (#5287 ) * memoryretriever batch mode * typing of output	2023-07-07 12:10:35 +02:00
ZanSara	f49bd3a12f	feat: introduce `Store` protocol (v2) (#5259 ) * add protocol and adapt pipeline * review feedback & update tests * pylint * Update haystack/preview/document_stores/protocols.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * Update haystack/preview/document_stores/memory/document_store.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * docstring of Store * adapt memorydocumentstore * fix tests --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2023-07-07 12:10:08 +02:00
Stefano Fiorucci	90ff3817e7	feat: support `OpenAI-Organization` for authentication (#5292 ) * add openai_organization to invocation layer, generator and retriever * added tests	2023-07-07 12:02:21 +02:00
bogdankostic	0697f5c63e	fix: Support isolated node eval in run_batch in Generators (#5291 ) * Add isolated node eval to BaseGenerator's run_batch * Add unit tests	2023-07-07 10:32:43 +02:00
MichelBartels	08f1865ddd	fix: Improve robustness of get_task HF pipeline invocations (#5284 ) * replace get_task method and change invocation layer order * add test for invocation layer order * add test documentation * make invocation layer test more robust * fix type annotation * change hf timeout * simplify timeout mock and add get_task exception cause --------- Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>	2023-07-06 16:33:44 +02:00
Vladimir Blagojevic	ac412193cc	refactor: Simplify selection of Azure vs OpenAI invocation layers (#5271 )	2023-07-06 13:23:13 +02:00
Silvano Cerza	a1a390056a	Remove requests_cache in tests (#5285 )	2023-07-06 13:22:52 +02:00
bogdankostic	fd25106c88	test: Adapt batch size in retriever-reader benchmarks (#5281 )	2023-07-06 10:42:34 +02:00
Sebastian Husch Lee	da2c9b4799	test: Update `test/others/test_utils.py` (#5270 ) * Add unit test mark for appropriate tests * Remove deepset Cloud specific tests * Create pytest fixtures * Reduce number of checks run for test_match_context_multi_process and test_match_context_single_process * Increase speed of test_match_contexts_multi_process * Revert "Remove deepset Cloud specific tests" This reverts commit b65173665f3e873f17f3613c5fd4fa3174a6d71b. * Continuing revert commit * Remove unnecessary comment * Break down bigger test into smaller tests	2023-07-05 12:00:32 +02:00
Sebastian Husch Lee	12f319b4c9	Remove deprecated return_table_cell from conftest.py (#5264 )	2023-07-05 09:37:41 +02:00
Sebastian Husch Lee	87281b2e10	Fix to_dict and from_dict of Multilabel such that to_dict outputs a json serializable object (using Label.to_dict()) (#5257 )	2023-07-04 12:44:11 +02:00
Malte Reimann	195077eca9	fix: `import_utils fetch_archive_from_http` - improve url parsing for fetching archive from http (#5199 ) * Use urlparse to get file extension for urls that contain text after the file extension such as query parameters * Run pre-commit to fix format * Reformat import_utils * Document get_filename_extension_from_url * Formatting * Formatting * Update haystack/utils/import_utils.py Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com> * Update haystack/utils/import_utils.py Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com> --------- Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>	2023-07-04 10:20:58 +02:00
Vladimir Blagojevic	1066e959a2	bug: fix for pinecone not working for per document updates (#5110 )	2023-07-03 14:07:52 +02:00
Stefano Fiorucci	1be39367ac	Fix: `FAISSDocumentStore` - make `write_documents` properly work in combination w `update_embeddings` (#5221 ) * Update VERSION.txt * first draft * simplify method and test * rm unnecessary pb.close * integrate feedback	2023-07-03 10:07:36 +02:00
Massimiliano Pippi	cb638af0ff	refactor: fix method type and add comments (#5235 ) * fix method type and add comments * fix tests	2023-06-30 11:55:52 +02:00
Massimiliano Pippi	037e4f24ce	refactor: add a new Document Store supporting Elasticsearch 8 (#5231 ) * introduce es8 * prepare tests * fix unit tests * adjust tests * install elastic_transport package * make mypy happy * fix opensearch tests	2023-06-29 16:40:10 +02:00
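
Commit 195077eca9 (#5199) in the log above describes using urlparse to get the file extension of URLs that carry text after the extension, such as query parameters. A minimal sketch of that idea follows. It is not the code from `haystack/utils/import_utils.py`: the function name `extension_from_url` and its return convention are hypothetical, chosen only for illustration.

```python
import posixpath
from urllib.parse import urlparse


def extension_from_url(url: str) -> str:
    """Return the file extension of the URL path, without the leading dot.

    Hypothetical helper for illustration; the name and return convention
    are assumptions, not Haystack's API.
    """
    # urlparse separates the path from the query string and fragment,
    # so "?token=abc" or "#section" never reaches the extension logic.
    path = urlparse(url).path
    # Only the last path segment can carry the file extension.
    filename = posixpath.basename(path)
    # splitext returns ("archive", ".zip"); the extension is "" if absent.
    _, extension = posixpath.splitext(filename)
    return extension.lstrip(".")


if __name__ == "__main__":
    print(extension_from_url("https://example.com/files/archive.zip?token=abc"))
```

Splitting the raw URL string on the last dot would include the query string in the result, which is the failure mode the commit message describes. Note that `splitext` only returns the final suffix, so a name like `data.tar.gz` yields `gz`; handling compound extensions would need extra logic.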

1204 Commits