haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2026-01-08 13:06:29 +00:00

Author	SHA1	Message	Date
Vladimir Blagojevic	84ed954c8c	feat: Improve performance and add default media support in FileTypeClassifier (#5083 ) * feat: add media outgoing edge to FileTypeClassifier * Add release note * Update language --------- Co-authored-by: Daniel Bichuetti <daniel.bichuetti@gmail.com> Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-08-08 15:51:07 +02:00
tstadel	d46c84bb61	feat: support dynamic filters in custom_query (#5427 ) * support filters in custom_query * better tests * Update docstrings --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-08-08 15:48:15 +02:00
Stefano Fiorucci	3f472995bb	refactor: update Crawler to support selenium>=4.11.0 and simplify it (#5515 ) * refactor crawler * rm unused imports * release notes! * rm outdated mock	2023-08-08 15:13:22 +02:00
Vladimir Blagojevic	1876c41f07	feat: Add LostInTheMiddleRanker (#5457 ) * Add lost in the middle ranker * Add release note * Julian's feedback: more precise version of truncate * Better comments for the litm algorithm * Sebastian PR feedback * Add check for invalid values of word_count_threshold * Remove _truncate as it is not needed any more --------- Co-authored-by: Darja Fokina <daria.f93@gmail.com>	2023-08-02 17:05:13 +02:00
Vladimir Blagojevic	0efe0ee7b3	feat: Add `top_k` parameter to `DiversityRanker` init method (#5494 ) * Add top_k * Add release note	2023-08-02 17:04:04 +02:00
Vladimir Blagojevic	40a2e9b56a	refactor: Update WebRetriever to use LinkContentFetcher (#5229 ) * Refactor WebRetriever to use LinkContentFetcher * PR feedback --------- Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>	2023-08-02 12:45:03 +02:00
Vladimir Blagojevic	540d0fad97	feat: Add DiversityRanker (#5398 ) * Introduce DiversityRanker * improve most_diverse_order speed * Compute mean for numerical stability * Add release note * Add cosine similarity * Test both dot product and cosine similarity * Add pydocs hook --------- Co-authored-by: Michel Bartels <login@michelbartels.com>	2023-08-01 12:48:34 +02:00
elundaeva	612c6779fb	feat: RecentnessRanker (#5301 ) * recency reranker code * removed * readd * edited code * edit * mypy test fix * adding warnings for score method * fix * fix * adding paper link * comments implementation * change to predict and predict_batch * change to predict and predict_batch 2 * adding unit test * fixes * small fixes * fix for unit test * table driven test * small fixes * small fixes2 * adding predict_batch tests * add recentness_ranker to api reference docs * implementing feedback * implementing feedback2 * implementing feedback3 * implementing feedback4 * implementing feedback5 * remove document_map, remove final check if score is not None * add final check if doc score is not None for mypy --------- Co-authored-by: Darja Fokina <daria.f93@gmail.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2023-07-20 16:20:45 +02:00
Sebastian Husch Lee	f7642e83ea	feat: Add embed_meta_fields to Ranker nodes (#5361 ) * Adding embed_meta_fields to ranker nodes * Fix tests by adding case where embed_meta_fields=None * Adding unit test for _add_meta_fields_to_docs * Fix pylint * Add unit test * Added another unit test. Caught a bug. * Adding more unit tests * Add unit test * Updating some older tests into unit tests using mocking * Convert another test to unit test * Test run method * One last unit test	2023-07-18 09:11:51 +02:00
Vladimir Blagojevic	f21005f8ea	refactor: Extract link retrieval from WebRetriever, introduce LinkContentRetriever (#5227 ) * Extract link retrieval from WebRetriever, introduce LinkContentRetriever * Add example --------- Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com> Co-authored-by: Daria Fokina <daria.f93@gmail.com>	2023-07-13 12:54:40 +02:00
Sebastian Husch Lee	b5aef24a7e	feat: Add support for meta fields that are lists when using embed_meta_fields (#5307 ) * Add support for meta fields that are lists when using embed_meta_fields * Make sure unit test doesn't download model * Adding more unit tests	2023-07-11 17:32:33 +02:00
Stefano Fiorucci	6632505540	chore: deprecate `SklearnQueryClassifier` (#5324 ) * pin scikit-learn, deprecate SklearnQueryClassifier * rm scikit-learn pin	2023-07-11 17:07:23 +02:00
Sebastian Husch Lee	22750d342c	test: Refactor some retriever tests into unit tests (#5306 ) * Modify and reactivate two unit tests * Refactor openai embedding tests into unit tests * Update test_retriever.py * Changing tests	2023-07-11 13:36:23 +02:00
Stefano Fiorucci	90ff3817e7	feat: support `OpenAI-Organization` for authentication (#5292 ) * add openai_organization to invocation layer, generator and retriever * added tests	2023-07-07 12:02:21 +02:00
bogdankostic	0697f5c63e	fix: Support isolated node eval in run_batch in Generators (#5291 ) * Add isolated node eval to BaseGenerator's run_batch * Add unit tests	2023-07-07 10:32:43 +02:00
Massimiliano Pippi	c068e34954	Remove deprecated param `return_table_cell` (#5218 ) * remove deprecated param * Update haystack/nodes/reader/table.py Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com> * try * remove unused functions and ignore mypy error --------- Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>	2023-06-27 16:14:29 +02:00
bogdankostic	82291b56ad	fix: Send batches of query-doc pairs to inference_from_objects (#5125 ) * Send batches of query-doc pairs to inference_from_objects * Use absolute import path * Add separate preprocessing_batch_size parameter	2023-06-26 14:26:26 +02:00
Sebastian	f1932492f1	feat: Add CohereRanker node using Cohere reranking endpoint (#5152 ) * Started to add CohereRanker node * Small refactoring of SentenceTransformersRanker node * Started to add predict_batch method * Simplified predict_batch code * Added missing imports * Undoing a change * Fix mypy * Adding unit tests using mocking * Updated truncation warning message. * Update doc strings * Update to docs * Update haystack/nodes/ranker/cohere.py Co-authored-by: bogdankostic <bogdankostic@web.de> * Update haystack/nodes/ranker/cohere.py Co-authored-by: bogdankostic <bogdankostic@web.de> * Update haystack/nodes/ranker/cohere.py Co-authored-by: bogdankostic <bogdankostic@web.de> * Update haystack/nodes/ranker/cohere.py Co-authored-by: bogdankostic <bogdankostic@web.de> * Update haystack/nodes/ranker/cohere.py Co-authored-by: bogdankostic <bogdankostic@web.de> * Update haystack/nodes/ranker/cohere.py Co-authored-by: bogdankostic <bogdankostic@web.de> * Updating docs to reflect PR discussion * Update haystack/nodes/ranker/cohere.py Co-authored-by: Daria Fokina <daria.f93@gmail.com> --------- Co-authored-by: bogdankostic <bogdankostic@web.de> Co-authored-by: Daria Fokina <daria.f93@gmail.com>	2023-06-23 16:46:46 +02:00
ZanSara	31664627eb	feat: hard document length limit at `max_chars_check` (#5191 ) * implement hard cut at max_chars_check * regenerate ids * black * docstring * black	2023-06-23 12:34:19 +02:00
ZanSara	36192eca72	feat: `current_datetime` shaper function (#5195 ) * current_datetime shaper * explicitly add current_datetime to the functions allowed in a prompt template	2023-06-23 10:33:34 +02:00
Sebastian	1602f3abdd	test: Adding unit tests to Ranker (#5167 ) * adding unit tests for sentence transformers ranker * Adding more unit tests * Remove empty line * Undo static method * Revert change * Updated indentation and added match message * Remove unneeded paranthesis	2023-06-22 15:23:23 +02:00
Stefano Fiorucci	637433841e	chore: remove deprecated `Seq2SeqGenerator` and `RAGenerator` (#5180 ) * first draft of removal * more removals * don't download unused models	2023-06-21 16:38:45 +02:00
ZanSara	65cdf36d72	chore: block all HTTP requests in CI (#5088 )	2023-06-13 14:52:24 +02:00
Vladimir Blagojevic	0cc9ce7522	fix: WebRetriever top_k is ignored in a pipeline (#5106 ) * Initial changes * Add WebSearch, WebRetriever top_k unit tests * Add exact integration test that failed Tuana * PR review	2023-06-09 10:42:37 +02:00
Sebastian	1777b22fcb	fix: Ensure eval mode for farm and transformer models for predictions (#3791 ) Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-06-06 13:06:30 +02:00
Michael Feil	6ea8ae01a2	feat: Allow setting custom api_base for OpenAI nodes (#5033 ) * add changes for api_base * format retriever * Update haystack/nodes/retriever/dense.py Co-authored-by: bogdankostic <bogdankostic@web.de> * Update haystack/nodes/audio/whisper_transcriber.py Co-authored-by: bogdankostic <bogdankostic@web.de> * Update haystack/preview/components/audio/whisper_remote.py Co-authored-by: bogdankostic <bogdankostic@web.de> * Update haystack/nodes/answer_generator/openai.py Co-authored-by: bogdankostic <bogdankostic@web.de> * Update test_retriever.py * Update test_whisper_remote.py * Update test_generator.py * Update test_retriever.py * reformat with black * Update haystack/nodes/prompt/invocation_layer/chatgpt.py Co-authored-by: Daria Fokina <daria.f93@gmail.com> * Add unit tests * apply docstring suggestions --------- Co-authored-by: bogdankostic <bogdankostic@web.de> Co-authored-by: michaelfeil <me@michaelfeil.eu> Co-authored-by: Daria Fokina <daria.f93@gmail.com>	2023-06-05 11:32:06 +02:00
Massimiliano Pippi	929b8d1fb0	ci: run Elasticsearch 8.6 in compatibility mode (#3853 ) * bump ES version in CI disable ssl wait for service to start set env vars do not use choco to install ES re-enable jobs deps skip test on windows CI because of OOM allocate more memory for ES uniform ES installation and use default heap size skip tests causing OOM increase job timeout restore memory limit for ES8 * Use latest elasticsearch version	2023-05-24 18:53:54 +02:00
Massimiliano Pippi	68924161df	chore: remove deprecated node PDFToTextOCRConverter (#4982 ) * remove deprecated node * remove related test	2023-05-23 16:55:54 +02:00
ZanSara	949b1b63b3	PromptHub integration in `PromptNode` (#4879 ) * initial integration * upgrade of prompthub * fix get_prompt_template * feedback * add prompthub-py to dependencies * tests * mypy * stray changes * review feedback * missing init * fix test * move logic in prompttemplate * linting * bugfixes * fix unit tests * fix cache * simplify prompttemplate init * remove unused function * removing wrong params * try remove all instances of prompt names * more tests * fix agent tests * more tests * fix tests * pylint * comma * black * fix test * docstring * review feedback * review feedback * fix mocks * mypy * fix mocks * fix reference to missing templates * feedback * remove direct references to default template var * tests * Update haystack/nodes/prompt/prompt_node.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2023-05-23 15:22:58 +02:00
Massimiliano Pippi	c6ea542b57	chore: remove BaseKnowledgeGraph (#4953 ) * remove BaseKnowledgeGraph * fix pylint	2023-05-21 10:42:02 +02:00
Massimiliano Pippi	4974bf7ab3	chore: remove deprecated MilvusDocumentStore (#4951 ) * remove deprecated MilvusDocumentStore * remove leftovers * fix pylint	2023-05-19 16:37:38 +02:00
Vladimir Blagojevic	5d7ee2e5e6	feat: Add max_tokens to BaseGenerator params (#4168 ) * Add max_tokens to BaseGenerator params * Make mypy happy * Rebase and resolve conflicts * Fix signature issues * Update lg * Add a mocked unit test method * end-of-file-fixer corrected file * Convert to unit test * Mark test as integration * make the test unit --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com> Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-05-18 15:19:29 +02:00
Massimiliano Pippi	3ea784464a	add test case for #4929 (#4936 )	2023-05-18 09:12:03 +02:00
bogdankostic	df46e7fadd	fix: Use `AutoTokenizer` instead of DPR specific tokenizer (#4898 ) * Use AutoTokenizer instead of DPR specific tokenizer * Adapt TableTextRetriever * Adapt tests * Adapt tests	2023-05-17 18:54:34 +02:00
Stefano Fiorucci	6e0000732d	feat: add BLIP support in `TransformersImageToText` (#4912 ) * add blip support * fix typo Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2023-05-16 10:57:41 +02:00
bogdankostic	5b2ef2afd6	Revert "refactor!: Deprecate `name` param in `PromptTemplate` and introduce `template_name` instead (#4810 )" (#4834 ) This reverts commit f660f41c0615e6b3064ef3e321f1e5a295fafc1b.	2023-05-08 11:31:04 +02:00
ZanSara	6e982e9283	fix: preserve `root_node` in `JoinNode`'s output (#4820 ) * preserve root_node and add tests * Added if statement to fix failing tests --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com>	2023-05-08 10:17:36 +02:00
bogdankostic	f660f41c06	refactor!: Deprecate `name` param in `PromptTemplate` and introduce `template_name` instead (#4810 ) * Deprecate name parameter * Adapt existing tests and uses of PromptTemplate * Move parameter `name` to end * Adapt existing tests * lg update --------- Co-authored-by: Darja Fokina <daria.f93@gmail.com>	2023-05-08 10:12:29 +02:00
Pouyan	75ff768c21	Pouyanpi/feat/search engine/providers/google api (#4722 ) * feat: implement google api search engine provider Signed-off-by: Pouyan <prezakhanipr@gmail.com> --------- Signed-off-by: Pouyan <prezakhanipr@gmail.com>	2023-05-02 17:09:17 +02:00
Mayank Jobanputra	dcf3ddddff	Added deprecation tests for seq2seq generator and RAG Generator (#4782 )	2023-05-02 13:30:22 +05:30
Mayank Jobanputra	896eb6a2ea	chore: fixed reader loading test for hf-hub starting 0.14.0 (#4607 ) * fixed test base for hub 0.13.3 * check if test succeed from branch * 2nd check if test succeed from branch * removed dependency changes --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-05-02 08:22:44 +02:00
Vladimir Blagojevic	dcaf3002f1	fix: SentenceTransformersRanker's predict_batch returns wrong number of documents (#4756 ) * Fix SentenceTransformersRanker spredict_batch returning wrong number of documents * Julian's feedback	2023-04-27 15:24:39 +02:00
Vladimir Blagojevic	aebc22d27e	Upgrade transformers to 4.28.1 (#4665 ) * Upgrade to transformers 4.28.1 * Commenting out failing piece of test * trailing-whitespace * Adjust regex for error match - it changed between releases * Remove RAG tests failing with transformers update	2023-04-27 12:55:21 +02:00
ZanSara	1b57b96210	refactor!: extract `elasticsearch` (#4668 ) * extract elasticsearch * update pyproject.toml * make more import optional * move MockBaseRetriever in conftest * install es in the es integration tests	2023-04-26 10:14:20 +02:00
Sebastian	8d9136bad4	feat: Implementation of Table Cell Proposal (#4616 ) * Starting adding support for TableCell * Update tests to use row and col * Added schema test to check to_dict and from_dict works for Table documents. Also updated Doc.__eq__ to work for tables. * Update eval test to use TableCell * Added more schema tests for table docs, labels and answers. * Add boolean to toggle between Span and TableCell * Add deprecation message * Test that table answers work as responses in the rest API --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-04-19 13:14:49 +02:00
Sebastian	8c4176bdb2	feat: More flexible routing for RouteDocuments node (#4690 ) * Added warning messages for documents that are skipped by RouteDocuments. Begun adding support for new option return_remaining and List of List support for metadata value splitting. * Simplify _split_by_content_type * Added new unit test and updated _calculate_outgoing_edges * Added some TODOs and turned assert into raising an error. * Update logging messages and make new fixture in tests * Update _split_by_metadata_values to work with return_remaining * Remove unneeded code * Documentation * Add proper support for list of lists * Fix mypy errors * Added assert to make mypy happy * Update haystack/nodes/other/route_documents.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * PR comments * Remove check for logging level * make mypy happy * Update docstring of metadata_values * Removed duplicate check. Make explicit check for metadata_values --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2023-04-18 15:18:13 +02:00
Fernando Pereira	5d41e60d89	fix: ParsrConverter list element added (#4562 ) * fix: list element and mapping logic around it added to ParsrConverter convert step + unit test covering the specific mapping of list content from Parsr's to Haystack's * Code review changes * changed the samples path after conftest changes * added samples_path to function arg --------- Co-authored-by: Namoush <fmpereira22@gmail.com> Co-authored-by: Fernando Pereira <fernando.pereira@criticalsoftware.com> Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2023-04-12 18:38:21 +05:30
Ben Heckmann	2d65742443	feat: arbitrary `crawler_depth` for `Crawler` class (#4623 ) * #3674 implemented iterative crawler depth * #3674 added two tests for increased crawler depth * removed old comment	2023-04-11 10:39:17 +02:00
Silvano Cerza	5ac3dffbef	test: Rework conftest (#4614 ) * Split root conftest into multiple ones and remove unused fixtures * Remove some constants and make them fixtures * Remove unnecessary fixture scoping * Fix failing whisper tests * Fix image_file_paths fixture	2023-04-11 10:33:43 +02:00
Silvano Cerza	e85dc79eaa	test: Add pytest fixture to block requests in unit tests (#4433 ) * Add pytest fixture to block requests in unit tests * Mark test correctly as integration * Fix crawler unit test failing cause it tries to install chromedriver	2023-04-06 18:04:57 +02:00

204 Commits