haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2026-01-05 11:38:20 +00:00

Author	SHA1	Message	Date
Vladimir Blagojevic	0cc9ce7522	fix: WebRetriever top_k is ignored in a pipeline (#5106 ) * Initial changes * Add WebSearch, WebRetriever top_k unit tests * Add exact integration test that failed Tuana * PR review	2023-06-09 10:42:37 +02:00
Sebastian	1777b22fcb	fix: Ensure eval mode for farm and transformer models for predictions (#3791 ) Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-06-06 13:06:30 +02:00
Michael Feil	6ea8ae01a2	feat: Allow setting custom api_base for OpenAI nodes (#5033 ) * add changes for api_base * format retriever * Update haystack/nodes/retriever/dense.py Co-authored-by: bogdankostic <bogdankostic@web.de> * Update haystack/nodes/audio/whisper_transcriber.py Co-authored-by: bogdankostic <bogdankostic@web.de> * Update haystack/preview/components/audio/whisper_remote.py Co-authored-by: bogdankostic <bogdankostic@web.de> * Update haystack/nodes/answer_generator/openai.py Co-authored-by: bogdankostic <bogdankostic@web.de> * Update test_retriever.py * Update test_whisper_remote.py * Update test_generator.py * Update test_retriever.py * reformat with black * Update haystack/nodes/prompt/invocation_layer/chatgpt.py Co-authored-by: Daria Fokina <daria.f93@gmail.com> * Add unit tests * apply docstring suggestions --------- Co-authored-by: bogdankostic <bogdankostic@web.de> Co-authored-by: michaelfeil <me@michaelfeil.eu> Co-authored-by: Daria Fokina <daria.f93@gmail.com>	2023-06-05 11:32:06 +02:00
Massimiliano Pippi	929b8d1fb0	ci: run Elasticsearch 8.6 in compatibility mode (#3853 ) * bump ES version in CI disable ssl wait for service to start set env vars do not use choco to install ES re-enable jobs deps skip test on windows CI because of OOM allocate more memory for ES uniform ES installation and use default heap size skip tests causing OOM increase job timeout restore memory limit for ES8 * Use latest elasticsearch version	2023-05-24 18:53:54 +02:00
Massimiliano Pippi	68924161df	chore: remove deprecated node PDFToTextOCRConverter (#4982 ) * remove deprecated node * remove related test	2023-05-23 16:55:54 +02:00
ZanSara	949b1b63b3	PromptHub integration in `PromptNode` (#4879 ) * initial integration * upgrade of prompthub * fix get_prompt_template * feedback * add prompthub-py to dependencies * tests * mypy * stray changes * review feedback * missing init * fix test * move logic in prompttemplate * linting * bugfixes * fix unit tests * fix cache * simplify prompttemplate init * remove unused function * removing wrong params * try remove all instances of prompt names * more tests * fix agent tests * more tests * fix tests * pylint * comma * black * fix test * docstring * review feedback * review feedback * fix mocks * mypy * fix mocks * fix reference to missing templates * feedback * remove direct references to default template var * tests * Update haystack/nodes/prompt/prompt_node.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2023-05-23 15:22:58 +02:00
Massimiliano Pippi	c6ea542b57	chore: remove BaseKnowledgeGraph (#4953 ) * remove BaseKnowledgeGraph * fix pylint	2023-05-21 10:42:02 +02:00
Massimiliano Pippi	4974bf7ab3	chore: remove deprecated MilvusDocumentStore (#4951 ) * remove deprecated MilvusDocumentStore * remove leftovers * fix pylint	2023-05-19 16:37:38 +02:00
Vladimir Blagojevic	5d7ee2e5e6	feat: Add max_tokens to BaseGenerator params (#4168 ) * Add max_tokens to BaseGenerator params * Make mypy happy * Rebase and resolve conflicts * Fix signature issues * Update lg * Add a mocked unit test method * end-of-file-fixer corrected file * Convert to unit test * Mark test as integration * make the test unit --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com> Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-05-18 15:19:29 +02:00
Massimiliano Pippi	3ea784464a	add test case for #4929 (#4936 )	2023-05-18 09:12:03 +02:00
bogdankostic	df46e7fadd	fix: Use `AutoTokenizer` instead of DPR specific tokenizer (#4898 ) * Use AutoTokenizer instead of DPR specific tokenizer * Adapt TableTextRetriever * Adapt tests * Adapt tests	2023-05-17 18:54:34 +02:00
Stefano Fiorucci	6e0000732d	feat: add BLIP support in `TransformersImageToText` (#4912 ) * add blip support * fix typo Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2023-05-16 10:57:41 +02:00
bogdankostic	5b2ef2afd6	Revert "refactor!: Deprecate `name` param in `PromptTemplate` and introduce `template_name` instead (#4810 )" (#4834 ) This reverts commit f660f41c0615e6b3064ef3e321f1e5a295fafc1b.	2023-05-08 11:31:04 +02:00
ZanSara	6e982e9283	fix: preserve `root_node` in `JoinNode`'s output (#4820 ) * preserve root_node and add tests * Added if statement to fix failing tests --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com>	2023-05-08 10:17:36 +02:00
bogdankostic	f660f41c06	refactor!: Deprecate `name` param in `PromptTemplate` and introduce `template_name` instead (#4810 ) * Deprecate name parameter * Adapt existing tests and uses of PromptTemplate * Move parameter `name` to end * Adapt existing tests * lg update --------- Co-authored-by: Darja Fokina <daria.f93@gmail.com>	2023-05-08 10:12:29 +02:00
Pouyan	75ff768c21	Pouyanpi/feat/search engine/providers/google api (#4722 ) * feat: implement google api search engine provider Signed-off-by: Pouyan <prezakhanipr@gmail.com> --------- Signed-off-by: Pouyan <prezakhanipr@gmail.com>	2023-05-02 17:09:17 +02:00
Mayank Jobanputra	dcf3ddddff	Added deprecation tests for seq2seq generator and RAG Generator (#4782 )	2023-05-02 13:30:22 +05:30
Mayank Jobanputra	896eb6a2ea	chore: fixed reader loading test for hf-hub starting 0.14.0 (#4607 ) * fixed test base for hub 0.13.3 * check if test succeed from branch * 2nd check if test succeed from branch * removed dependency changes --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-05-02 08:22:44 +02:00
Vladimir Blagojevic	dcaf3002f1	fix: SentenceTransformersRanker's predict_batch returns wrong number of documents (#4756 ) * Fix SentenceTransformersRanker spredict_batch returning wrong number of documents * Julian's feedback	2023-04-27 15:24:39 +02:00
Vladimir Blagojevic	aebc22d27e	Upgrade transformers to 4.28.1 (#4665 ) * Upgrade to transformers 4.28.1 * Commenting out failing piece of test * trailing-whitespace * Adjust regex for error match - it changed between releases * Remove RAG tests failing with transformers update	2023-04-27 12:55:21 +02:00
ZanSara	1b57b96210	refactor!: extract `elasticsearch` (#4668 ) * extract elasticsearch * update pyproject.toml * make more import optional * move MockBaseRetriever in conftest * install es in the es integration tests	2023-04-26 10:14:20 +02:00
Sebastian	8d9136bad4	feat: Implementation of Table Cell Proposal (#4616 ) * Starting adding support for TableCell * Update tests to use row and col * Added schema test to check to_dict and from_dict works for Table documents. Also updated Doc.__eq__ to work for tables. * Update eval test to use TableCell * Added more schema tests for table docs, labels and answers. * Add boolean to toggle between Span and TableCell * Add deprecation message * Test that table answers work as responses in the rest API --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-04-19 13:14:49 +02:00
Sebastian	8c4176bdb2	feat: More flexible routing for RouteDocuments node (#4690 ) * Added warning messages for documents that are skipped by RouteDocuments. Begun adding support for new option return_remaining and List of List support for metadata value splitting. * Simplify _split_by_content_type * Added new unit test and updated _calculate_outgoing_edges * Added some TODOs and turned assert into raising an error. * Update logging messages and make new fixture in tests * Update _split_by_metadata_values to work with return_remaining * Remove unneeded code * Documentation * Add proper support for list of lists * Fix mypy errors * Added assert to make mypy happy * Update haystack/nodes/other/route_documents.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * PR comments * Remove check for logging level * make mypy happy * Update docstring of metadata_values * Removed duplicate check. Make explicit check for metadata_values --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2023-04-18 15:18:13 +02:00
Fernando Pereira	5d41e60d89	fix: ParsrConverter list element added (#4562 ) * fix: list element and mapping logic around it added to ParsrConverter convert step + unit test covering the specific mapping of list content from Parsr's to Haystack's * Code review changes * changed the samples path after conftest changes * added samples_path to function arg --------- Co-authored-by: Namoush <fmpereira22@gmail.com> Co-authored-by: Fernando Pereira <fernando.pereira@criticalsoftware.com> Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2023-04-12 18:38:21 +05:30
Ben Heckmann	2d65742443	feat: arbitrary `crawler_depth` for `Crawler` class (#4623 ) * #3674 implemented iterative crawler depth * #3674 added two tests for increased crawler depth * removed old comment	2023-04-11 10:39:17 +02:00
Silvano Cerza	5ac3dffbef	test: Rework conftest (#4614 ) * Split root conftest into multiple ones and remove unused fixtures * Remove some constants and make them fixtures * Remove unnecessary fixture scoping * Fix failing whisper tests * Fix image_file_paths fixture	2023-04-11 10:33:43 +02:00
Silvano Cerza	e85dc79eaa	test: Add pytest fixture to block requests in unit tests (#4433 ) * Add pytest fixture to block requests in unit tests * Mark test correctly as integration * Fix crawler unit test failing cause it tries to install chromedriver	2023-04-06 18:04:57 +02:00
Julian Risch	57415ef8ab	test: Remove duplicate test and edit docstring (#4567 )	2023-03-31 12:39:18 +02:00
Stefano Fiorucci	57f87e24a3	refactor: `OpenAIAnswerGenerator` - avoid tokenizing all documents several times (#4504 )	2023-03-29 22:38:27 +02:00
Zoltan Fedor	32091d66cb	Adding filtering support for Weaviate when used for BM25 querying (#4385 )	2023-03-29 16:51:22 +02:00
Vladimir Blagojevic	7c9f719496	refactor: Adjust WhisperTranscriber to pipeline run methods (#4510 ) * Retrofit WhisperTranscriber run methods * Add pipeline unit test --------- Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>	2023-03-28 13:52:21 +02:00
bogdankostic	ed1837c0c9	feat: Deduplicate duplicate Answers resulting from overlapping Documents in `FARMReader` (#4470 ) * Deduplicate answers resulting from document split overlap * Add tests * Fix Pylint * Adapt existing test * Incorporate PR feedback	2023-03-27 20:04:59 +02:00
Vladimir Blagojevic	be25655663	feat: Add agent tools (#4437 ) * Initial commit, add search_engine * Add TopPSampler * Add more TopPSampler unit tests * Remove SearchEngineSampler (converted to TopPSampler) * Add some basic WebSearch unit tests * Rename unit tests * Add WebRetriever into agent_tools * Adjust to WebRetriever * Add WebRetriever mode [snippet|document] * Minor changes * SerperDev: add peopleAlsoAsk search results * First agent for hotpotqa * Making WebRetriever work on hotpotqa * refactor: minor WebRetriever improvements (#4377) * refactor: remove doc ids rebuild + antecipate cache * refactor: improve caching, fix Document ids * Minor WebRetriever improvements * Overlooked minor fixes * feat: add Bing API as search engine * refactor: let kwargs pass-through * feat: increase search context * check sampler result, improve batch typing * refactor: increase mypy compliance * Initial commit, add search_engine * Add TopPSampler * Add more TopPSampler unit tests * Remove SearchEngineSampler (converted to TopPSampler) * Add some basic WebSearch unit tests * Rename unit tests * Add WebRetriever into agent_tools * Adjust to WebRetriever * Add WebRetriever mode [snippet|document] * Minor changes * SerperDev: add peopleAlsoAsk search results * First agent for hotpotqa * Making WebRetriever work on hotpotqa * refactor: minor WebRetriever improvements (#4377) * refactor: remove doc ids rebuild + antecipate cache * refactor: improve caching, fix Document ids * Minor WebRetriever improvements * Overlooked minor fixes * feat: add Bing API as search engine * refactor: let kwargs pass-through * feat: increase search context * check sampler result, improve batch typing * refactor: increase mypy compliance * Fix mypy * Minor example fixes * Fix the descriptions * PR feedback updates * More fixes * TopPSampler: handle top p None value, add unit test * Add top_k to WebSearch * Use boilerpy3 instead trafilatura * Remove date finding * Add more WebRetriever docs * Refactor long methods * making the preprocessor optional * hide WebSearch and make NeuralWebSearch a pipeline * remove unused imports * add WebQAPipeline and split example into two * change example search engine to SerperDev * Turn off progress bars in WebRetriever's PreProcesssor * Agent tool examples - final updates * Add webqa test, search results ranking scores * Better answer box handling for SerperDev and SerpAPI * Minor fixes * pylint * pylint fixes * extract TopPSampler from WebRetriever * use sampler only for WebRetriever modes other than snippet * add web retriever tests * add web retriever tests * exclude rdflib@6.3.2 due to license issues * add test for preprocessed docs and kwargs examples in docstrings * Move test_webqa_pipeline to test/pipelines * change docstring for join_documents_and_scores * Use WebQAPipeline in examples/web_lfqa.py * Use WebQAPipeline in examples/web_lfqa.py * Move test_webqa_pipeline to e2e * Updated lg * Sampler added automatically in WebQAPipeline, no need to add it * Updated lg * Updated lg * :ignore Update agent tools examples to new templates (#4503) * Update examples to new templates * Add print back * fix linting and black format issues --------- Co-authored-by: Daniel Bichuetti <daniel.bichuetti@gmail.com> Co-authored-by: agnieszka-m <amarzec13@gmail.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2023-03-27 18:14:58 +02:00
Silvano Cerza	5b63c2086e	refactor: Deprecate BaseKnowledgeGraph, GraphDBKnowledgeGraph, InMemoryKnowledgeGraph and Text2SparqlRetriever (#4500 ) * Deprecate BaseKnowledgeGraph and InMemoryKnowledgeGraph * Deprecate GraphDBKnowledgeGraph * Fix mypy * Deprecate Text2SparqlRetriever	2023-03-27 15:31:22 +02:00
tstadel	382ca8094e	feat: PromptTemplate extensions (#4378 ) * use outputshapers in prompttemplate * fix pylint * first iteration on regex * implement new promptnode syntax based on f-strings * finish fstring implementation * add additional tests * add security tests * fix mypy * fix pylint * fix test_prompt_templates * fix test_prompt_template_repr * fix test_prompt_node_with_custom_invocation_layer * fix test_invalid_template * more security tests * fix test_complex_pipeline_with_all_features * fix agent tests * refactor get_prompt_template * fix test_prompt_template_syntax_parser * fix test_complex_pipeline_with_all_features * allow functions in comprehensions * break out of fstring test * fix additional tests * mark new tests as unit tests * fix agents tests * convert missing templates * proper use of get_prompt_template * refactor and add docstrings * fix tests * fix pylint * fix agents test * fix tests * refactor globals * make allowed functions configurable via env variable * better dummy variable * fix special alias * don't replace special char variables * more special chars, better docstrings * cherrypick fix audio tests * fix test * rework shapers * fix pylint * fix tests * add new templates * add reference parsing * add more shaper tests * add tests for join and to_string * fix pylint * fix pylint * fix pylint for real * auto fill shaper function params * fix reference parsing for multiple references * fix output variable inference * consolidate qa prompt template output and make shaper work per-document * fix types after merge * introduce output_parser * fix tests * better docstring * rename RegexAnswerParser to AnswerParser * better docstrings	2023-03-27 12:14:11 +02:00
Silvano Cerza	1b5df55dbb	Skip flaky test (#4444 )	2023-03-16 16:32:28 +01:00
Silvano Cerza	3591fc02e1	Mark Crawler tests correctly (#4435 )	2023-03-16 09:26:19 +01:00
Vladimir Blagojevic	2538b4cbc9	Make promptnode test unit (#4420 )	2023-03-15 22:17:23 +01:00
Silvano Cerza	b59cf76093	refactor: Remove AnswerToSpeech and DocumentToSpeech nodes (#4391 ) * Remove AnswerToSpeech and DocumentToSpeech nodes * Remove unused dataclasses * Remove unnecessary dependencies * Remove unused error class and imports	2023-03-15 19:31:13 +01:00
Vladimir Blagojevic	f13501309e	OpenAI streaming support (#4397 )	2023-03-15 18:24:47 +01:00
Silvano Cerza	b3a659cd4a	test: Fix audio tests failing (#4418 ) * Fix audio tests failing * Disable local whisper tests	2023-03-15 15:26:30 +01:00
Vladimir Blagojevic	98256ecf57	Add Whisper node (#4335 ) * Add Whisper node * Add support for audio path, improve tests * Add docs * Improve tests	2023-03-13 16:17:07 +01:00
Daniel Bichuetti	28724e2e25	feat: add automatic OCR detection mechanism and improve performance (#4329 ) * feat: add automatic OCR detection mechanism and improve performance * refactor: add error message * refactor: ignore pdftoppm bad typing * refactor: add Tesseract install. docstrings * fix: check if OCR var. assigned on mp * tests: add path to windows/linux tests * tests: add tessdata path * tests: include matrix ref. * tests: custom Tesseract matrix install * refactor: improve user guide * tests: fix macos path * tests: remove brew formulae version * fix: macos paths * tests: fix macos path * tests: add Tesseract to Windows Path * tests: pytesseract path * tests: macos path * refactor: fix path message and remove extra path from tests * refactor: raise exception when path not found * refactor: expression simplification Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * refactor: check ocr parameter * tests: mark as integration * tests: mock deprecation warning * refactor: simplify code Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * refactor: change deprecation test Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * refactor: add unit patch * refactor: black formatting --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>	2023-03-13 20:19:22 +05:30
ZanSara	fd3f3143d4	feat: `LanguageClassifier` (#2994 ) * add lanaguage classifier node * Fix a few bugs and general code style * whitespace * first draft and refactoring * draft of classes separation * improve base class * fix inivisible character; add some tests * fix and more tests * more docs and tests * move __init__ to base * add transformers node; improve tests * incorporate feedback; little fix to other node * labels_to_languages mapping * better docstrings * use logger instead of logging --------- Co-authored-by: Stanislav Zamecnik <stanislav.zamecnik@telekom.com> Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com> Co-authored-by: stazam <zamecnik.stanislav@gmail.com>	2023-03-13 10:30:03 +01:00
Stefano Fiorucci	444a3116c4	docs: `TransformersImageToText`- inform about supported models, better exception handling (#4310 ) * better docs, exception handling and tests * Update lg * fix little error --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-03-09 15:35:17 +01:00
Mayank Jobanputra	39a20c37fd	fix: hf-tiny-roberta model loading from disk and mypy errors (#4363 ) * Fix mypy failures * Fix try 1 hf model on windows * Fix try 2 hf model on windows --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2023-03-09 18:06:09 +05:30
ZanSara	024332f98f	refactor: simplify registration of `PromptModelInvocationLayer` (#4339 ) * use __init_subclass__ and remove registering functions	2023-03-07 20:53:48 +01:00
Sebastian	7d5e7c089c	refactor: Use TableQuestionAnsweringPipeline from transformers (#4303 ) * Added changes from table-qa-pipeline * Moved classes around to make diff to main look nicer. * Cleaned things up. Removed option to return_no_answer (not needed), added docs and added integration marks. * Remove unneeded code * Added fix for test * Add check for document_ids in answer * Prevent passing of empty list to np.mean * Batching doesn't work with TableQAPipeline b/c of HF issue * Cleanup of table reader tests, added check for document ids. * Fixing pylint * More pylint * PR comments --------- Co-authored-by: bogdankostic <bogdankostic@web.de>	2023-03-07 11:46:50 +01:00
Vladimir Blagojevic	348e7d2dfe	refactor: Separate PromptModelInvocationLayers in providers.py (#4327 ) * Refactor PromptNode, separate PromptModelInvocationLayers in providers.py	2023-03-06 16:34:59 +01:00
Daniel Bichuetti	1548c5ba0f	feat: Add Azure OpenAI embeddings support (#4332 ) * feate: add Azure OpenAI as embedding option * feat: Add Azure OpenAI embeddings support * refactor: check api key * refactor: better type checking for Azure * refactor: enable parallelism + separate and update tests * refactor: string reformat * refactor: explicit typing * refactor: update refs and remove unused code	2023-03-06 13:37:20 +01:00

181 Commits