haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-08-09 09:08:19 +00:00

Author	SHA1	Message	Date
Sebastian	a67ca289db	refactor: Update schema objects to handle Dataframes in to_{dict,json} and from_{dict,json} (#4747 ) * Adding support for table Documents when serializing Labels in Haystack * Fix table label equality test * Add serialization support and __eq__ support for table answers * Made convenience functions for converting dataframes. Added some TODOs. Expanded schema tests for table labels. Updated Multilabel to not convert Dataframes into strings. * get Answer and Label to_json working with DataFrame * Fix from_dict method of Label * Use Dict and remove unnecessary if check * Using pydantic instead of builtins for type detection * Update haystack/schema.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * Update haystack/schema.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * Update haystack/schema.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * Separated table label equivalency tests and added pytest.mark.unit * Added unit test for _dict_factory * Using more descriptive variable names * Adding json files to test to_json and from_json functions * Added sample files for tests --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2023-05-03 09:42:07 +02:00
ZanSara	a9ec954c45	bug: fix filtering in `MemoryDocumentStore` (v2) (#4768 ) * fix filtering bug * pylint * improve asserts	2023-05-03 09:33:12 +02:00
Pouyan	75ff768c21	Pouyanpi/feat/search engine/providers/google api (#4722 ) * feat: implement google api search engine provider Signed-off-by: Pouyan <prezakhanipr@gmail.com> --------- Signed-off-by: Pouyan <prezakhanipr@gmail.com>	2023-05-02 17:09:17 +02:00
duffn	479092e3c1	bug: (rest_api) remove full logging of overwritten env variables (#4791 ) * bug: (rest_api) remove logging of overwritten env variables * Update haystack/pipelines/config.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * Update test --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2023-05-02 16:48:19 +02:00
Vladimir Blagojevic	1e9f4c1d50	feat: Add HF local runtime token streaming support (#4652 ) * Add HF local runtime token streaming support * Add stream and stream_handler as model kwargs * Improve HF streaming unit tests	2023-05-02 12:50:20 +02:00
Mayank Jobanputra	dcf3ddddff	Added deprecation tests for seq2seq generator and RAG Generator (#4782 )	2023-05-02 13:30:22 +05:30
Mayank Jobanputra	896eb6a2ea	chore: fixed reader loading test for hf-hub starting 0.14.0 (#4607 ) * fixed test base for hub 0.13.3 * check if test succeed from branch * 2nd check if test succeed from branch * removed dependency changes --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-05-02 08:22:44 +02:00
ZanSara	b60d9a2cbf	test: move several modeling tests in e2e/ (#4308 ) * no dpr test seems worth mocking * move distillation tests * pylint * mypy * pylint * move feature_extraction tests as well * move feature_extraction tests as well * merge feature extractor suites * get_language_model tests and adaptive model tests * duplicate test * moving fixtures * mypy * mypy-again * trigger * un-mock integration test * review feedback * feedback * pylint	2023-04-28 17:08:41 +02:00
Vladimir Blagojevic	dcaf3002f1	fix: SentenceTransformersRanker's predict_batch returns wrong number of documents (#4756 ) * Fix SentenceTransformersRanker's predict_batch returning wrong number of documents * Julian's feedback	2023-04-27 15:24:39 +02:00
Vladimir Blagojevic	c9a415ec8d	refactor: Make agent test more robust (#4767 ) * Add more exemplars to lower test failure rate * Easier agent run test, more robust, consistently passing	2023-04-27 14:53:15 +02:00
Vladimir Blagojevic	aebc22d27e	Upgrade transformers to 4.28.1 (#4665 ) * Upgrade to transformers 4.28.1 * Commenting out failing piece of test * trailing-whitespace * Adjust regex for error match - it changed between releases * Remove RAG tests failing with transformers update	2023-04-27 12:55:21 +02:00
bogdankostic	c7a20d68d2	fix: Add separate query method for OpenSearchDocumentStore (#4764 ) * Add separate query method for OpenSearchDocumentStore * Convert integration test to unit test + add separate tests for OpenSearch	2023-04-26 21:58:33 +02:00
Vladimir Blagojevic	41b6e33f64	Enhance the error logging in PromptTemplate variable resolution (#4730 ) * Enhance the error logging in PromptTemplate variable resolution * Revert change Daria made * Silvano PR feedback	2023-04-26 18:09:20 +02:00
tstadel	9cbe9e0949	fix: recursion of death while loading PromptTemplate from yaml (#4691 ) * fix recursion of death when deserializing prompttemplate * add test * set api_key * fix test * add generic test * work in feedback on tests --------- Co-authored-by: bogdankostic <bogdankostic@web.de>	2023-04-26 13:56:51 +02:00
s_teja	d033a086d0	fix: loads local HF Models in PromptNode pipeline (#4670 ) * bug: fix load local HF Models in PromptNode pipeline * Update hugging_face.py remove duplicate validator * update: black formatted * update: update doc string, replace pop with get * test HFLocalInvocationLayer with local model	2023-04-26 13:10:02 +02:00
ZanSara	1b57b96210	refactor!: extract `elasticsearch` (#4668 ) * extract elasticsearch * update pyproject.toml * make more import optional * move MockBaseRetriever in conftest * install es in the es integration tests	2023-04-26 10:14:20 +02:00
Sebastian	8d9136bad4	feat: Implementation of Table Cell Proposal (#4616 ) * Starting adding support for TableCell * Update tests to use row and col * Added schema test to check to_dict and from_dict works for Table documents. Also updated Doc.__eq__ to work for tables. * Update eval test to use TableCell * Added more schema tests for table docs, labels and answers. * Add boolean to toggle between Span and TableCell * Add deprecation message * Test that table answers work as responses in the rest API --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-04-19 13:14:49 +02:00
Silvano Cerza	f13cc751c3	Block requests_cache in unit tests (#4696 )	2023-04-18 16:15:26 +02:00
Massimiliano Pippi	0c081f19e2	fix: remove warnings from the more recent Elasticsearch client (#4602 ) * clean up the ES instance in a more robust way * do not sleep, refresh the index instead * remove client warnings * fix unit tests * fix opensearch compatibility * fix unit tests * update ES version * bump elasticsearch-py * adjust docs * use recreate_index param * use same fixture strategy for Opensearch * Update lg --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-04-18 15:40:17 +02:00
Sebastian	8c4176bdb2	feat: More flexible routing for RouteDocuments node (#4690 ) * Added warning messages for documents that are skipped by RouteDocuments. Begun adding support for new option return_remaining and List of List support for metadata value splitting. * Simplify _split_by_content_type * Added new unit test and updated _calculate_outgoing_edges * Added some TODOs and turned assert into raising an error. * Update logging messages and make new fixture in tests * Update _split_by_metadata_values to work with return_remaining * Remove unneeded code * Documentation * Add proper support for list of lists * Fix mypy errors * Added assert to make mypy happy * Update haystack/nodes/other/route_documents.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * PR comments * Remove check for logging level * make mypy happy * Update docstring of metadata_values * Removed duplicate check. Make explicit check for metadata_values --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2023-04-18 15:18:13 +02:00
ZanSara	b06821b311	refactor: `node->component` (#4687 ) * node->component * fix tests	2023-04-17 12:20:42 +02:00
Massimiliano Pippi	a03e8335aa	Ignore cross-reference properties when loading documents (#4664 ) * drop cross-reference properties * be more defensive * fix regression	2023-04-17 10:40:30 +02:00
Silvano Cerza	79727ed31f	Add requests blocker fixture (#4671 )	2023-04-14 18:01:30 +02:00
Vladimir Blagojevic	1dcac11133	feat: Add Hugging Face inferencing PromptNode layer (#4641 )	2023-04-14 17:59:17 +02:00
Vladimir Blagojevic	1dd6158244	fix: Add model_max_length model_kwargs parameter to HF PromptNode (#4651 )	2023-04-14 15:40:42 +02:00
ZanSara	174d80ab41	skip tests (#4654 )	2023-04-13 17:56:51 +02:00
Vladimir Blagojevic	e30bc8fe5a	feat: Add GenerationConfig option to PromptNode's HuggingFace invocation layer (#4649 )	2023-04-13 12:15:00 +02:00
ZanSara	f2106ab37b	feat: initial implementation of `MemoryDocumentStore` for new Pipelines (#4447 ) * add stub implementation * reimplementation * test files * docstore tests * tests for document * better testing * remove mmh3 * readme * only store, no retrieval yet * linting * review feedback * initial filters implementation * working on filters * linters * filtering works and is isolated by document store * simplify filters * comments * improve filters matching code * review feedback * pylint * move logic into_create_id * mypy	2023-04-13 09:36:23 +02:00
ZanSara	ba11d1c2a8	refactor!: extract evaluation and statistical dependencies (#4457 ) * try-catch sklearn and scipy * haystack imports * linting * mypy * try to import baseretriever * remove typing * unused import * remove more typing * pylint * isolate sql imports for postgres, which we don't use anyway * remove stats * replace expit * als inmemory * mypy * feedback * docker * expit * re-add njit	2023-04-12 15:38:56 +02:00
Fernando Pereira	5d41e60d89	fix: ParsrConverter list element added (#4562 ) * fix: list element and mapping logic around it added to ParsrConverter convert step + unit test covering the specific mapping of list content from Parsr's to Haystack's * Code review changes * changed the samples path after conftest changes * added samples_path to function arg --------- Co-authored-by: Namoush <fmpereira22@gmail.com> Co-authored-by: Fernando Pereira <fernando.pereira@criticalsoftware.com> Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2023-04-12 18:38:21 +05:30
Silvano Cerza	5baf2f5930	refactor: Rework invocation layers (#4615 ) * Move invocation layers into separate package * Fix circular imports * Fix import	2023-04-11 11:04:29 +02:00
Ben Heckmann	2d65742443	feat: arbitrary `crawler_depth` for `Crawler` class (#4623 ) * #3674 implemented iterative crawler depth * #3674 added two tests for increased crawler depth * removed old comment	2023-04-11 10:39:17 +02:00
Silvano Cerza	5547e85bd5	feat: Add util method to make HTTP requests with configurable retry (#4627 ) * Add util method to make HTTP requests with configurable retry * Fix pylint * Remove unnecessary optional parameter	2023-04-11 10:35:39 +02:00
Silvano Cerza	5ac3dffbef	test: Rework conftest (#4614 ) * Split root conftest into multiple ones and remove unused fixtures * Remove some constants and make them fixtures * Remove unnecessary fixture scoping * Fix failing whisper tests * Fix image_file_paths fixture	2023-04-11 10:33:43 +02:00
Silvano Cerza	e85dc79eaa	test: Add pytest fixture to block requests in unit tests (#4433 ) * Add pytest fixture to block requests in unit tests * Mark test correctly as integration * Fix crawler unit test failing cause it tries to install chromedriver	2023-04-06 18:04:57 +02:00
Silvano Cerza	c3abf73332	refactor: Rework prompt tests (#4600 ) * Rework some PromptNode and PromptModel tests * Remove duplicate code in PromptNode * Fix mypy * Fix test cause of missing fixture * Revert "Fix mypy" This reverts commit e530295a06cb260d9a8bd89679534958cb3d9776. * Revert "Remove duplicate code in PromptNode" This reverts commit 4a678ae81504dcc78a737372c061d12dc8799639.	2023-04-06 14:47:44 +02:00
Vladimir Blagojevic	a8d283cfac	Fix HF stop words (single stop word) (#4584 )	2023-04-04 14:45:10 +02:00
Silvano Cerza	1cc4c9c651	refactor: Refactor prompt node (#4580 ) * Refactor prompt structure * Refactor prompt tests structure * Fix pylint * Move TestPromptTemplateSyntax to test_prompt_template.py	2023-04-03 11:49:49 +02:00
Silvano Cerza	af02803cce	Skip flaky prompt node integration test (#4572 )	2023-04-03 09:49:30 +02:00
Julian Risch	57415ef8ab	test: Remove duplicate test and edit docstring (#4567 )	2023-03-31 12:39:18 +02:00
Agnieszka Marzec	815dcdebbd	docs: Update PromptNode API docs (#4549 ) * Update docstrings * adapt test to changed logging message --------- Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2023-03-30 14:27:44 +02:00
Stefano Fiorucci	57f87e24a3	refactor: `OpenAIAnswerGenerator` - avoid tokenizing all documents several times (#4504 )	2023-03-29 22:38:27 +02:00
Zoltan Fedor	32091d66cb	Adding filtering support for Weaviate when used for BM25 querying (#4385 )	2023-03-29 16:51:22 +02:00
Vladimir Blagojevic	7c9f719496	refactor: Adjust WhisperTranscriber to pipeline run methods (#4510 ) * Retrofit WhisperTranscriber run methods * Add pipeline unit test --------- Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>	2023-03-28 13:52:21 +02:00
Silvano Cerza	cfb8dfd470	Fix pipeline config and agent tools hashing for telemetry (#4508 )	2023-03-28 09:41:50 +02:00
bogdankostic	ed1837c0c9	feat: Deduplicate duplicate Answers resulting from overlapping Documents in `FARMReader` (#4470 ) * Deduplicate answers resulting from document split overlap * Add tests * Fix Pylint * Adapt existing test * Incorporate PR feedback	2023-03-27 20:04:59 +02:00
Vladimir Blagojevic	be25655663	feat: Add agent tools (#4437 ) * Initial commit, add search_engine * Add TopPSampler * Add more TopPSampler unit tests * Remove SearchEngineSampler (converted to TopPSampler) * Add some basic WebSearch unit tests * Rename unit tests * Add WebRetriever into agent_tools * Adjust to WebRetriever * Add WebRetriever mode [snippet|document] * Minor changes * SerperDev: add peopleAlsoAsk search results * First agent for hotpotqa * Making WebRetriever work on hotpotqa * refactor: minor WebRetriever improvements (#4377) * refactor: remove doc ids rebuild + anticipate cache * refactor: improve caching, fix Document ids * Minor WebRetriever improvements * Overlooked minor fixes * feat: add Bing API as search engine * refactor: let kwargs pass-through * feat: increase search context * check sampler result, improve batch typing * refactor: increase mypy compliance * Initial commit, add search_engine * Add TopPSampler * Add more TopPSampler unit tests * Remove SearchEngineSampler (converted to TopPSampler) * Add some basic WebSearch unit tests * Rename unit tests * Add WebRetriever into agent_tools * Adjust to WebRetriever * Add WebRetriever mode [snippet|document] * Minor changes * SerperDev: add peopleAlsoAsk search results * First agent for hotpotqa * Making WebRetriever work on hotpotqa * refactor: minor WebRetriever improvements (#4377) * refactor: remove doc ids rebuild + anticipate cache * refactor: improve caching, fix Document ids * Minor WebRetriever improvements * Overlooked minor fixes * feat: add Bing API as search engine * refactor: let kwargs pass-through * feat: increase search context * check sampler result, improve batch typing * refactor: increase mypy compliance * Fix mypy * Minor example fixes * Fix the descriptions * PR feedback updates * More fixes * TopPSampler: handle top p None value, add unit test * Add top_k to WebSearch * Use boilerpy3 instead trafilatura * Remove date finding * Add more WebRetriever docs * Refactor long methods * making the preprocessor optional * hide WebSearch and make NeuralWebSearch a pipeline * remove unused imports * add WebQAPipeline and split example into two * change example search engine to SerperDev * Turn off progress bars in WebRetriever's PreProcessor * Agent tool examples - final updates * Add webqa test, search results ranking scores * Better answer box handling for SerperDev and SerpAPI * Minor fixes * pylint * pylint fixes * extract TopPSampler from WebRetriever * use sampler only for WebRetriever modes other than snippet * add web retriever tests * add web retriever tests * exclude rdflib@6.3.2 due to license issues * add test for preprocessed docs and kwargs examples in docstrings * Move test_webqa_pipeline to test/pipelines * change docstring for join_documents_and_scores * Use WebQAPipeline in examples/web_lfqa.py * Use WebQAPipeline in examples/web_lfqa.py * Move test_webqa_pipeline to e2e * Updated lg * Sampler added automatically in WebQAPipeline, no need to add it * Updated lg * Updated lg * :ignore Update agent tools examples to new templates (#4503) * Update examples to new templates * Add print back * fix linting and black format issues --------- Co-authored-by: Daniel Bichuetti <daniel.bichuetti@gmail.com> Co-authored-by: agnieszka-m <amarzec13@gmail.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2023-03-27 18:14:58 +02:00
tstadel	4f90e59796	feat: expose prompts to Answer and EvaluationResult (#4341 ) * store prompt in Answer * store prompt in eval csv * fix tests * chore: fix context offset loadingQ * add tests * add test from PR #4476 * fix tests after merge	2023-03-27 17:54:20 +02:00
ZanSara	6d578ebf3d	refactor: remove telemetry v1 (#4496 ) * remove telemetry v1 * more pipeline methods to take out * send_event_2 * mypy * pylint * mypy * mypy again * remove test	2023-03-27 17:38:43 +02:00
Silvano Cerza	3b5223fa1c	refactor: Mark MilvusDocumentStore as deprecated (#4498 ) * Mark MilvusDocumentStore as deprecated * Fix mypy	2023-03-27 15:31:48 +02:00

1204 Commits