haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-10-20 04:18:57 +00:00

Author	SHA1	Message	Date
Massimiliano Pippi	e1ec4e5e4d	refact!: Remove symbols under the `haystack.document_stores` namespace (#6714 ) * remove symbols under the haystack.document_stores namespace * Update haystack/document_stores/types/protocol.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * fix * same for retrievers * leftovers * more leftovers * add relnote * leftovers * one more * fix examples --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2024-01-10 21:20:42 +01:00
Silvano Cerza	9445b2d466	Fix skipif with empty env var (#6704 )	2024-01-08 19:19:14 +01:00
Silvano Cerza	607e7d1488	Skip integration tests if env var is missing (#6703 )	2024-01-08 17:15:10 +01:00
Vladimir Blagojevic	4d08be0c2a	feat: Update OpenAI Python Client in Haystack 2.x (#6584 ) * Update openai python client * Add release note * Consolidate multiple mock_chat_completion into one * Ensure all components have api_base_url, organization params * Update tests * Enable function calling * Oversight * Minor fixes, add streaming test mocks * Apply suggestions from code review Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * metadata -> meta --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>	2023-12-21 16:21:24 +01:00
Silvano Cerza	8a513f3b8c	test: Add fixture to block requests in tests (#6585 ) * Add fixture to block requests in tests * Mark tests making requests as integration	2023-12-21 08:51:54 +01:00
Vladimir Blagojevic	628e8aa3d4	feat: Improve getting started examples (#6510 ) * Improve rag and indexing pipelines * Update examples * Simplify user interface and code, improve embedder model * Improve default vals for embedder * resolve typing * resolve typing 2 * Fix unit test --------- Co-authored-by: Timo Möller <timo.moeller@deepset.ai>	2023-12-09 19:01:13 +01:00
Vladimir Blagojevic	008a322023	feat: Add Indexing Pipeline (#6424 ) * Add build_indexing_pipeline utils function * Pylint fixes * Move into another package to avoid circular deps * Revert change * Revert haystack/utils/__init__.py change * Add example * Use DocumentStore type, remove typing checks	2023-12-04 16:08:53 +01:00
ZanSara	a38f871dbd	feat: Add RAG pipeline (#6461 ) * add rag pipeline * Update examples/getting_started/rag.py Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> --------- Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com> Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-12-04 15:25:29 +01:00
Massimiliano Pippi	09e7831f60	clean up 1.x code --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2023-11-24 11:47:47 +01:00
ZanSara	adf7e49af3	chore: review `all` extra (#6029 )	2023-10-12 21:50:53 +02:00
Christian Clauss	91ab90a256	perf: Python performance improvements with ruff C4 and PERF fixes (#5803 ) * Python performance improvements with ruff C4 and PERF * pre-commit fixes * Revert changes to examples/basic_qa_pipeline.py * Revert changes to haystack/preview/testing/document_store.py * revert releasenotes * Upgrade to ruff v0.0.290	2023-09-16 16:26:07 +02:00
Christian Clauss	1bc03ddc73	ci: Fix all ruff pyflakes errors except unused imports (#5820 ) * ci: Fix all ruff pyflakes errors except unused imports * Delete releasenotes/notes/fix-some-pyflakes-errors-69a1106efa5d0203.yaml	2023-09-15 18:30:33 +02:00
Christian Clauss	9405eb90ee	ci: Fix invalid escape sequences in Python code (#5802 ) * ci: Use ruff in pre-commit to further limit complexity * Fix invalid escape sequences in Python code * Delete releasenotes/notes/ruff-4d2504d362035166.yaml	2023-09-14 16:42:48 +02:00
Stefano Fiorucci	d860a5c604	make tests more robust (#5747 )	2023-09-08 15:50:56 +02:00
Sebastian Husch Lee	2bc7fe1a08	test: reactivate unit tests in `test_eval.py` (#5255 ) * Activate tests that follow unit test and integration test rules * Adding more integration labels * Change name to better reflect complexity of test * Remove mark integration tags, move test to doc store test for add_eval_data * Removing incorrect integration label * Deactivated document store test b/c it fails for Weaviate and pinecone * Remove unit label since test needs to be refactored to be considered a unit test * Undo changes * Undo change * Check every field in the load evaluation result * Add back label and add skip reason * Use pytest skip instead of TODO	2023-07-24 17:07:45 +02:00
bogdankostic	0697f5c63e	fix: Support isolated node eval in run_batch in Generators (#5291 ) * Add isolated node eval to BaseGenerator's run_batch * Add unit tests	2023-07-07 10:32:43 +02:00
bogdankostic	ed1bad1155	fix: Use `add_isolated_node_eval` of `eval_batch` in `run_batch` (#5223 ) * Fix isolated node eval in eval_batch * Add unit test	2023-06-28 16:51:23 +02:00
ZanSara	3c71f0ae3d	chore: mark some unit tests under `test/pipeline` (#5124 ) * mark some unit tests as such * remove marker	2023-06-12 17:58:31 +02:00
Massimiliano Pippi	4974bf7ab3	chore: remove deprecated MilvusDocumentStore (#4951 ) * remove deprecated MilvusDocumentStore * remove leftovers * fix pylint	2023-05-19 16:37:38 +02:00
tstadel	7625829684	fix: `EvaluationResult` serialization changes dataframes (#4906 ) * fix nan and index values * add test * make test for None values after evalresult read explicit	2023-05-16 16:03:09 +02:00
duffn	479092e3c1	bug: (rest_api) remove full logging of overwritten env variables (#4791 ) * bug: (rest_api) remove logging of overwritten env variables * Update haystack/pipelines/config.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * Update test --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2023-05-02 16:48:19 +02:00
Vladimir Blagojevic	aebc22d27e	Upgrade transformers to 4.28.1 (#4665 ) * Upgrade to transformers 4.28.1 * Commenting out failing piece of test * trailing-whitespace * Adjust regex for error match - it changed between releases * Remove RAG tests failing with transformers update	2023-04-27 12:55:21 +02:00
tstadel	9cbe9e0949	fix: recursion of death while loading PromptTemplate from yaml (#4691 ) * fix recursion of death when deserializing prompttemplate * add test * set api_key * fix test * add generic test * work in feedback on tests --------- Co-authored-by: bogdankostic <bogdankostic@web.de>	2023-04-26 13:56:51 +02:00
Sebastian	8d9136bad4	feat: Implementation of Table Cell Proposal (#4616 ) * Starting adding support for TableCell * Update tests to use row and col * Added schema test to check to_dict and from_dict works for Table documents. Also updated Doc.__eq__ to work for tables. * Update eval test to use TableCell * Added more schema tests for table docs, labels and answers. * Add boolean to toggle between Span and TableCell * Add deprecation message * Test that table answers work as responses in the rest API --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-04-19 13:14:49 +02:00
Silvano Cerza	5ac3dffbef	test: Rework conftest (#4614 ) * Split root conftest into multiple ones and remove unused fixtures * Remove some constants and make them fixtures * Remove unnecessary fixture scoping * Fix failing whisper tests * Fix image_file_paths fixture	2023-04-11 10:33:43 +02:00
Silvano Cerza	cfb8dfd470	Fix pipeline config and agent tools hashing for telemetry (#4508 )	2023-03-28 09:41:50 +02:00
Vladimir Blagojevic	be25655663	feat: Add agent tools (#4437 ) * Initial commit, add search_engine * Add TopPSampler * Add more TopPSampler unit tests * Remove SearchEngineSampler (converted to TopPSampler) * Add some basic WebSearch unit tests * Rename unit tests * Add WebRetriever into agent_tools * Adjust to WebRetriever * Add WebRetriever mode [snippet|document] * Minor changes * SerperDev: add peopleAlsoAsk search results * First agent for hotpotqa * Making WebRetriever work on hotpotqa * refactor: minor WebRetriever improvements (#4377) * refactor: remove doc ids rebuild + antecipate cache * refactor: improve caching, fix Document ids * Minor WebRetriever improvements * Overlooked minor fixes * feat: add Bing API as search engine * refactor: let kwargs pass-through * feat: increase search context * check sampler result, improve batch typing * refactor: increase mypy compliance * Initial commit, add search_engine * Add TopPSampler * Add more TopPSampler unit tests * Remove SearchEngineSampler (converted to TopPSampler) * Add some basic WebSearch unit tests * Rename unit tests * Add WebRetriever into agent_tools * Adjust to WebRetriever * Add WebRetriever mode [snippet|document] * Minor changes * SerperDev: add peopleAlsoAsk search results * First agent for hotpotqa * Making WebRetriever work on hotpotqa * refactor: minor WebRetriever improvements (#4377) * refactor: remove doc ids rebuild + antecipate cache * refactor: improve caching, fix Document ids * Minor WebRetriever improvements * Overlooked minor fixes * feat: add Bing API as search engine * refactor: let kwargs pass-through * feat: increase search context * check sampler result, improve batch typing * refactor: increase mypy compliance * Fix mypy * Minor example fixes * Fix the descriptions * PR feedback updates * More fixes * TopPSampler: handle top p None value, add unit test * Add top_k to WebSearch * Use boilerpy3 instead trafilatura * Remove date finding * Add more WebRetriever docs * Refactor long methods * making the preprocessor optional * hide WebSearch and make NeuralWebSearch a pipeline * remove unused imports * add WebQAPipeline and split example into two * change example search engine to SerperDev * Turn off progress bars in WebRetriever's PreProcesssor * Agent tool examples - final updates * Add webqa test, search results ranking scores * Better answer box handling for SerperDev and SerpAPI * Minor fixes * pylint * pylint fixes * extract TopPSampler from WebRetriever * use sampler only for WebRetriever modes other than snippet * add web retriever tests * add web retriever tests * exclude rdflib@6.3.2 due to license issues * add test for preprocessed docs and kwargs examples in docstrings * Move test_webqa_pipeline to test/pipelines * change docstring for join_documents_and_scores * Use WebQAPipeline in examples/web_lfqa.py * Use WebQAPipeline in examples/web_lfqa.py * Move test_webqa_pipeline to e2e * Updated lg * Sampler added automatically in WebQAPipeline, no need to add it * Updated lg * Updated lg * :ignore Update agent tools examples to new templates (#4503) * Update examples to new templates * Add print back * fix linting and black format issues --------- Co-authored-by: Daniel Bichuetti <daniel.bichuetti@gmail.com> Co-authored-by: agnieszka-m <amarzec13@gmail.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2023-03-27 18:14:58 +02:00
tstadel	4f90e59796	feat: expose prompts to Answer and EvaluationResult (#4341 ) * store prompt in Answer * store prompt in eval csv * fix tests * chore: fix context offset loadingQ * add tests * add test from PR #4476 * fix tests after merge	2023-03-27 17:54:20 +02:00
ZanSara	6d578ebf3d	refactor: remove telemetry v1 (#4496 ) * remove telemetry v1 * more pipeline methods to take out * send_event_2 * mypy * pylint * mypy * mypy again * remove test	2023-03-27 17:38:43 +02:00
ju-gu	a3409c7da6	fix: issue evaluation check for content type (#4181 ) * fix: issue evaluation check for content type Evaluation currently breaks, when the content type is not a str. * add black * add test table eval * add black formatting * Expand integration test --------- Co-authored-by: Sebastian Lee <sebastian.lee@deepset.ai>	2023-03-16 17:36:53 +01:00
Zoltan Fedor	4dea9db01e	feat: Report execution time for pipeline components in `_debug` (#4197 ) * Adding execution time to the debug output of pipeline components * Linting issue fix * [EMPTY] Re-trigger CI * fixed test --------- Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>	2023-03-07 04:45:31 +05:30
tstadel	19311119db	fix: EvalResult load migration (#4289 ) * fix evalresult load migration * handle none values correctly * better None check * improve logic and add test	2023-03-06 20:05:02 +01:00
ZanSara	c802305ccf	test: move tests on standard pipelines in `e2e/` (#4309 ) * move out standard pipelines e2e * fixing unit tests * add test data * feedback * pylint * black	2023-03-06 17:26:19 +01:00
ZanSara	165a0a5faa	test: mock all `Translator` tests and move one to `e2e` (#4290 ) * mock all translator tests and move one to e2e * typo * extract pipeline tests using translator * remove duplicate test * move generator test in e2e * Update e2e/pipelines/test_extractive_qa.py * pytest.mark.unit * black * remove model name as well * remove unused fixture * rename original and improve pipeline tests * fixes * pylint	2023-03-01 14:52:05 +01:00
Stefano Fiorucci	e8f9b1b65d	test: replace `ElasticsearchDS` with `InMemoryDS` when it makes sense; support `scale_score` in `InMemoryDS` (#4283 ) * replace elasticds with imds - first draft * fix * fix tests and implement scale_score in imds bm25 * add docstrings for scale_score	2023-03-01 11:35:10 +01:00
ZanSara	13c4ff1b52	refactor: remove direct logging without a logger (#4253 ) * remove direct logging without a logger * add custom pylint checker * add test * pylint * improve checker message * mypy * remove test * add checker for basicConfig * more logging missed * ignore basicConfig * move out logger * move out statement * remove logging configuration	2023-02-23 20:42:42 +01:00
Stefano Fiorucci	5e85f33bd3	refactor: Remove deprecated nodes `EvalDocuments` and `EvalAnswers` (#4194 ) * remove deprecated classed and update test * remove deprecated classed and update test * remove unused code * remove unused import * remove empty evaluator node * unused import :-) * move sas to metrics	2023-02-23 15:26:17 +01:00
ZanSara	f816efa50c	feat: reduce and focus telemetry (#4087 ) * simplified telemetry and docker containers detection * pylint * mypy * mypy * Add new credentials and metadata * remove prints * mypy * remove comment * simplify inout len measurement * black * removed old telemetry, to revert * reintroduce env function * reintroduce old telemetry * fix telemetry selection * telemetry for promptnode * telemetry for some training methods * telemetry for eval and distillation * mypy & pylint * review * Update lg * mypy * improve docstrings * pylint * mypy * fix test * linting * remove old tests --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-02-22 19:02:47 +01:00
bogdankostic	05950719ba	fix: Deduplicate same Documents in isolated evaluation of Reader (#4114 ) * Deduplicate same Documents in one MultiLabel * Add tests * Update label * Update label * Update test * Update test * Revert change to check CI * Revert reversion * Use deepcopy * Update tests	2023-02-10 13:55:14 +01:00
Silvano Cerza	274746db07	style: Update black (#4101 ) * Update black version * Format file with new black style * Update black pre-commit hook version	2023-02-08 15:34:43 +01:00
tstadel	92c58cfda1	feat: Support multiple document_ids in Answer object (for generative QA) (#4062 ) * initial version without shapers * set document_ids for BaseGenerator * introduce question-answering-with-references template * better prompt * make PromptTemplate control output_variable * update schema * fix add_doc_meta_data_to_answer * Revert "fix add_doc_meta_data_to_answer" This reverts commit b994db423ad8272c140ce2b785cf359d55383ff9. * fix add_doc_meta_data_to_answer * fix eval * fix pylint * fix pinecone * fix other tests * fix test * fix flaky test * Revert "fix flaky test" This reverts commit 7ab04275ffaaaca96b4477325ba05d5f34d38775. * adjust docstrings * make Label loading backward-compatible * fix Label backward compatibility for pinecone * fix Label backward compatibility for search engines * fix Label backward compatibility for deepset Cloud * fix tests * fix None issue * fix test_write_feedback * add tests for legacy label support * add document_id test for pinecone * reduce unnecessary contents * add comment to pinecone test	2023-02-08 08:37:22 +01:00
tstadel	9611b64ec5	fix: document retrieval metrics for non-document_id document_relevance_criteria (#3885 ) * fix document retrieval metrics for all document_relevance_criteria * fix tests * fix eval_batch metrics * small refactorings * evaluate metrics on label level * document retrieval tests added * fix pylint * fix test * support file retrieval * add comment about threshold * rename test	2023-02-02 15:00:07 +01:00
Zoltan Fedor	e447bd728a	feat: adding the ability to use Ray Serve async functionality (#3769 ) * Adding the ability to call the Ray pipeline from concurrent apps with async This is to fix #2968 * Fixes: mype + pylint (`invalid-overridden-method`) * Simplifying - no real need for an `AsyncRayPipeline` anymore * Moving the new `run_async` method to the `RayPipeline` * Cleanup * [EMPTY] Re-trigger CI	2023-01-23 16:23:09 +01:00
ZanSara	6f5a2fb1da	fix: remove string validation in YAML (#3854 ) * remove string validation in YAML * unused import * fix import * remove tests * fix tests	2023-01-19 10:06:53 +01:00
Massimiliano Pippi	fa4404baa0	fix: ignore non-serializable params when hashing pipeline objects (#3842 ) * ignore non-serializable params when hashing pipeline objects * make tests more clear	2023-01-11 17:11:41 +01:00
Julian Risch	a2c160e7d8	bug: skip empty documents in reader (#3773 ) * skip empty documents * test eval_batch and account for tables	2023-01-03 15:50:14 +01:00
Julian Risch	b155297a06	feat: change PipelineConfigError to DocumentStoreError with more details (#3783 )	2023-01-02 19:40:45 +01:00
bogdankostic	594d2a10f8	fix: Fix `predict_batch` in `TransformersReader` for single nested Document list (#3748 ) * Fix restoring of list structure * Add tests	2022-12-29 11:48:18 +01:00
Vladimir Blagojevic	e4c3817d01	Adjust get_type() method for pipelines (#3657 )	2022-12-02 14:48:47 +01:00
Julian Risch	adb580b6b7	feat: add offsets_in_context to evaluation result (#3640 ) * add offsets_in_context to eval result * extend test case	2022-11-30 11:43:42 +01:00

91 Commits