haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-11-18 11:04:35 +00:00

Author	SHA1	Message	Date
Grant Williams	1cf70d3dce	build: Upgrade transformers to the latest version 4.34.1 (#5994 ) * Upgrade transformers to the latest version 4.34.0 so that Haystack can support the new Mistral, Nougat, and other models. * update release notes * updated missing lazy import * Update .github workflows imports * bump more versions in .github workflows * rever import sorting * Update to catch runtime errors to match haystack_hub changes * add language parameter value to whisper test * bump transformers version in linting preview workflow * bump transformers version in linting preview workflow * bump version to v4.34.1 * resolve mypy issue with reused variables * install openai-whisper without dependencies * remove audio extra, update whisper install instructions * remove audio extra, update whisper install instructions * keep audio extra but add version * keep audio extra with no constraints * remove audio extra --------- Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2023-10-24 19:13:12 +02:00
Christian Clauss	bf6d306d68	ci: Simplify Python code with ruff rules SIM (#5833 ) * ci: Simplify Python code with ruff rules SIM * Revert #5828 * ruff --select=I --fix haystack/modeling/infer.py --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-09-20 08:32:44 +02:00
Stefano Fiorucci	25d5dedb46	Fix: `FARMReader` - Consider the max number of labels/answers during training (#5197 ) * first draft * improve it a bit * unit tests * PR review, improved tests * PR review, improved tests 2	2023-06-26 10:14:21 +02:00
Julian Risch	8cfeed095d	build: Remove mmh3 dependency (#4896 ) * build: Remove mmh3 dependency * resolve circular import * pylint * make mmh3.py sibling of schema.py * pylint import order * pylint * undo example changes * increase coverage in modeling module * increase coverage further * rename new unit tests	2023-05-17 21:31:08 +02:00
Ben Heckmann	099d0deb86	fix: Dynamic `max_answers` for SquadProcessor (fixes IndexError when max_answers is less than the number of answers in the dataset) (#4817 ) * #4320 implemented dynamic max_answers for SquadProcessor, fixed IndexError when max_answers is less than the number of answers in the dataset * #4320 added two unit tests for dataset_from_dicts testing default and manual max_answers * apply suggestions from code review Co-authored-by: bogdankostic <bogdankostic@web.de> * simplify comment, fix mypy & pylint errors, fix old test * adjust max_answers to each dataset individually --------- Co-authored-by: bogdankostic <bogdankostic@web.de>	2023-05-15 14:34:23 +02:00
Sebastian	707f1c3546	Add modeling to unit tests so it we can get coverage for that (#4809 ) * Add modeling to unit tests so it we can get coverage for that * fix unit tests --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-05-08 19:05:21 +02:00
ZanSara	b60d9a2cbf	test: move several modeling tests in e2e/ (#4308 ) * no dpr test seems worth mocking * move distillation tests * pylint * mypy * pylint * move feature_extraction tests as well * move feature_extraction tests as well * merge feature extractor suites * get_language_model tests and adaptive model tests * duplicate test * moving fixtures * mypy * mypy-again * trigger * un-mock integration test * review feedback * feedback * pylint	2023-04-28 17:08:41 +02:00
Silvano Cerza	5ac3dffbef	test: Rework conftest (#4614 ) * Split root conftest into multiple ones and remove unused fixtures * Remove some constants and make them fixtures * Remove unnecessary fixture scoping * Fix failing whisper tests * Fix image_file_paths fixture	2023-04-11 10:33:43 +02:00
Silvano Cerza	274746db07	style: Update black (#4101 ) * Update black version * Format file with new black style * Update black pre-commit hook version	2023-02-08 15:34:43 +01:00
Jack Butler	f006eded7d	fix: allow Biadaptive & Triadaptive to work with EarlyStopping (#4033 ) * fix: allow str when saving tri/bi-adaptive models * fix: make trainer model loading class-agnostic * test: add test for DPR with EarlyStopping * refactor: simplify model reloading via classmethod --------- Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2023-02-03 11:13:18 +01:00
ZanSara	0e471d5e5a	fix: change model in distillation test (#3944 ) * change model * change layer count * move promptnode tests in integration * fix marker	2023-01-25 23:32:11 +05:30
Stefano Fiorucci	f43bc562d3	refactor: replace `torch.no_grad` with `torch.inference_mode` (where possible) (#3601 ) * try to replace torch.no_grad * revert erroneous change * revert other module breaking * revert training/base	2022-11-23 09:26:11 +01:00
Sara Zan	43b24fd1a7	fix: strip whitespaces safely from `FARMReader`'s answers (#3526 ) * remove .strip() * check for right-side offset * return the whitespace-cleaned answer * lstrip, not rstrip :D * remove int * left_offset * slightly refactor reader fixture * extend test_output	2022-11-08 09:26:47 +01:00
Sara Zan	101d2bc86c	feat: `MultiModalRetriever` (#2891 ) * Adding Data2VecVision and Data2VecText to the supported models and adapt Tokenizers accordingly * content_types * Splitting classes into respective folders * small changes * Fix EOF * eof * black * API * EOF * whitespace * api * improve multimodal similarity processor * tokenizer -> feature extractor * Making feature vectors come out of the feature extractor in the similarity head * embed_queries is now self-sufficient * couple trivial errors * Implemented separate language model classes for multimodal inference * Document embedding seems to work * removing batch_encode_plus, is deprecated anyway * Realized the base Data2Vec models are not trained on retrieval tasks * Issue with the generated embeddings * Add batching * Try to fit CLIP in * Stub of CLIP integration * Retrieval goes through but returns noise only * Still working on the scores * Introduce temporary adapter for CLIP models * Image retrieval now works with sentence-transformers * Tidying up the code * Refactoring is now functional * Add MPNet to the supported sentence transformers models * Remove unused classes * pylint * docs * docs * Remove the method renaming * mpyp first pass * docs * tutorial * schema * mypy * Move devices setup into get_model * more mypy * mypy * pylint * Move a few params in HaystackModel's init * make feature extractor work with squadprocessor * fix feature_extractor_kwargs forwarding * Forgotten part of the fix * Revert unrelated ES change * Revert unrelated memdocstore changes * comment * Small corrections * mypy and pylint * mypy * typo * mypy * Refactor the call * mypy * Do not make FARMReader use the new FeatureExtractor * mypy * Detach DPR tests from FeatureExtractor too * Detach processor tests too * Add end2end marker * extract end2end feature extractor tests * temporary disable feature extraction tests * Introduce end2end tests for tokenizer tests * pylint * Fix model loading from folder in FeatureExtractor * working o n end2end * end2end keeps failing * Restructuring retriever tests * Restructuring retriever tests * remove covert_dataset_to_dataloader * remove comment * Better check sentence-transformers models * Use embed_meta_fields properly * rename passage into document * Embedding dims can't be found * Add check for models that support it * pylint * Split all retriever tests into suites, running mostly on InMemory only * fix mypy * fix tfidf test * fix weaviate tests * Parallelize on every docstore * Fix schema and specify modality in base retriever suite * tests * Add first image tests * remove comment * Revert to simpler tests * Update docs/_src/api/api/primitives.md Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/__init__.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * get_args * mypy * Update haystack/modeling/model/multimodal/__init__.py * Update haystack/modeling/model/multimodal/base.py * Update haystack/modeling/model/multimodal/base.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/sentence_transformers.py * Update haystack/modeling/model/multimodal/sentence_transformers.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/transformers.py * Update haystack/modeling/model/multimodal/transformers.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/transformers.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/nodes/retriever/multimodal/retriever.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * mypy * mypy * removing more ContentTypes * more contentypes * pylint * add to __init__ * revert end2end workflow for now * missing integration markers * Update haystack/nodes/retriever/multimodal/embedder.py Co-authored-by: bogdankostic <bogdankostic@web.de> * review feedback, removing HaystackImageTransformerModel * review feedback part 2 * mypy & pylint * mypy * mypy * fix multimodal docs also for Pinecone * add note on internal constants * Fix pinecone write_documents * schemas * keep support for sentence-transformers only * fix pinecone test * schemas * fix pinecone again * temporarily disable some tests, need to understand if they're still relevant Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-10-17 18:58:35 +02:00
Sebastian	75641dd024	fix: Added checks for DataParallel and WrappedDataParallel (#3366 ) * Added checks for DataParallel and WrappedDataParallel * Update isinstance checks according to pylint recommendation * Using isinstance over types * Added test for dpr training	2022-10-13 08:05:56 +02:00
Vladimir Blagojevic	66f3f42a46	fix: Replace multiprocessing tokenization with batched fast tokenization (#3089 ) * Replace multiprocessing tokenization with batched fast tokenization * Replace deprecated tokenization method invocations	2022-08-31 07:33:39 -04:00
Sara Zan	4e45062a00	Simplify `language_modeling.py` and `tokenization.py` (#2703 ) * Simplification of language_model.py and tokenization.py to remove code duplication Co-authored-by: vblagoje <dovlex@gmail.com>	2022-07-22 16:29:30 +02:00
Sara Zan	ff4303c51b	[CI refactoring] Categorize tests into folders (#2554 ) * Categorize tests into folders * Fix linux_ci.yml and an import * Wrong path	2022-05-17 09:55:53 +01:00

18 Commits