Grant Williams
1cf70d3dce
build: Upgrade transformers to the latest version 4.34.1 ( #5994 )
...
* Upgrade transformers to the latest version 4.34.0 so that Haystack can support the new Mistral, Nougat, and other models.
* update release notes
* updated missing lazy import
* Update .github workflows imports
* bump more versions in .github workflows
* revert import sorting
* Update to catch runtime errors to match haystack_hub changes
* add language parameter value to whisper test
* bump transformers version in linting preview workflow
* bump transformers version in linting preview workflow
* bump version to v4.34.1
* resolve mypy issue with reused variables
* install openai-whisper without dependencies
* remove audio extra, update whisper install instructions
* remove audio extra, update whisper install instructions
* keep audio extra but add version
* keep audio extra with no constraints
* remove audio extra
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-10-24 19:13:12 +02:00
Christian Clauss
bf6d306d68
ci: Simplify Python code with ruff rules SIM ( #5833 )
...
* ci: Simplify Python code with ruff rules SIM
* Revert #5828
* ruff --select=I --fix haystack/modeling/infer.py
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-09-20 08:32:44 +02:00
Stefano Fiorucci
25d5dedb46
Fix: FARMReader - Consider the max number of labels/answers during training ( #5197 )
...
* first draft
* improve it a bit
* unit tests
* PR review, improved tests
* PR review, improved tests 2
2023-06-26 10:14:21 +02:00
Julian Risch
8cfeed095d
build: Remove mmh3 dependency ( #4896 )
...
* build: Remove mmh3 dependency
* resolve circular import
* pylint
* make mmh3.py sibling of schema.py
* pylint import order
* pylint
* undo example changes
* increase coverage in modeling module
* increase coverage further
* rename new unit tests
2023-05-17 21:31:08 +02:00
Ben Heckmann
099d0deb86
fix: Dynamic max_answers for SquadProcessor (fixes IndexError when max_answers is less than the number of answers in the dataset) ( #4817 )
...
* #4320 implemented dynamic max_answers for SquadProcessor, fixed IndexError when max_answers is less than the number of answers in the dataset
* #4320 added two unit tests for dataset_from_dicts testing default and manual max_answers
* apply suggestions from code review
Co-authored-by: bogdankostic <bogdankostic@web.de>
* simplify comment, fix mypy & pylint errors, fix old test
* adjust max_answers to each dataset individually
---------
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-05-15 14:34:23 +02:00
Sebastian
707f1c3546
Add modeling to unit tests so we can get coverage for that ( #4809 )
...
* Add modeling to unit tests so we can get coverage for that
* fix unit tests
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-05-08 19:05:21 +02:00
ZanSara
b60d9a2cbf
test: move several modeling tests in e2e/ ( #4308 )
...
* no dpr test seems worth mocking
* move distillation tests
* pylint
* mypy
* pylint
* move feature_extraction tests as well
* move feature_extraction tests as well
* merge feature extractor suites
* get_language_model tests and adaptive model tests
* duplicate test
* moving fixtures
* mypy
* mypy-again
* trigger
* un-mock integration test
* review feedback
* feedback
* pylint
2023-04-28 17:08:41 +02:00
Silvano Cerza
5ac3dffbef
test: Rework conftest ( #4614 )
...
* Split root conftest into multiple ones and remove unused fixtures
* Remove some constants and make them fixtures
* Remove unnecessary fixture scoping
* Fix failing whisper tests
* Fix image_file_paths fixture
2023-04-11 10:33:43 +02:00
Silvano Cerza
274746db07
style: Update black ( #4101 )
...
* Update black version
* Format file with new black style
* Update black pre-commit hook version
2023-02-08 15:34:43 +01:00
Jack Butler
f006eded7d
fix: allow Biadaptive & Triadaptive to work with EarlyStopping ( #4033 )
...
* fix: allow str when saving tri/bi-adaptive models
* fix: make trainer model loading class-agnostic
* test: add test for DPR with EarlyStopping
* refactor: simplify model reloading via classmethod
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-02-03 11:13:18 +01:00
ZanSara
0e471d5e5a
fix: change model in distillation test ( #3944 )
...
* change model
* change layer count
* move promptnode tests in integration
* fix marker
2023-01-25 23:32:11 +05:30
Stefano Fiorucci
f43bc562d3
refactor: replace torch.no_grad with torch.inference_mode (where possible) ( #3601 )
...
* try to replace torch.no_grad
* revert erroneous change
* revert other module breaking
* revert training/base
2022-11-23 09:26:11 +01:00
Sara Zan
43b24fd1a7
fix: strip whitespaces safely from FARMReader's answers ( #3526 )
...
* remove .strip()
* check for right-side offset
* return the whitespace-cleaned answer
* lstrip, not rstrip :D
* remove int
* left_offset
* slightly refactor reader fixture
* extend test_output
2022-11-08 09:26:47 +01:00
Sara Zan
101d2bc86c
feat: MultiModalRetriever ( #2891 )
...
* Adding Data2VecVision and Data2VecText to the supported models and adapt Tokenizers accordingly
* content_types
* Splitting classes into respective folders
* small changes
* Fix EOF
* eof
* black
* API
* EOF
* whitespace
* api
* improve multimodal similarity processor
* tokenizer -> feature extractor
* Making feature vectors come out of the feature extractor in the similarity head
* embed_queries is now self-sufficient
* couple trivial errors
* Implemented separate language model classes for multimodal inference
* Document embedding seems to work
* removing batch_encode_plus, is deprecated anyway
* Realized the base Data2Vec models are not trained on retrieval tasks
* Issue with the generated embeddings
* Add batching
* Try to fit CLIP in
* Stub of CLIP integration
* Retrieval goes through but returns noise only
* Still working on the scores
* Introduce temporary adapter for CLIP models
* Image retrieval now works with sentence-transformers
* Tidying up the code
* Refactoring is now functional
* Add MPNet to the supported sentence transformers models
* Remove unused classes
* pylint
* docs
* docs
* Remove the method renaming
* mypy first pass
* docs
* tutorial
* schema
* mypy
* Move devices setup into get_model
* more mypy
* mypy
* pylint
* Move a few params in HaystackModel's init
* make feature extractor work with squadprocessor
* fix feature_extractor_kwargs forwarding
* Forgotten part of the fix
* Revert unrelated ES change
* Revert unrelated memdocstore changes
* comment
* Small corrections
* mypy and pylint
* mypy
* typo
* mypy
* Refactor the call
* mypy
* Do not make FARMReader use the new FeatureExtractor
* mypy
* Detach DPR tests from FeatureExtractor too
* Detach processor tests too
* Add end2end marker
* extract end2end feature extractor tests
* temporary disable feature extraction tests
* Introduce end2end tests for tokenizer tests
* pylint
* Fix model loading from folder in FeatureExtractor
* working on end2end
* end2end keeps failing
* Restructuring retriever tests
* Restructuring retriever tests
* remove covert_dataset_to_dataloader
* remove comment
* Better check sentence-transformers models
* Use embed_meta_fields properly
* rename passage into document
* Embedding dims can't be found
* Add check for models that support it
* pylint
* Split all retriever tests into suites, running mostly on InMemory only
* fix mypy
* fix tfidf test
* fix weaviate tests
* Parallelize on every docstore
* Fix schema and specify modality in base retriever suite
* tests
* Add first image tests
* remove comment
* Revert to simpler tests
* Update docs/_src/api/api/primitives.md
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/modeling/model/multimodal/__init__.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* get_args
* mypy
* Update haystack/modeling/model/multimodal/__init__.py
* Update haystack/modeling/model/multimodal/base.py
* Update haystack/modeling/model/multimodal/base.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/modeling/model/multimodal/sentence_transformers.py
* Update haystack/modeling/model/multimodal/sentence_transformers.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/modeling/model/multimodal/transformers.py
* Update haystack/modeling/model/multimodal/transformers.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/modeling/model/multimodal/transformers.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/nodes/retriever/multimodal/retriever.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* mypy
* mypy
* removing more ContentTypes
* more ContentTypes
* pylint
* add to __init__
* revert end2end workflow for now
* missing integration markers
* Update haystack/nodes/retriever/multimodal/embedder.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* review feedback, removing HaystackImageTransformerModel
* review feedback part 2
* mypy & pylint
* mypy
* mypy
* fix multimodal docs also for Pinecone
* add note on internal constants
* Fix pinecone write_documents
* schemas
* keep support for sentence-transformers only
* fix pinecone test
* schemas
* fix pinecone again
* temporarily disable some tests, need to understand if they're still relevant
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-10-17 18:58:35 +02:00
Sebastian
75641dd024
fix: Added checks for DataParallel and WrappedDataParallel ( #3366 )
...
* Added checks for DataParallel and WrappedDataParallel
* Update isinstance checks according to pylint recommendation
* Using isinstance over types
* Added test for dpr training
2022-10-13 08:05:56 +02:00
Vladimir Blagojevic
66f3f42a46
fix: Replace multiprocessing tokenization with batched fast tokenization ( #3089 )
...
* Replace multiprocessing tokenization with batched fast tokenization
* Replace deprecated tokenization method invocations
2022-08-31 07:33:39 -04:00
Sara Zan
4e45062a00
Simplify language_modeling.py and tokenization.py ( #2703 )
...
* Simplification of language_model.py and tokenization.py to remove code duplication
Co-authored-by: vblagoje <dovlex@gmail.com>
2022-07-22 16:29:30 +02:00
Sara Zan
ff4303c51b
[CI refactoring] Categorize tests into folders ( #2554 )
...
* Categorize tests into folders
* Fix linux_ci.yml and an import
* Wrong path
2022-05-17 09:55:53 +01:00