Sara Zan
101d2bc86c
feat: MultiModalRetriever
( #2891 )
...
* Adding Data2VecVision and Data2VecText to the supported models and adapt Tokenizers accordingly
* content_types
* Splitting classes into respective folders
* small changes
* Fix EOF
* eof
* black
* API
* EOF
* whitespace
* api
* improve multimodal similarity processor
* tokenizer -> feature extractor
* Making feature vectors come out of the feature extractor in the similarity head
* embed_queries is now self-sufficient
* couple trivial errors
* Implemented separate language model classes for multimodal inference
* Document embedding seems to work
* removing batch_encode_plus, is deprecated anyway
* Realized the base Data2Vec models are not trained on retrieval tasks
* Issue with the generated embeddings
* Add batching
* Try to fit CLIP in
* Stub of CLIP integration
* Retrieval goes through but returns noise only
* Still working on the scores
* Introduce temporary adapter for CLIP models
* Image retrieval now works with sentence-transformers
* Tidying up the code
* Refactoring is now functional
* Add MPNet to the supported sentence transformers models
* Remove unused classes
* pylint
* docs
* docs
* Remove the method renaming
* mpyp first pass
* docs
* tutorial
* schema
* mypy
* Move devices setup into get_model
* more mypy
* mypy
* pylint
* Move a few params in HaystackModel's init
* make feature extractor work with squadprocessor
* fix feature_extractor_kwargs forwarding
* Forgotten part of the fix
* Revert unrelated ES change
* Revert unrelated memdocstore changes
* comment
* Small corrections
* mypy and pylint
* mypy
* typo
* mypy
* Refactor the call
* mypy
* Do not make FARMReader use the new FeatureExtractor
* mypy
* Detach DPR tests from FeatureExtractor too
* Detach processor tests too
* Add end2end marker
* extract end2end feature extractor tests
* temporary disable feature extraction tests
* Introduce end2end tests for tokenizer tests
* pylint
* Fix model loading from folder in FeatureExtractor
* working o n end2end
* end2end keeps failing
* Restructuring retriever tests
* Restructuring retriever tests
* remove covert_dataset_to_dataloader
* remove comment
* Better check sentence-transformers models
* Use embed_meta_fields properly
* rename passage into document
* Embedding dims can't be found
* Add check for models that support it
* pylint
* Split all retriever tests into suites, running mostly on InMemory only
* fix mypy
* fix tfidf test
* fix weaviate tests
* Parallelize on every docstore
* Fix schema and specify modality in base retriever suite
* tests
* Add first image tests
* remove comment
* Revert to simpler tests
* Update docs/_src/api/api/primitives.md
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/modeling/model/multimodal/__init__.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* get_args
* mypy
* Update haystack/modeling/model/multimodal/__init__.py
* Update haystack/modeling/model/multimodal/base.py
* Update haystack/modeling/model/multimodal/base.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/modeling/model/multimodal/sentence_transformers.py
* Update haystack/modeling/model/multimodal/sentence_transformers.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/modeling/model/multimodal/transformers.py
* Update haystack/modeling/model/multimodal/transformers.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/modeling/model/multimodal/transformers.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/nodes/retriever/multimodal/retriever.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* mypy
* mypy
* removing more ContentTypes
* more contentypes
* pylint
* add to __init__
* revert end2end workflow for now
* missing integration markers
* Update haystack/nodes/retriever/multimodal/embedder.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* review feedback, removing HaystackImageTransformerModel
* review feedback part 2
* mypy & pylint
* mypy
* mypy
* fix multimodal docs also for Pinecone
* add note on internal constants
* Fix pinecone write_documents
* schemas
* keep support for sentence-transformers only
* fix pinecone test
* schemas
* fix pinecone again
* temporarily disable some tests, need to understand if they're still relevant
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-10-17 18:58:35 +02:00
Vladimir Blagojevic
159cd5a666
feat: Add OpenAIEmbeddingEncoder to EmbeddingRetriever ( #3356 )
2022-10-14 15:01:03 +02:00
Vladimir Blagojevic
9582a423a2
fix: ONNX FARMReader model conversion is broken ( #3211 )
2022-09-26 09:18:12 -04:00
tstadel
4fa9d2d8e7
Fix milvus and faiss tests not running ( #3263 )
...
* fix milvus and faiss tests not running
* fix schema manually
* fix test_dpr_embedding test for milvus
* pip freeze on milvus tests
* fix milvus1 tests being executed: fix all_doc_stores order
* Revert "pip freeze on milvus tests"
This reverts commit 75ebb6f7e507bb8477e87d9e63b4a294f7946cab.
* make infer_required_doc_store more robust
* don't skip tests without docstore requirements
* use markers for docstore tests
2022-09-22 17:46:49 +02:00
Sara Zan
4e45062a00
Simplify language_modeling.py
and tokenization.py
( #2703 )
...
* Simplification of language_model.py and tokenization.py to remove code duplication
Co-authored-by: vblagoje <dovlex@gmail.com>
2022-07-22 16:29:30 +02:00
Patrick Deutschmann
1db3fd0942
Add support for Multi-Hop Dense Retrieval ( #2571 )
...
* Implement MDR
* Adapt conftest to new MDR signature
* Update Documentation & Code Style
* Change signature of queries param in batch methods of MDR like in #2575
* Update Documentation & Code Style
* Rename MultihopDenseRetriever to MultihopEmbeddingRetriever
* Fix filters in retrieve_batch
* Add docstring for MultihopEmbeddingRetriever.__init__
* Update Documentation & Code Style
* Revert forward signature of TextSimilarityHead
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-05 11:31:11 +02:00
bogdankostic
dc48c444d4
Fix loading of tokenizers in DPR ( #2755 )
2022-07-04 18:18:14 +02:00
Aleksander Smywiński-Pohl
642229255f
Use AutoTokenizer by default, to easily adapt to new models and token… ( #1902 )
...
* Use AutoTokenizer by default, to easily adapt to new models and tokenizers
* Add missing AutoTokenizer import
* Apply Black
* Missing import
* Fix DPR tests
* Remove tests on max length
* Update Documentation & Code Style
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-15 13:13:48 +02:00
Sara Zan
54518ac790
[CI Refactoring] Refactor Document
fixtures in tests ( #2577 )
...
* Refactor document fixtures
* Add embedding files
* Update Documentation & Code Style
* Indentation issue
* Update Documentation & Code Style
* Fix type conversion in conftest.py
* Update Documentation & Code Style
* mypy on sql.py
* mypy on crawler.py
* mypy on pinecone.py
* Adapt retriever tests
* Update Documentation & Code Style
* mypy on crawler.py
* Update Documentation & Code Style
* mypy on crawler.py again
* Update Documentation & Code Style
* mypy fix was too rough
* Fix some more tests
* Update Documentation & Code Style
* Skip meaningless test on FilterRetriever
* Make embedding values less specific
* Update Documentation & Code Style
* Use stable IDs in retriever tests that depend on it
* Remove needless fixtures
* docs_with_ids
* Update Documentation & Code Style
* Typo
* Fix retriever tests
* Fix reader tests
* Update Documentation & Code Style
* Workaround #2626
* Update Documentation & Code Style
* Fix label generator tests
* Reorder vectors
* remove print
* Update Documentation & Code Style
* Update Documentation & Code Style
* git tags leftover
* Update Documentation & Code Style
* fix last failing test
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-10 18:22:48 +02:00
Sara Zan
59608ca474
[CI Refactoring] Workflow refactoring ( #2576 )
...
* Unify CI tests (from #2466 )
* Update Documentation & Code Style
* Change folder names
* Fix markers list
* Remove marker 'slow', replaced with 'integration'
* Soften children check
* Start ES first so it has time to boot while Python is setup
* Run the full workflow
* Try to make pip upgrade on Windows
* Set KG tests as integration
* Update Documentation & Code Style
* typo
* faster pylint
* Make Pylint use the cache
* filter diff files for pylint
* debug pylint statement
* revert pylint changes
* Remove path from asserted log (fails on Windows)
* Skip preprocessor test on Windows
* Tackling Windows specific failures
* Fix pytest command for windows suites
* Remove \ from command
* Move poppler test into integration
* Skip opensearch test on windows
* Add tolerance in reader sas score for Windows
* Another pytorch approx
* Raise time limit for unit tests :(
* Skip poppler test on Windows CI
* Specify to pull with FF only in docs check
* temporarily run the docs check immediately
* Allow merge commit for now
* Try without fetch depth
* Accelerating test
* Accelerating test
* Add repository and ref alongside fetch-depth
* Separate out code&docs check from tests
* Use setup-python cache
* Delete custom action
* Remove the pull step in the docs check, will find a way to run on bot commits
* Add requirements.txt in .github for caching
* Actually install dependencies
* Change deps group for pylint
* Unclear why the requirements.txt is still required :/
* Fix the code check python setup
* Install all deps for pylint
* Make the autoformat check depend on tests and doc updates workflows
* Try installing dependencies in another order
* Try again to install the deps
* quoting the paths
* Ad back the requirements
* Try again to install rest_api and ui
* Change deps group
* Duplicate haystack install line
* See if the cache is the problem
* Disable also in mypy, who knows
* split the install step
* Split install step everywhere
* Revert "Separate out code&docs check from tests"
This reverts commit 1cd59b15ffc5b984e1d642dcbf4c8ccc2bb6c9bd.
* Add back the action
* Proactive support for audio (see text2speech branch)
* Fix label generator tests
* Remove install of libsndfile1 on win temporarily
* exclude audio tests on win
* install ffmpeg for integration tests
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-07 09:23:03 +02:00
bogdankostic
61d9429c25
Simplify loading of EmbeddingRetriever
( #2619 )
...
* Infer model format for EmbeddingRetriever automatically
* Update Documentation & Code Style
* Adapt conftest to automatic inference of model_format
* Update Documentation & Code Style
* Fix tests
* Update Documentation & Code Style
* Fix tests
* Adapt tutorials
* Update Documentation & Code Style
* Add test for similarity scores with sentence transformers
* Adapt doc string and warning message
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-02 15:05:29 +02:00
bogdankostic
867695ad0c
Change signature of queries param in batch methods ( #2575 )
...
* Change signature of queries param in batch methods
* Update Documentation & Code Style
* Fix mypy
* Remove unused import
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-24 12:33:45 +02:00
Sara Zan
ff4303c51b
[CI refactoring] Categorize tests into folders ( #2554 )
...
* Categorize tests into folders
* Fix linux_ci.yml and an import
* Wrong path
2022-05-17 09:55:53 +01:00