Sara Zan
101d2bc86c
feat: MultiModalRetriever ( #2891 )
...
* Adding Data2VecVision and Data2VecText to the supported models and adapt Tokenizers accordingly
* content_types
* Splitting classes into respective folders
* small changes
* Fix EOF
* eof
* black
* API
* EOF
* whitespace
* api
* improve multimodal similarity processor
* tokenizer -> feature extractor
* Making feature vectors come out of the feature extractor in the similarity head
* embed_queries is now self-sufficient
* couple trivial errors
* Implemented separate language model classes for multimodal inference
* Document embedding seems to work
* removing batch_encode_plus, is deprecated anyway
* Realized the base Data2Vec models are not trained on retrieval tasks
* Issue with the generated embeddings
* Add batching
* Try to fit CLIP in
* Stub of CLIP integration
* Retrieval goes through but returns noise only
* Still working on the scores
* Introduce temporary adapter for CLIP models
* Image retrieval now works with sentence-transformers
* Tidying up the code
* Refactoring is now functional
* Add MPNet to the supported sentence transformers models
* Remove unused classes
* pylint
* docs
* docs
* Remove the method renaming
* mypy first pass
* docs
* tutorial
* schema
* mypy
* Move devices setup into get_model
* more mypy
* mypy
* pylint
* Move a few params in HaystackModel's init
* make feature extractor work with squadprocessor
* fix feature_extractor_kwargs forwarding
* Forgotten part of the fix
* Revert unrelated ES change
* Revert unrelated memdocstore changes
* comment
* Small corrections
* mypy and pylint
* mypy
* typo
* mypy
* Refactor the call
* mypy
* Do not make FARMReader use the new FeatureExtractor
* mypy
* Detach DPR tests from FeatureExtractor too
* Detach processor tests too
* Add end2end marker
* extract end2end feature extractor tests
* temporary disable feature extraction tests
* Introduce end2end tests for tokenizer tests
* pylint
* Fix model loading from folder in FeatureExtractor
* working on end2end
* end2end keeps failing
* Restructuring retriever tests
* Restructuring retriever tests
* remove covert_dataset_to_dataloader
* remove comment
* Better check sentence-transformers models
* Use embed_meta_fields properly
* rename passage into document
* Embedding dims can't be found
* Add check for models that support it
* pylint
* Split all retriever tests into suites, running mostly on InMemory only
* fix mypy
* fix tfidf test
* fix weaviate tests
* Parallelize on every docstore
* Fix schema and specify modality in base retriever suite
* tests
* Add first image tests
* remove comment
* Revert to simpler tests
* Update docs/_src/api/api/primitives.md
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/modeling/model/multimodal/__init__.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* get_args
* mypy
* Update haystack/modeling/model/multimodal/__init__.py
* Update haystack/modeling/model/multimodal/base.py
* Update haystack/modeling/model/multimodal/base.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/modeling/model/multimodal/sentence_transformers.py
* Update haystack/modeling/model/multimodal/sentence_transformers.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/modeling/model/multimodal/transformers.py
* Update haystack/modeling/model/multimodal/transformers.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/modeling/model/multimodal/transformers.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/nodes/retriever/multimodal/retriever.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* mypy
* mypy
* removing more ContentTypes
* more ContentTypes
* pylint
* add to __init__
* revert end2end workflow for now
* missing integration markers
* Update haystack/nodes/retriever/multimodal/embedder.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* review feedback, removing HaystackImageTransformerModel
* review feedback part 2
* mypy & pylint
* mypy
* mypy
* fix multimodal docs also for Pinecone
* add note on internal constants
* Fix pinecone write_documents
* schemas
* keep support for sentence-transformers only
* fix pinecone test
* schemas
* fix pinecone again
* temporarily disable some tests, need to understand if they're still relevant
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-10-17 18:58:35 +02:00
Sebastian
ca2a1e1792
fix: Update how schema is ordered ( #3399 )
...
* Use builtin sort_keys option for json.dump
* Order anyof_list which is causing issues
2022-10-17 17:09:32 +02:00
Sara Zan
50f34372e1
fix: stable YAML schema generation ( #3388 )
...
* add key sorting in schema generation
* add pre-commit hook
* try pre-commit hook
* Fixed schemas
* trying a simpler version
* pylint
* ordered dict
* reverting to dict
* unused import
* remove hook
2022-10-14 18:36:47 +02:00
Massimiliano Pippi
7d0f89b6f5
fix: demo won't start through Docker compose ( #3337 )
...
* use new Docker images and add a health check for ES
* try
* silence streamlit errors
* remove CMD override
* final touches
* leftover
* make pylint happy
2022-10-14 18:16:20 +02:00
Vladimir Blagojevic
159cd5a666
feat: Add OpenAIEmbeddingEncoder to EmbeddingRetriever ( #3356 )
2022-10-14 15:01:03 +02:00
Vladimir Blagojevic
5ebe3cb33d
fix: QuestionGenerator generates wrong document questions for non-default num_queries_per_doc parameter ( #3381 )
2022-10-14 12:08:30 +02:00
Stefano Fiorucci
7290196c32
fix: allow same vector_id in different indexes for SQL-based Document stores ( #3383 )
...
* fix_multiple_indexes
* improve test names
2022-10-14 09:55:56 +02:00
tstadel
ba30971d8d
feat: extract label aggregation ( #3363 )
...
* extract label aggregation
* refactoring
* reformat
* add missing param docstrings
* fix comment
2022-10-13 19:09:14 +02:00
Massimiliano Pippi
3b0f00a615
[CI] Use VERSION.txt to sync with Readme ( #3367 )
...
* use VERSION.txt to sync with Readme
* add docs
* force workflow run
* unrelated change
* Revert "force workflow run"
This reverts commit f0aea59afa57c96f374073465629f893031f727a.
* make the steps mutually exclusive
2022-10-13 18:39:23 +02:00
Branden Chan
37bd61a48e
Create minor_version_release.yml ( #3338 )
...
* Create minor_version_release.yml
* Incorporate reviewer feedback
2022-10-13 14:32:31 +02:00
Stefano Fiorucci
60f678e120
refactor: remove dead code from FAISSDocumentStore ( #3372 )
2022-10-13 13:23:01 +02:00
Massimiliano Pippi
31fa75e9fd
feat: add support for Elasticsearch 7.16.2 ( #3318 )
...
* bump elastic to 7.16.2+
* decouple Elasticsearch and Opensearch
use method override instead of func variables
fix mypy
default value
fix broken tests
update schema
* relax version pin
* rename the base class
* rename module
* fix import order
* do not run the new tests in the old job
* remove outdated TODO
2022-10-13 11:53:27 +02:00
Sebastian
75641dd024
fix: Added checks for DataParallel and WrappedDataParallel ( #3366 )
...
* Added checks for DataParallel and WrappedDataParallel
* Update isinstance checks according to pylint recommendation
* Using isinstance over types
* Added test for dpr training
2022-10-13 08:05:56 +02:00
Massimiliano Pippi
db6e5754cd
add deprecation notice to old dockerfiles ( #3317 )
2022-10-11 16:10:57 +02:00
hsm207
c2537dfc28
Update weaviate schema doc link ( #3351 )
...
* Update weaviate schema doc
* Update haystack/document_stores/weaviate.py
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2022-10-11 15:30:20 +02:00
Massimiliano Pippi
8ddb6d7821
feat: add multi-platform Docker images ( #3354 )
...
* add arm platform to the build
* add a note about multi-platforms build
* test on current branch
* setup qemu on Github actions
* better naming
* Revert "test on current branch"
This reverts commit b0e5ea77b46e3e0bafd579c95e434c6a3c8ef84f.
2022-10-11 12:29:33 +02:00
Malte Pietsch
fb02b61e90
Update README.md ( #3247 )
2022-10-11 10:43:17 +02:00
tstadel
7fe5003c97
fix: eval() with add_isolated_node_eval=True breaks if no node supports it ( #3347 )
...
* fix isolated eval for pipelines without a node supporting isolated mode
* reformat
* add test
2022-10-10 20:48:13 +02:00
bogdankostic
84aff5e2b3
fix: Allow less restrictive values for parameters in Pipeline configurations ( #3345 )
...
* fix: Allow arbitrary values for parameters in Pipeline configurations
* Add test
* Adapt expected error message in tests
* Fix bug
* Fix bug on checking JSON
* Remove test cases that previously tested if error was thrown
* Change encoding in test
* Restrict possible values
* Re-add tests
* Re-add tests
* Add value flag to list elements
2022-10-10 13:08:45 +02:00
JacdDev
797c20c966
feat: Adding filters param to MostSimilarDocumentsPipeline run and run_batch ( #3301 )
...
* Adding filters param to MostSimilarDocumentsPipeline run and run_batch
* Adding index param to MostSimilarDocumentsPipeline run and run_batch
* Adding index param documentation to MostSimilarDocumentsPipeline run and run_batch
* Updated index param documentation to MostSimilarDocumentsPipeline run and run_batch. Updated type: ignore in run_batch
2022-10-10 10:22:14 +02:00
tstadel
b84a6b1716
fix: opensearch script score with filters ( #3321 )
...
* fix opensearch script score filters
* add comment
* add integration test
* update schema
2022-10-06 15:41:29 +02:00
Vladimir Blagojevic
6cb4e93965
refactor: remove Inferencer multiprocessing ( #3283 )
2022-10-04 14:08:23 +02:00
Massimiliano Pippi
b49bce97aa
remove test step ( #3278 )
2022-10-04 11:34:43 +02:00
nickchomey
e6767fccef
bugfix for TranslationWrapperPipeline ( #3290 )
...
* bugfix for TranslationWrapperPipeline
* Update standard_pipelines.py
* Update haystack/pipelines/standard_pipelines.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
2022-10-04 09:44:48 +02:00
Jeff Risberg
ad8fbe56ee
bug: JoinDocuments nodes produce incorrect results if preceded by another JoinDocuments node ( #3170 )
...
* don't send the list of inputs back as an output in the running of a node.
* updated documentation
* Update pydoc-markdown.py
* added test case for pipeline join fix
Co-authored-by: JeffRisberg <jrisberg@aol.com>
2022-09-30 13:27:17 +02:00
Stefano Fiorucci
e2e6887ee8
Improve TransformersDocumentClassifier tests ( #3270 )
2022-09-27 13:25:34 +02:00
Taner Topal
24d4591307
docs: Fix a docstring in ray.py
2022-09-27 09:05:04 +02:00
Vladimir Blagojevic
9582a423a2
fix: ONNX FARMReader model conversion is broken ( #3211 )
2022-09-26 09:18:12 -04:00
Stefano Fiorucci
b579b9d54a
bug: make ElasticSearchDocumentStore use batch_size in get_documents_by_id ( #3166 )
...
* use batch_size
* try to fix git mess
* improve docstrings
* fix
2022-09-26 13:21:59 +02:00
Vladimir Blagojevic
9ca3ccae98
fix: MostSimilarDocumentsPipeline doesn't have pipeline property ( #3265 )
...
* Add comments and a unit test
* More unit tests for MostSimilarDocumentsPipeline
2022-09-23 09:46:48 -04:00
Vladimir Blagojevic
eba7cf51b1
chore: Remove Update API documentation hook ( #3271 )
...
* Remove Update API documentation hook
* Remove .github/utils/pydoc-markdown.py file
2022-09-23 08:54:08 -04:00
tstadel
05a86b9d3d
feat: FAISS in OpenSearch: Support HNSW for cosine ( #3217 )
...
* support cosine similarity with faiss
* update docs
* update api docs
* fix tests
* Revert "update api docs"
This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143.
* fix api docs
* collapse test
* rename similarity to space_type mappings
* only normalize for faiss
* fix merge
* fix docs normalization
* get rid of List[np.array]
* update docs
* fix tests and tutorials
* fix mypy
* fix mypy
* fix mypy again
* again mypy
* blacken
* update tutorial 4 docs
* fix embeddingretriever
* fix faiss
* move dense specific logic to DenseRetriever
* fix mypy
* cosine tests for all documents stores
* fix pinecone
* add docstring
* docstring corrections
* update docs
* add integration test marker
* docstrings update
* update docs
* fix typo
* update docs
* fix MockDenseRetriever
* run integration tests for all documentstores
* fix test_update_embeddings_cosine_similarity
* fix faiss tests not running
* blacken
* make test_cosine_sanity_check integration test
* split PR
* update docs
* manually revert tutorial doc change
* Fix embedding type
* set integration marker correctly
* make BaseDocumentStore.normalize_embedding static
* format
* fix handling of opensearch_faiss param
* fix merge
* add DenseRetriever typing
* organize imports in conftest.py
* organize imports in conftest.py (2)
* fix DenseRetriever import
* add opensearch-tests-linux
2022-09-23 13:26:49 +02:00
tstadel
4fa9d2d8e7
Fix milvus and faiss tests not running ( #3263 )
...
* fix milvus and faiss tests not running
* fix schema manually
* fix test_dpr_embedding test for milvus
* pip freeze on milvus tests
* fix milvus1 tests being executed: fix all_doc_stores order
* Revert "pip freeze on milvus tests"
This reverts commit 75ebb6f7e507bb8477e87d9e63b4a294f7946cab.
* make infer_required_doc_store more robust
* don't skip tests without docstore requirements
* use markers for docstore tests
2022-09-22 17:46:49 +02:00
Massimiliano Pippi
2b803a265b
run checks on release branches ( #3267 )
2022-09-22 16:25:34 +02:00
Vladimir Blagojevic
820742cac7
Fix schema for 1.10.x ( #3269 )
2022-09-22 15:20:51 +02:00
tstadel
b10e2c392e
chore: add DenseRetriever abstraction ( #3252 )
...
* support cosine similarity with faiss
* update docs
* update api docs
* fix tests
* Revert "update api docs"
This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143.
* fix api docs
* collapse test
* rename similarity to space_type mappings
* only normalize for faiss
* fix merge
* fix docs normalization
* get rid of List[np.array]
* update docs
* fix tests and tutorials
* fix mypy
* fix mypy
* fix mypy again
* again mypy
* blacken
* update tutorial 4 docs
* fix embeddingretriever
* fix faiss
* move dense specific logic to DenseRetriever
* fix mypy
* cosine tests for all documents stores
* fix pinecone
* add docstring
* docstring corrections
* update docs
* add integration test marker
* docstrings update
* update docs
* fix typo
* update docs
* fix MockDenseRetriever
* run integration tests for all documentstores
* fix test_update_embeddings_cosine_similarity
* fix faiss tests not running
* blacken
* make test_cosine_sanity_check integration test
* update docs
* fix imports
* import DenseRetriever normally
* update docs
* fix deepcopy of documents
* update schema
* Revert "update schema"
This reverts commit 83cf8f323648468e1c322d54852bec084d637e3f.
* fix schema for ci manually
2022-09-21 19:08:54 +02:00
Branden Chan
492a8046d8
docs: sync Haystack API with Readme ( #3223 )
...
* First pass at syncing Haystack API with Readme
* Reapply changes
* Regularize slugs
* Regularize slugs
* Regularize slugs
* Set category id and regen
* Trigger workflow
* Delete old md files
* Test sync
* Undo test string
* Incorporate reviewer feedback
* Test on the fly API generation and sync
* Test on the fly API generation and sync
* Test on the fly API generation and sync
* Test on the fly API generation and sync
* Test on the fly API generation and sync
* Change name of pydoc-markdown scripts
* Test on the fly API generation and sync
* Remove version tag
* Test version tag
* Test version tag
* Test version tag
* Revert test docstring
* Revert md file changes
* Revert md file changes
* Revert script naming
* Test on the fly generation and sync
* Adjust for on the fly generation and sync
* Revert test string
* Remove old documentation workflow
* Set workflow to work on main
* Change readme version name
2022-09-21 17:18:34 +02:00
Massimiliano Pippi
8f76d64f6f
chore: bump release number for unstable version ( #3251 )
...
* bump version for unstable
* allow generation of rc schemas
* update schemas
2022-09-21 16:58:06 +02:00
Vladimir Blagojevic
938e6fda5b
Classify pipeline's type based on its components ( #3132 )
...
* Add pipeline get_type method
* Add pipeline uptime
* Add pipeline telemetry event sending
* Send pipeline telemetry once a day (at most)
* Add pipeline invocation counter, change invocation counter logic
* Update allowed telemetry parameters - allow pipeline parameters
* PR review: add unit test
2022-09-21 14:53:42 +02:00
Stefano Fiorucci
89247b804c
refactor: make TransformersDocumentClassifier output consistent between different types of classification ( #3224 )
...
* make output consistent
* make output consistent
* added tests for details
* better tests
* Update test_document_classifier.py
* make black happy
* Update test_document_classifier.py
* Update test_document_classifier.py
2022-09-21 13:16:03 +02:00
Massimiliano Pippi
15bb6c2ea2
remove tutorials from the repo ( #3244 )
2022-09-20 18:32:45 +02:00
Tuana Celik
336c144e72
chore: updating colab links in older docs versions ( #3250 )
...
* updating colab links to tutorial 1
* remaining tutorials
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2022-09-20 18:15:29 +02:00
Vladimir Blagojevic
fe31896fcb
Proper retrieval of answers for batch eval ( #3245 )
...
* Proper retrieval of answers and documents for batch eval
2022-09-20 08:16:03 -04:00
Malte Pietsch
7e79a48540
bug: reactivate benchmarks with quick fixes ( #2766 )
...
* quick fix benchmark runs to make them work with current haystack version
* fix minor typo
* update readme. fix minor things to make benchmarks run again
* Update Documentation & Code Style
* fix typo in readme
* update result files for reader and retriever querying
* reduce batch size for update embeddings to prevent xlarge bulk_update requests that exceed elastic's limits (happening in dense 500k runs)
* change default memory allocation back to normal. add note to readme
* add first indexing results
* add memory to docker cmd
* full benchmarks results on commit c5a2651fcbbeffca06ffa9036b10e62669bcc1b0
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-09-20 10:22:08 +02:00
Massimiliano Pippi
9399ddf949
fix pydoc-markdown hook ( #3238 )
2022-09-19 18:20:35 +02:00
Sara Zan
dcb132ba59
chore: remove f-strings from logs for performance reasons ( #3212 )
...
* Use the %s syntax on all debug messages
* Use the %s syntax on some more debug messages
* Use the %s syntax on info messages
* Use the %s syntax on warning messages
* Use the %s syntax on error and exception messages
* mypy
* pylint
* trigger tutorials execution in CI
* trigger tutorials execution on CI
* black
* remove embeddings from repr
* fix Document `__repr__`
* address feedback
* mypy
2022-09-19 18:18:32 +02:00
Massimiliano Pippi
8fbccbda82
fix: handle Documents containing dataframes in Multilabel constructor ( #3237 )
...
* format
* fix docs
2022-09-19 14:59:20 +02:00
banjocustard
19af6f4e40
bug: fix pdftotext installation verification ( #3233 )
2022-09-19 11:32:58 +02:00
Massimiliano Pippi
859c303c16
include fontconfig in the final image and fix tagging ( #3230 )
2022-09-16 15:33:24 +02:00
Malte Pietsch
3134b0d679
fix: type of temperature param and adjust defaults for OpenAIAnswerGenerator ( #3073 )
...
* fix: type of temperature param and adjust defaults
* update schema
* update api docs
2022-09-16 14:11:33 +02:00