Massimiliano Pippi
a15af7f8c3
refactor: Move InMemoryDocumentStore tests to their own class ( #3614 )
* move tests to their own class
* move more tests
* add specific job
* fix test
* Update test/document_stores/test_memory.py
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-11-23 15:33:46 +01:00
Massimiliano Pippi
2fadcf2859
add labeler to the repo ( #3609 )
2022-11-21 20:49:25 +05:30
Massimiliano Pippi
7e0aa82eb8
Update Python version ( #3602 )
2022-11-21 10:16:47 +01:00
Massimiliano Pippi
1399681c81
move milvus tests to their own module ( #3596 )
2022-11-17 16:22:02 +01:00
Mayank Jobanputra
3098440a27
bug: fix release number ( #3559 )
* Added haystack version in docker base build
* test version -- name's bond
2022-11-15 16:31:10 +05:30
Massimiliano Pippi
6a48ace9b9
BREAKING CHANGE: remove Milvus1DocumentStore along with support for Milvus < 2.x ( #3552 )
* remove milvus1
* leftover
* revert deprecation process
2022-11-15 09:54:55 +01:00
Massimiliano Pippi
057a8c0b4f
refactor: Pinecone tests ( #3555 )
* add pytest option to unmock pinecone
* first try
* handle missing answer
* fix labels metadata
* more tests
* adapt workflow
* typo
* address review comments
2022-11-14 15:19:15 +01:00
Massimiliano Pippi
7af22cd98c
CI: install httpx to run tests ( #3565 )
* install httpx to run tests
* try
2022-11-14 12:52:04 +01:00
Massimiliano Pippi
4dfddf0d10
refactor: Refactor Weaviate tests ( #3541 )
* refactor tests
* fix job
* revert
* revert
* revert
* use latest weaviate
* fix abstract methods signatures
* pass class_name to all the CRUD methods
* finish moving all the tests
* bump weaviate version
* raise, don't pass
2022-11-14 09:57:30 +01:00
Massimiliano Pippi
3319ef6d1c
refactor: refactor FAISS tests ( #3537 )
* fix write docs behaviour
* refactor FAISS tests
* do not remove the sqlite db
* try
* remove extra slash
* Apply suggestions from code review
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* review comments
* Update test/document_stores/test_faiss.py
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* review comments
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-11-08 16:37:01 +01:00
Massimiliano Pippi
af96e002a4
merge black job into testing workflow ( #3539 )
2022-11-07 20:01:02 +05:30
Massimiliano Pippi
255072d8d5
refactor: move dC tests to their own module and job ( #3529 )
* move dC tests to their own module and job
* restore global var
* revert
2022-11-04 17:05:10 +01:00
Sara Zan
815017ad5b
Deploy the demo only manually ( #3525 )
2022-11-04 12:15:58 +01:00
Massimiliano Pippi
2bb81331b7
feat: add SQLDocumentStore tests ( #3517 )
* port SQL tests
* cleanup document_store_tests.py from sql tests
* leftover
* Update .github/workflows/tests.yml
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* review comments
* Update test/document_stores/test_base.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-11-04 09:24:19 +01:00
Massimiliano Pippi
0a04dec808
Use a dedicated PAT ( #3511 )
* Use a dedicated PAT
* Update project.yml
2022-11-03 12:58:01 +01:00
Sara Zan
b93bbb1cab
refactor: upgrade actions version ( #3506 )
* upgrade actions version
* upgrade cache action too
2022-11-02 10:35:10 +01:00
Massimiliano Pippi
9fe2f69d56
add workflow to triage new issues with GH projects ( #3508 )
2022-10-31 16:01:59 +01:00
Massimiliano Pippi
b694c7b5cb
Document Store test refactoring ( #3449 )
* add new marker
* start using test hierarchies
* move ES tests into their own class
* refactor test workflow
* job steps
* add more tests
* move more tests
* more tests
* test labels
* add more tests
* Update tests.yml
* Update tests.yml
* fix
* typo
* fix es image tag
* map es ports
* try
* fix
* default port
* remove opensearch from the markers sorcery
* revert
* skip new tests in old jobs
* skip opensearch_faiss
2022-10-31 15:30:14 +01:00
Massimiliano Pippi
17cd79e2c8
[release process] Create new schema when bumping unstable ( #3416 )
* also create new schema when bumping unstable version
* openapi schema
* no need to update the json schema anymore
2022-10-31 12:26:48 +01:00
Sara Zan
54cc9cd4cf
refactor: remove json-schemas ( #3485 )
* remove json-schemas
* main schema can be removed too
* add .gitignore to schemas folder
* try to explicitly get the new haystack in the rest api tests
* fix workflow again
* fix version string in rest api tests
* add pip freeze
* debug statements in workflow
* -U prevents schema generation
2022-10-31 11:24:43 +01:00
Massimiliano Pippi
9f4a9a76a3
fix: pattern to match tags push ( #3469 )
2022-10-28 14:52:30 +02:00
Sara Zan
f377b78263
refactor: replace YAML schema check with a dispatch call ( #3482 )
* Replace yaml check with a dispatch call
* split workflow
* add branch for testing
* access secrets properly
* remove testing branch trigger
2022-10-28 10:48:59 +02:00
Sebastian
59857cb492
feat: Speed up reader tests ( #3476 )
* Use a smaller reader where possible
* Change scope to module of reader to get faster load times
2022-10-26 19:04:18 +02:00
Vladimir Blagojevic
5ca96357ff
feat: Add CohereEmbeddingEncoder to EmbeddingRetriever ( #3453 )
2022-10-25 17:52:29 +02:00
Massimiliano Pippi
df4d20d32c
fix the readme version to sync ( #3417 )
2022-10-20 16:50:36 +02:00
Stefano Fiorucci
8c1a34494d
refactor: update package strategy in ui ( #3396 )
* update ui package: first try
* update README
* fixes
* update schemas
* restore schemas
* use matrix folder in tests
* fix tests
* fix schemas
* really fix schemas
* don't use matrix folder
* remove blank line
* cleaner pytest command
2022-10-20 12:18:03 +02:00
Sebastian
51d4fe01c3
fix: Update env variable for model caching timeout ( #3405 )
* fix: Update env variable for model caching timeout
The environment variable used to set the timeout for the model caching step had a typo that came from the maintainers of `actions/cache@v3`, which is why the timeout was not working (see the comment [here](https://github.com/actions/cache/issues/810#issuecomment-1281895575)).
* Removed newline
2022-10-18 17:36:25 +02:00
Branden Chan
cf4642a5f8
[CI] Create Github Workflow that creates a new version branch in Haystack and Readme ( #3335 )
* Test readme_integration.yml
* Test readme_integration.yml
* Test variables
* Test variables
* Test variables
* Test variables
* Test commit
* Test commit
* Test commit
* Trigger action
* Add v
* Trigger action
* Trigger action
* Trigger action
* Trigger action
* Update API docs headers
* Revert "Update API docs headers"
This reverts commit 34e665063f4de29854befe575a795dbfef04415c.
* Trigger action
* Trigger action
* Trigger action
* Update release
* Update release
* Update release
* Delete File
* Split steps into own files
* Edit action names
* Start making changes
* Start implementing version bump
* Implement minor version release
* Fix github action
* Test action
* Test action
* Test action
* Test action
* Test action
* Change back to main
* Add comments
* Remove line
* Format docstring
* Incorporate reviewer feedback
* Fix variable name
* Print version.txt
* Incorporate Reviewer feedback
* Rename variables for clarity
* Add fetch
* Change branch
* Change branch
* Change branch
* Change branch
* Change branch
* Revert docstring changes
* Incorporate reviewer feedback
* Run black
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2022-10-18 17:09:43 +02:00
Sebastian
93817f63b4
feat: Speed up integration tests (nodes) ( #3408 )
* Changed summarizer model to a smaller one (2GB to 500MB) to save on space and speed up the tests.
* Removed google pegasus from cache
2022-10-18 16:23:57 +02:00
Sebastian
15a59fd040
feat: Updated EntityExtractor to handle long texts and added better postprocessing ( #3154 )
* Remove dependence on HuggingFace TokenClassificationPipeline and group all postprocessing functions under one class
* Added copyright notice for HF and deepset to entity file to acknowledge that a lot of the postprocessing parts came from the transformers library.
* Fixed text squishing problem. Added additional unit test for it.
Co-authored-by: ju-gu <julian.gutsch@deepset.ai>
2022-10-17 21:26:44 +02:00
Sara Zan
101d2bc86c
feat: MultiModalRetriever ( #2891 )
* Adding Data2VecVision and Data2VecText to the supported models and adapt Tokenizers accordingly
* content_types
* Splitting classes into respective folders
* small changes
* Fix EOF
* eof
* black
* API
* EOF
* whitespace
* api
* improve multimodal similarity processor
* tokenizer -> feature extractor
* Making feature vectors come out of the feature extractor in the similarity head
* embed_queries is now self-sufficient
* couple trivial errors
* Implemented separate language model classes for multimodal inference
* Document embedding seems to work
* removing batch_encode_plus, is deprecated anyway
* Realized the base Data2Vec models are not trained on retrieval tasks
* Issue with the generated embeddings
* Add batching
* Try to fit CLIP in
* Stub of CLIP integration
* Retrieval goes through but returns noise only
* Still working on the scores
* Introduce temporary adapter for CLIP models
* Image retrieval now works with sentence-transformers
* Tidying up the code
* Refactoring is now functional
* Add MPNet to the supported sentence transformers models
* Remove unused classes
* pylint
* docs
* docs
* Remove the method renaming
* mypy first pass
* docs
* tutorial
* schema
* mypy
* Move devices setup into get_model
* more mypy
* mypy
* pylint
* Move a few params in HaystackModel's init
* make feature extractor work with squadprocessor
* fix feature_extractor_kwargs forwarding
* Forgotten part of the fix
* Revert unrelated ES change
* Revert unrelated memdocstore changes
* comment
* Small corrections
* mypy and pylint
* mypy
* typo
* mypy
* Refactor the call
* mypy
* Do not make FARMReader use the new FeatureExtractor
* mypy
* Detach DPR tests from FeatureExtractor too
* Detach processor tests too
* Add end2end marker
* extract end2end feature extractor tests
* temporary disable feature extraction tests
* Introduce end2end tests for tokenizer tests
* pylint
* Fix model loading from folder in FeatureExtractor
* working on end2end
* end2end keeps failing
* Restructuring retriever tests
* Restructuring retriever tests
* remove covert_dataset_to_dataloader
* remove comment
* Better check sentence-transformers models
* Use embed_meta_fields properly
* rename passage into document
* Embedding dims can't be found
* Add check for models that support it
* pylint
* Split all retriever tests into suites, running mostly on InMemory only
* fix mypy
* fix tfidf test
* fix weaviate tests
* Parallelize on every docstore
* Fix schema and specify modality in base retriever suite
* tests
* Add first image tests
* remove comment
* Revert to simpler tests
* Update docs/_src/api/api/primitives.md
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/modeling/model/multimodal/__init__.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* get_args
* mypy
* Update haystack/modeling/model/multimodal/__init__.py
* Update haystack/modeling/model/multimodal/base.py
* Update haystack/modeling/model/multimodal/base.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/modeling/model/multimodal/sentence_transformers.py
* Update haystack/modeling/model/multimodal/sentence_transformers.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/modeling/model/multimodal/transformers.py
* Update haystack/modeling/model/multimodal/transformers.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/modeling/model/multimodal/transformers.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/nodes/retriever/multimodal/retriever.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* mypy
* mypy
* removing more ContentTypes
* more contentypes
* pylint
* add to __init__
* revert end2end workflow for now
* missing integration markers
* Update haystack/nodes/retriever/multimodal/embedder.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* review feedback, removing HaystackImageTransformerModel
* review feedback part 2
* mypy & pylint
* mypy
* mypy
* fix multimodal docs also for Pinecone
* add note on internal constants
* Fix pinecone write_documents
* schemas
* keep support for sentence-transformers only
* fix pinecone test
* schemas
* fix pinecone again
* temporarily disable some tests, need to understand if they're still relevant
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-10-17 18:58:35 +02:00
Massimiliano Pippi
3b0f00a615
[CI] Use VERSION.txt to sync with Readme ( #3367 )
* use VERSION.txt to sync with Readme
* add docs
* force workflow run
* unrelated change
* Revert "force workflow run"
This reverts commit f0aea59afa57c96f374073465629f893031f727a.
* make the steps mutually exclusive
2022-10-13 18:39:23 +02:00
Branden Chan
37bd61a48e
Create minor_version_release.yml ( #3338 )
* Create minor_version_release.yml
* Incorporate reviewer feedback
2022-10-13 14:32:31 +02:00
Massimiliano Pippi
31fa75e9fd
feat: add support for Elasticsearch 7.16.2 ( #3318 )
* bump elastic to 7.16.2+
* decouple Elasticsearch and Opensearch
use method override instead of func variables
fix mypy
default value
fix broken tests
update schema
* relax version pin
* rename the base class
* rename module
* fix import order
* do not run the new tests in the old job
* remove outdated TODO
2022-10-13 11:53:27 +02:00
Massimiliano Pippi
8ddb6d7821
feat: add multi-platform Docker images ( #3354 )
* add arm platform to the build
* add a note about multi-platforms build
* test on current branch
* setup qemu on Github actions
* better naming
* Revert "test on current branch"
This reverts commit b0e5ea77b46e3e0bafd579c95e434c6a3c8ef84f.
2022-10-11 12:29:33 +02:00
Massimiliano Pippi
b49bce97aa
remove test step ( #3278 )
2022-10-04 11:34:43 +02:00
tstadel
05a86b9d3d
feat: FAISS in OpenSearch: Support HNSW for cosine ( #3217 )
* support cosine similarity with faiss
* update docs
* update api docs
* fix tests
* Revert "update api docs"
This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143.
* fix api docs
* collapse test
* rename similarity to space_type mappings
* only normalize for faiss
* fix merge
* fix docs normalization
* get rid of List[np.array]
* update docs
* fix tests and tutorials
* fix mypy
* fix mypy
* fix mypy again
* again mypy
* blacken
* update tutorial 4 docs
* fix embeddingretriever
* fix faiss
* move dense specific logic to DenseRetriever
* fix mypy
* cosine tests for all document stores
* fix pinecone
* add docstring
* docstring corrections
* update docs
* add integration test marker
* docstrings update
* update docs
* fix typo
* update docs
* fix MockDenseRetriever
* run integration tests for all documentstores
* fix test_update_embeddings_cosine_similarity
* fix faiss tests not running
* blacken
* make test_cosine_sanity_check integration test
* split PR
* update docs
* manually revert tutorial doc change
* Fix embedding type
* set integration marker correctly
* make BaseDocumentStore.normalize_embedding static
* format
* fix handling of opensearch_faiss param
* fix merge
* add DenseRetriever typing
* organize imports in conftest.py
* organize imports in conftest.py (2)
* fix DenseRetriever import
* add opensearch-tests-linux
2022-09-23 13:26:49 +02:00
tstadel
4fa9d2d8e7
Fix milvus and faiss tests not running ( #3263 )
* fix milvus and faiss tests not running
* fix schema manually
* fix test_dpr_embedding test for milvus
* pip freeze on milvus tests
* fix milvus1 tests being executed: fix all_doc_stores order
* Revert "pip freeze on milvus tests"
This reverts commit 75ebb6f7e507bb8477e87d9e63b4a294f7946cab.
* make infer_required_doc_store more robust
* don't skip tests without docstore requirements
* use markers for docstore tests
2022-09-22 17:46:49 +02:00
Massimiliano Pippi
2b803a265b
run checks on release branches ( #3267 )
2022-09-22 16:25:34 +02:00
Branden Chan
492a8046d8
docs: sync Haystack API with Readme ( #3223 )
* First pass at syncing Haystack API with Readme
* Reapply changes
* Regularize slugs
* Regularize slugs
* Regularize slugs
* Set category id and regen
* Trigger workflow
* Delete old md files
* Test sync
* Undo test string
* Incorporate reviewer feedback
* Test on the fly API generation and sync
* Test on the fly API generation and sync
* Test on the fly API generation and sync
* Test on the fly API generation and sync
* Test on the fly API generation and sync
* Change name of pydoc-markdown scripts
* Test on the fly API generation and sync
* Remove version tag
* Test version tag
* Test version tag
* Test version tag
* Revert test docstring
* Revert md file changes
* Revert md file changes
* Revert script naming
* Test on the fly generation and sync
* Adjust for on the fly generation and sync
* Revert test string
* Remove old documentation workflow
* Set workflow to work on main
* Change readme version name
2022-09-21 17:18:34 +02:00
Massimiliano Pippi
15bb6c2ea2
remove tutorials from the repo ( #3244 )
2022-09-20 18:32:45 +02:00
Massimiliano Pippi
4ddeb7b14b
chore: fix Windows CI ( #3222 )
* replicate issue
* pin openjdk version
* not sure it's needed
2022-09-16 13:08:30 +02:00
Sara Zan
768583d00c
chore: disable Windows ES tests on CI ( #3220 )
* disable Windows ES tests
* Add comments
2022-09-15 15:18:29 +02:00
Massimiliano Pippi
64b0c43885
refactoring: reimplement Docker strategy ( #3162 )
* setup base images
* add cpu flavor
* use the same Dockerfile for cpu and gpu
* better naming, add docs
* add docker workflow
* add missing image input
* change cwd for bake
* also push api images
* try conditional tagging for releases
* revert testing code
* update docker readme
* document variable override
* use Python 3.10
* allow empty HAYSTACK_EXTRAS
* Apply suggestions from code review
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* remove repo description step, can't make it work so far
* add docs to the last step as it's tricky
* manage tags for the newest images
* tests are passing, checking in the last bit
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-09-12 16:33:56 +02:00
Vladimir Blagojevic
20880c9d41
Add 15 min timeout for downloading cached HF models ( #3179 )
2022-09-07 08:35:09 -04:00
Massimiliano Pippi
6790eaf7d8
refactor: update package strategy in rest_api ( #3148 )
* update packaging
* fix author metadata
* add newline
* add empty readme
* fix path to pipeline files
* fix pylint job
* fix metadata
2022-09-05 16:58:43 +02:00
Daniel Bichuetti
e1f399284f
refactor: update dependencies and remove pins ( #3147 )
* refactor: remove azure-core, pydoc and hf-hub pins
* fix: remove extra-comma
* fix: force minimum version of azure forms recognizer
* refactor: allow newer ocr libs
* refactor: update more dependencies and container versions
* refactor: remove extra comment
* docs: pre-commit manual run
* refactor: remove unnecessary dependency
* tests: update weaviate container image version
2022-09-05 14:30:35 +02:00
Sara Zan
e92ea4fccb
refactor: rename master into main in documentation and links ( #3063 )
* master->main
* revert master rename
* Revert change to sphinx link and rename master schema
2022-08-24 19:05:12 +02:00
Vladimir Blagojevic
be127e5b61
Trigger build failure Slack notify only on main repo (not forks) ( #3039 )
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2022-08-18 06:51:39 -04:00
Massimiliano Pippi
2328097ce0
rename the default branch name ( #3045 )
2022-08-16 20:24:58 +02:00