175 Commits

Author SHA1 Message Date
Massimiliano Pippi
a15af7f8c3
refactor: Move InMemoryDocumentStore tests to their own class (#3614)
* move tests to their own class

* move more tests

* add specific job

* fix test

* Update test/document_stores/test_memory.py

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-11-23 15:33:46 +01:00
Massimiliano Pippi
2fadcf2859
add labeler to the repo (#3609) 2022-11-21 20:49:25 +05:30
Massimiliano Pippi
7e0aa82eb8
Update Python version (#3602) 2022-11-21 10:16:47 +01:00
Massimiliano Pippi
1399681c81
move milvus tests to their own module (#3596) 2022-11-17 16:22:02 +01:00
Mayank Jobanputra
3098440a27
bug: fix release number (#3559)
* Added haystack version in docker base build

* test version -- name's bond
2022-11-15 16:31:10 +05:30
Massimiliano Pippi
6a48ace9b9
BREAKING CHANGE: remove Milvus1DocumentStore along with support for Milvus < 2.x (#3552)
* remove milvus1

* leftover

* revert deprecation process
2022-11-15 09:54:55 +01:00
Massimiliano Pippi
057a8c0b4f
refactor: Pinecone tests (#3555)
* add pytest option to unmock pinecone

* first try

* handle missing answer

* fix labels metadata

* more tests

* adapt workflow

* typo

* address review comments
2022-11-14 15:19:15 +01:00
Massimiliano Pippi
7af22cd98c
CI: install httpx to run tests (#3565)
* install httpx to run tests

* try
2022-11-14 12:52:04 +01:00
Massimiliano Pippi
4dfddf0d10
refactor: Refactor Weaviate tests (#3541)
* refactor tests

* fix job

* revert

* revert

* revert

* use latest weaviate

* fix abstract methods signatures

* pass class_name to all the CRUD methods

* finish moving all the tests

* bump weaviate version

* raise, don't pass
2022-11-14 09:57:30 +01:00
Massimiliano Pippi
3319ef6d1c
refactor: refactor FAISS tests (#3537)
* fix write docs behaviour

* refactor FAISS tests

* do not remove the sqlite db

* try

* remove extra slash

* Apply suggestions from code review

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* review comments

* Update test/document_stores/test_faiss.py

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* review comments

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-11-08 16:37:01 +01:00
Massimiliano Pippi
af96e002a4
merge black job into testing workflow (#3539) 2022-11-07 20:01:02 +05:30
Massimiliano Pippi
255072d8d5
refactor: move dC tests to their own module and job (#3529)
* move dC tests to their own module and job

* restore global var

* revert
2022-11-04 17:05:10 +01:00
Sara Zan
815017ad5b
Deploy the demo only manually (#3525) 2022-11-04 12:15:58 +01:00
Massimiliano Pippi
2bb81331b7
feat: add SQLDocumentStore tests (#3517)
* port SQL tests

* cleanup document_store_tests.py from sql tests

* leftover

* Update .github/workflows/tests.yml

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* review comments

* Update test/document_stores/test_base.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-11-04 09:24:19 +01:00
Massimiliano Pippi
0a04dec808
Use a dedicated PAT (#3511)
* Use a dedicated PAT

* Update project.yml
2022-11-03 12:58:01 +01:00
Sara Zan
b93bbb1cab
refactor: upgrade actions version (#3506)
* upgrade actions version

* upgrade cache action too
2022-11-02 10:35:10 +01:00
Massimiliano Pippi
9fe2f69d56
add workflow to triage new issues with GH projects (#3508) 2022-10-31 16:01:59 +01:00
Massimiliano Pippi
b694c7b5cb
Document Store test refactoring (#3449)
* add new marker

* start using test hierarchies

* move ES tests into their own class

* refactor test workflow

* job steps

* add more tests

* move more tests

* more tests

* test labels

* add more tests

* Update tests.yml

* Update tests.yml

* fix

* typo

* fix es image tag

* map es ports

* try

* fix

* default port

* remove opensearch from the markers sorcery

* revert

* skip new tests in old jobs

* skip opensearch_faiss
2022-10-31 15:30:14 +01:00
Massimiliano Pippi
17cd79e2c8
[release process] Create new schema when bumping unstable (#3416)
* also create new schema when bumping unstable version

* openapi schema

* no need to update the json schema anymore
2022-10-31 12:26:48 +01:00
Sara Zan
54cc9cd4cf
refactor: remove json-schemas (#3485)
* remove json-schemas

* main schema can be removed too

* add .gitignore to schemas folder

* try to explicitly get the new haystack in the rest api tests

* fix workflow again

* fix version string in rest api tests

* add pip freeze

* debug statements in workflow

* -U prevents schema generation
2022-10-31 11:24:43 +01:00
Massimiliano Pippi
9f4a9a76a3
fix: pattern to match tags push (#3469) 2022-10-28 14:52:30 +02:00
Sara Zan
f377b78263
refactor: replace YAML schema check with a dispatch call (#3482)
* Replace yaml check with a dispatch call

* split workflow

* add branch for testing

* access secrets properly

* remove testing branch trigger
2022-10-28 10:48:59 +02:00
Sebastian
59857cb492
feat: Speed up reader tests (#3476)
* Use a smaller reader where possible

* Change scope to module of reader to get faster load times
2022-10-26 19:04:18 +02:00
Vladimir Blagojevic
5ca96357ff
feat: Add CohereEmbeddingEncoder to EmbeddingRetriever (#3453) 2022-10-25 17:52:29 +02:00
Massimiliano Pippi
df4d20d32c
fix the readme version to sync (#3417) 2022-10-20 16:50:36 +02:00
Stefano Fiorucci
8c1a34494d
refactor: update package strategy in ui (#3396)
* update ui package: first try

* update README

* fixes

* update schemas

* restore schemas

* use matrix folder in tests

* fix tests

* fix schemas

* really fix schemas

* don't use matrix folder

* remove blank line

* cleaner pytest command
2022-10-20 12:18:03 +02:00
Sebastian
51d4fe01c3
fix: Update env variable for model caching timeout (#3405)
* fix: Update env variable for model caching timeout

The environment variable used to set the timeout for the model caching step contained a typo that originated with the maintainers of `actions/cache@v3`, which is why the timeout has not been working (see comment [here](https://github.com/actions/cache/issues/810#issuecomment-1281895575)).

* Removed newline
2022-10-18 17:36:25 +02:00
Branden Chan
cf4642a5f8
[CI] Create GitHub Workflow that creates a new version branch in Haystack and Readme (#3335)
* Test readme_integration.yml

* Test readme_integration.yml

* Test variables

* Test variables

* Test variables

* Test variables

* Test commit

* Test commit

* Test commit

* Trigger action

* Add v

* Trigger action

* Trigger action

* Trigger action

* Trigger action

* Update API docs headers

* Revert "Update API docs headers"

This reverts commit 34e665063f4de29854befe575a795dbfef04415c.

* Trigger action

* Trigger action

* Trigger action

* Update release

* Update release

* Update release

* Delete File

* Split steps into own files

* Edit action names

* Start making changes

* Start implementing version bump

* Implement minor version release

* Fix github action

* Test action

* Test action

* Test action

* Test action

* Test action

* Change back to main

* Add comments

* Remove line

* Format docstring

* Incorporate reviewer feedback

* Fix variable name

* Print version.txt

* Incorporate Reviewer feedback

* Rename variables for clarity

* Add fetch

* Change branch

* Change branch

* Change branch

* Change branch

* Change branch

* Revert docstring changes

* Incorporate reviewer feedback

* Run black

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2022-10-18 17:09:43 +02:00
Sebastian
93817f63b4
feat: Speed up integration tests (nodes) (#3408)
* Changed summarizer model to a smaller one (2GB to 500MB) to save on space and speed up the tests.

* Removed google pegasus from cache
2022-10-18 16:23:57 +02:00
Sebastian
15a59fd040
feat: Updated EntityExtractor to handle long texts and added better postprocessing (#3154)
* Remove dependence on HuggingFace TokenClassificationPipeline and group all postprocessing functions under one class

* Added copyright notice for HF and deepset to entity file to acknowledge that a lot of the postprocessing parts came from the transformers library.

* Fixed text squishing problem. Added additional unit test for it.

Co-authored-by: ju-gu <julian.gutsch@deepset.ai>
2022-10-17 21:26:44 +02:00
Sara Zan
101d2bc86c
feat: MultiModalRetriever (#2891)
* Adding Data2VecVision and Data2VecText to the supported models and adapt Tokenizers accordingly

* content_types

* Splitting classes into respective folders

* small changes

* Fix EOF

* eof

* black

* API

* EOF

* whitespace

* api

* improve multimodal similarity processor

* tokenizer -> feature extractor

* Making feature vectors come out of the feature extractor in the similarity head

* embed_queries is now self-sufficient

* couple trivial errors

* Implemented separate language model classes for multimodal inference

* Document embedding seems to work

* removing batch_encode_plus, is deprecated anyway

* Realized the base Data2Vec models are not trained on retrieval tasks

* Issue with the generated embeddings

* Add batching

* Try to fit CLIP in

* Stub of CLIP integration

* Retrieval goes through but returns noise only

* Still working on the scores

* Introduce temporary adapter for CLIP models

* Image retrieval now works with sentence-transformers

* Tidying up the code

* Refactoring is now functional

* Add MPNet to the supported sentence transformers models

* Remove unused classes

* pylint

* docs

* docs

* Remove the method renaming

* mypy first pass

* docs

* tutorial

* schema

* mypy

* Move devices setup into get_model

* more mypy

* mypy

* pylint

* Move a few params in HaystackModel's init

* make feature extractor work with squadprocessor

* fix feature_extractor_kwargs forwarding

* Forgotten part of the fix

* Revert unrelated ES change

* Revert unrelated memdocstore changes

* comment

* Small corrections

* mypy and pylint

* mypy

* typo

* mypy

* Refactor the  call

* mypy

* Do not make FARMReader use the new FeatureExtractor

* mypy

* Detach DPR tests from FeatureExtractor too

* Detach processor tests too

* Add end2end marker

* extract end2end feature extractor tests

* temporary disable feature extraction tests

* Introduce end2end tests for tokenizer tests

* pylint

* Fix model loading from folder in FeatureExtractor

* working on end2end

* end2end keeps failing

* Restructuring retriever tests

* Restructuring retriever tests

* remove covert_dataset_to_dataloader

* remove comment

* Better check sentence-transformers models

* Use embed_meta_fields properly

* rename passage into document

* Embedding dims can't be found

* Add check for models that support it

* pylint

* Split all retriever tests into suites, running mostly on InMemory only

* fix mypy

* fix tfidf test

* fix weaviate tests

* Parallelize on every docstore

* Fix schema and specify modality in base retriever suite

* tests

* Add first image tests

* remove comment

* Revert to simpler tests

* Update docs/_src/api/api/primitives.md

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/modeling/model/multimodal/__init__.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* get_args

* mypy

* Update haystack/modeling/model/multimodal/__init__.py

* Update haystack/modeling/model/multimodal/base.py

* Update haystack/modeling/model/multimodal/base.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/modeling/model/multimodal/sentence_transformers.py

* Update haystack/modeling/model/multimodal/sentence_transformers.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/modeling/model/multimodal/transformers.py

* Update haystack/modeling/model/multimodal/transformers.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/modeling/model/multimodal/transformers.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/nodes/retriever/multimodal/retriever.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* mypy

* mypy

* removing more ContentTypes

* more ContentTypes

* pylint

* add to __init__

* revert end2end workflow for now

* missing integration markers

* Update haystack/nodes/retriever/multimodal/embedder.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* review feedback, removing HaystackImageTransformerModel

* review feedback part 2

* mypy & pylint

* mypy

* mypy

* fix multimodal docs also for Pinecone

* add note on internal constants

* Fix pinecone write_documents

* schemas

* keep support for sentence-transformers only

* fix pinecone test

* schemas

* fix pinecone again

* temporarily disable some tests, need to understand if they're still relevant

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-10-17 18:58:35 +02:00
Massimiliano Pippi
3b0f00a615
[CI] Use VERSION.txt to sync with Readme (#3367)
* use VERSION.txt to sync with Readme

* add docs

* force workflow run

* unrelated change

* Revert "force workflow run"

This reverts commit f0aea59afa57c96f374073465629f893031f727a.

* make the steps mutually exclusive
2022-10-13 18:39:23 +02:00
Branden Chan
37bd61a48e
Create minor_version_release.yml (#3338)
* Create minor_version_release.yml

* Incorporate reviewer feedback
2022-10-13 14:32:31 +02:00
Massimiliano Pippi
31fa75e9fd
feat: add support for Elasticsearch 7.16.2 (#3318)
* bump elastic to 7.16.2+

* decouple Elasticsearch and Opensearch

use method override instead of func variables

fix mypy

default value

fix broken tests

update schema

* relax version pin

* rename the base class

* rename module

* fix import order

* do not run the new tests in the old job

* remove outdated TODO
2022-10-13 11:53:27 +02:00
Massimiliano Pippi
8ddb6d7821
feat: add multi-platform Docker images (#3354)
* add arm platform to the build

* add a note about multi-platforms build

* test on current branch

* set up QEMU on GitHub Actions

* better naming

* Revert "test on current branch"

This reverts commit b0e5ea77b46e3e0bafd579c95e434c6a3c8ef84f.
2022-10-11 12:29:33 +02:00
Massimiliano Pippi
b49bce97aa
remove test step (#3278) 2022-10-04 11:34:43 +02:00
tstadel
05a86b9d3d
feat: FAISS in OpenSearch: Support HNSW for cosine (#3217)
* support cosine similarity with faiss

* update docs

* update api docs

* fix tests

* Revert "update api docs"

This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143.

* fix api docs

* collapse test

* rename similarity to space_type mappings

* only normalize for faiss

* fix merge

* fix docs normalization

* get rid of List[np.array]

* update docs

* fix tests and tutorials

* fix mypy

* fix mypy

* fix mypy again

* again mypy

* blacken

* update tutorial 4 docs

* fix embeddingretriever

* fix faiss

* move dense specific logic to DenseRetriever

* fix mypy

* cosine tests for all document stores

* fix pinecone

* add docstring

* docstring corrections

* update docs

* add integration test marker

* docstrings update

* update docs

* fix typo

* update docs

* fix MockDenseRetriever

* run integration tests for all documentstores

* fix test_update_embeddings_cosine_similarity

* fix faiss tests not running

* blacken

* make test_cosine_sanity_check integration test

* split PR

* update docs

* manually revert tutorial doc change

* Fix embedding type

* set integration marker correctly

* make BaseDocumentStore.normalize_embedding static

* format

* fix handling of opensearch_faiss param

* fix merge

* add DenseRetriever typing

* organize imports in conftest.py

* organize imports in conftest.py (2)

* fix DenseRetriever import

* add opensearch-tests-linux
2022-09-23 13:26:49 +02:00
tstadel
4fa9d2d8e7
Fix milvus and faiss tests not running (#3263)
* fix milvus and faiss tests not running

* fix schema manually

* fix test_dpr_embedding test for milvus

* pip freeze on milvus tests

* fix milvus1 tests being executed: fix all_doc_stores order

* Revert "pip freeze on milvus tests"

This reverts commit 75ebb6f7e507bb8477e87d9e63b4a294f7946cab.

* make infer_required_doc_store more robust

* don't skip tests without docstore requirements

* use markers for docstore tests
2022-09-22 17:46:49 +02:00
Massimiliano Pippi
2b803a265b
run checks on release branches (#3267) 2022-09-22 16:25:34 +02:00
Branden Chan
492a8046d8
docs: sync Haystack API with Readme (#3223)
* First pass at syncing Haystack API with Readme

* Reapply changes

* Regularize slugs

* Regularize slugs

* Regularize slugs

* Set category id and regen

* Trigger workflow

* Delete old md files

* Test sync

* Undo test string

* Incorporate reviewer feedback

* Test on the fly API generation and sync

* Test on the fly API generation and sync

* Test on the fly API generation and sync

* Test on the fly API generation and sync

* Test on the fly API generation and sync

* Change name of pydoc-markdown scripts

* Test on the fly API generation and sync

* Remove version tag

* Test version tag

* Test version tag

* Test version tag

* Revert test docstring

* Revert md file changes

* Revert md file changes

* Revert script naming

* Test on the fly generation and sync

* Adjust for on the fly generation and sync

* Revert test string

* Remove old documentation workflow

* Set workflow to work on main

* Change readme version name
2022-09-21 17:18:34 +02:00
Massimiliano Pippi
15bb6c2ea2
remove tutorials from the repo (#3244) 2022-09-20 18:32:45 +02:00
Massimiliano Pippi
4ddeb7b14b
chore: fix Windows CI (#3222)
* replicate issue

* pin openjdk version

* not sure it's needed
2022-09-16 13:08:30 +02:00
Sara Zan
768583d00c
chore: disable Windows ES tests on CI (#3220)
* disable Windows ES tests

* Add comments
2022-09-15 15:18:29 +02:00
Massimiliano Pippi
64b0c43885
refactoring: reimplement Docker strategy (#3162)
* setup base images

* add cpu flavor

* use the same Dockerfile for cpu and gpu

* better naming, add docs

* add docker workflow

* add missing image input

* change cwd for bake

* also push api images

* try conditional tagging for releases

* revert testing code

* update docker readme

* document variable override

* use Python 3.10

* allow empty HAYSTACK_EXTRAS

* Apply suggestions from code review

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* remove repo description step, can't make it work so far

* add docs to the last step as it's tricky

* manage tags for the newest images

* tests are passing, checking in the last bit

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-09-12 16:33:56 +02:00
Vladimir Blagojevic
20880c9d41
Add 15 min timeout for downloading cached HF models (#3179) 2022-09-07 08:35:09 -04:00
Massimiliano Pippi
6790eaf7d8
refactor: update package strategy in rest_api (#3148)
* update packaging

* fix author metadata

* add newline

* add empty readme

* fix path to pipeline files

* fix pylint job

* fix metadata
2022-09-05 16:58:43 +02:00
Daniel Bichuetti
e1f399284f
refactor: update dependencies and remove pins (#3147)
* refactor: remove azure-core, pydoc and hf-hub pins

* fix: remove extra comma

* fix: force minimum version of azure forms recognizer

* refactor: allow newer ocr libs

* refactor: update more dependencies and container versions

* refactor: remove extra comment

* docs: pre-commit manual run

* refactor: remove unnecessary dependency

* tests: update weaviate container image version
2022-09-05 14:30:35 +02:00
Sara Zan
e92ea4fccb
refactor: rename master into main in documentation and links (#3063)
* master->main

* revert master rename

* Revert change to sphinx link and rename master schema
2022-08-24 19:05:12 +02:00
Vladimir Blagojevic
be127e5b61
Trigger build failure Slack notify only on main repo (not forks) (#3039)
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2022-08-18 06:51:39 -04:00
Massimiliano Pippi
2328097ce0
rename the default branch name (#3045) 2022-08-16 20:24:58 +02:00