haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2026-01-08 04:56:45 +00:00

Author	SHA1	Message	Date
Sebastian	d0f786af9f	feat: Bump transformers version to remove torch scatter dependency (#3703 ) * Bump transformers version so we can remove torch scatter dependency * manual re-merge Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>	2022-12-13 18:33:07 +05:30
Sara Zan	fc89f6ea74	fix: revert Weaviate query with filters and improve tests (#3646 ) * revert weaviate query with filters and improve tests * pylint * upgrade weaviate container * use latest docker tag * fix text * fix text	2022-12-06 14:48:58 +01:00
Massimiliano Pippi	a15af7f8c3	refactor: Move `InMemoryDocumentStore` tests to their own class (#3614 ) * move tests to their own class * move more tests * add specific job * fix test * Update test/document_stores/test_memory.py Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-11-23 15:33:46 +01:00
Massimiliano Pippi	1399681c81	move milvus tests to their own module (#3596 )	2022-11-17 16:22:02 +01:00
Massimiliano Pippi	6a48ace9b9	BREAKING CHANGE: remove Milvus1DocumentStore along with support for Milvus < 2.x (#3552 ) * remove milvus1 * leftover * revert deprecation process	2022-11-15 09:54:55 +01:00
Massimiliano Pippi	057a8c0b4f	refactor: Pinecone tests (#3555 ) * add pytest option to unmock pinecone * first try * handle missing answer * fix labels metadata * more tests * adapt workflow * typo * address review comments	2022-11-14 15:19:15 +01:00
Massimiliano Pippi	7af22cd98c	CI: install httpx to run tests (#3565 ) * install httpx to run tests * try	2022-11-14 12:52:04 +01:00
Massimiliano Pippi	4dfddf0d10	refactor: Refactor Weaviate tests (#3541 ) * refactor tests * fix job * revert * revert * revert * use latest weaviate * fix abstract methods signatures * pass class_name to all the CRUD methods * finish moving all the tests * bump weaviate version * raise, don't pass	2022-11-14 09:57:30 +01:00
Massimiliano Pippi	3319ef6d1c	refactor: refactor FAISS tests (#3537 ) * fix write docs behaviour * refactor FAISS tests * do not remove the sqlite db * try * remove extra slash * Apply suggestions from code review Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> * review comments * Update test/document_stores/test_faiss.py Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> * review comments Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-11-08 16:37:01 +01:00
Massimiliano Pippi	af96e002a4	merge black job into testing workflow (#3539 )	2022-11-07 20:01:02 +05:30
Massimiliano Pippi	255072d8d5	refactor: move dC tests to their own module and job (#3529 ) * move dC tests to their own module and job * restore global var * revert	2022-11-04 17:05:10 +01:00
Massimiliano Pippi	2bb81331b7	feat: add SQLDocumentStore tests (#3517 ) * port SQL tests * cleanup document_store_tests.py from sql tests * leftover * Update .github/workflows/tests.yml Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> * review comments * Update test/document_stores/test_base.py Co-authored-by: bogdankostic <bogdankostic@web.de> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-11-04 09:24:19 +01:00
Sara Zan	b93bbb1cab	refactor: upgrade actions version (#3506 ) * upgrade actions version * upgrade cache action too	2022-11-02 10:35:10 +01:00
Massimiliano Pippi	b694c7b5cb	Document Store test refactoring (#3449 ) * add new marker * start using test hierarchies * move ES tests into their own class * refactor test workflow * job steps * add more tests * move more tests * more tests * test labels * add more tests * Update tests.yml * Update tests.yml * fix * typo * fix es image tag * map es ports * try * fix * default port * remove opensearch from the markers sorcery * revert * skip new tests in old jobs * skip opensearch_faiss	2022-10-31 15:30:14 +01:00
Sara Zan	54cc9cd4cf	refactor: remove `json-schemas` (#3485 ) * remove json-schemas * main schema can be removed too * add .gitignore to schemas folder * try to explicitly get the new haystack in the rest api tests * fix workflow again * fix version string in rest api tests * add pip freeze * debug statements in workflow * -U prevents schema generation	2022-10-31 11:24:43 +01:00
Sebastian	59857cb492	feat: Speed up reader tests (#3476 ) * Use a smaller reader where possible * Change scope to module of reader to get faster load times	2022-10-26 19:04:18 +02:00
Vladimir Blagojevic	5ca96357ff	feat: Add CohereEmbeddingEncoder to EmbeddingRetriever (#3453 )	2022-10-25 17:52:29 +02:00
Stefano Fiorucci	8c1a34494d	refactor: update package strategy in ui (#3396 ) * update ui package: first try * update README * fixes * update schemas * restore schemas * use matrix folder in tests * fix tests * fix schemas * really fix schemas * don't use matrix folder * remove blank line * cleaner pytest command	2022-10-20 12:18:03 +02:00
Sebastian	51d4fe01c3	fix: Update env variable for model caching timeout (#3405 ) * fix: Update env variable for model caching timeout The environment variable used to set the timeout for the model caching step had a typo in it from the maintainers of `actions/cache@v3`, which is why it has not been working (see comment [here](https://github.com/actions/cache/issues/810#issuecomment-1281895575)). * Removed newline	2022-10-18 17:36:25 +02:00
Sebastian	93817f63b4	feat: Speed up integration tests (nodes) (#3408 ) * Changed summarizer model to a smaller one (2GB to 500MB) to save on space and speed up the tests. * Removed google pegasus from cache	2022-10-18 16:23:57 +02:00
Sebastian	15a59fd040	feat: Updated EntityExtractor to handle long texts and added better postprocessing (#3154 ) * Remove dependence on HuggingFace TokenClassificationPipeline and group all postprocessing functions under one class * Added copyright notice for HF and deepset to entity file to acknowledge that a lot of the postprocessing parts came from the transformers library. * Fixed text squishing problem. Added additional unit test for it. Co-authored-by: ju-gu <julian.gutsch@deepset.ai>	2022-10-17 21:26:44 +02:00
Sara Zan	101d2bc86c	feat: `MultiModalRetriever` (#2891 ) * Adding Data2VecVision and Data2VecText to the supported models and adapt Tokenizers accordingly * content_types * Splitting classes into respective folders * small changes * Fix EOF * eof * black * API * EOF * whitespace * api * improve multimodal similarity processor * tokenizer -> feature extractor * Making feature vectors come out of the feature extractor in the similarity head * embed_queries is now self-sufficient * couple trivial errors * Implemented separate language model classes for multimodal inference * Document embedding seems to work * removing batch_encode_plus, is deprecated anyway * Realized the base Data2Vec models are not trained on retrieval tasks * Issue with the generated embeddings * Add batching * Try to fit CLIP in * Stub of CLIP integration * Retrieval goes through but returns noise only * Still working on the scores * Introduce temporary adapter for CLIP models * Image retrieval now works with sentence-transformers * Tidying up the code * Refactoring is now functional * Add MPNet to the supported sentence transformers models * Remove unused classes * pylint * docs * docs * Remove the method renaming * mpyp first pass * docs * tutorial * schema * mypy * Move devices setup into get_model * more mypy * mypy * pylint * Move a few params in HaystackModel's init * make feature extractor work with squadprocessor * fix feature_extractor_kwargs forwarding * Forgotten part of the fix * Revert unrelated ES change * Revert unrelated memdocstore changes * comment * Small corrections * mypy and pylint * mypy * typo * mypy * Refactor the call * mypy * Do not make FARMReader use the new FeatureExtractor * mypy * Detach DPR tests from FeatureExtractor too * Detach processor tests too * Add end2end marker * extract end2end feature extractor tests * temporary disable feature extraction tests * Introduce end2end tests for tokenizer tests * pylint * Fix model loading from folder in FeatureExtractor * 
working o n end2end * end2end keeps failing * Restructuring retriever tests * Restructuring retriever tests * remove covert_dataset_to_dataloader * remove comment * Better check sentence-transformers models * Use embed_meta_fields properly * rename passage into document * Embedding dims can't be found * Add check for models that support it * pylint * Split all retriever tests into suites, running mostly on InMemory only * fix mypy * fix tfidf test * fix weaviate tests * Parallelize on every docstore * Fix schema and specify modality in base retriever suite * tests * Add first image tests * remove comment * Revert to simpler tests * Update docs/_src/api/api/primitives.md Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/__init__.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * get_args * mypy * Update haystack/modeling/model/multimodal/__init__.py * Update haystack/modeling/model/multimodal/base.py * Update haystack/modeling/model/multimodal/base.py Co-authored-by: Agnieszka Marzec 
<97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/sentence_transformers.py * Update haystack/modeling/model/multimodal/sentence_transformers.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/transformers.py * Update haystack/modeling/model/multimodal/transformers.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/transformers.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/nodes/retriever/multimodal/retriever.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * mypy * mypy * removing more ContentTypes * more contentypes * pylint * add to __init__ * revert end2end workflow for now * missing integration markers * Update haystack/nodes/retriever/multimodal/embedder.py Co-authored-by: bogdankostic <bogdankostic@web.de> * review feedback, removing HaystackImageTransformerModel * review feedback part 2 * mypy & pylint * mypy * mypy * fix multimodal docs also for Pinecone * add note on internal constants * Fix pinecone write_documents * schemas * keep support for sentence-transformers only * fix pinecone test * schemas * fix pinecone again * temporarily disable some tests, need to understand if they're still relevant Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-10-17 18:58:35 +02:00
Massimiliano Pippi	31fa75e9fd	feat: add support for Elasticsearch 7.16.2 (#3318 ) * bump elastic to 7.16.2+ * decouple Elasticsearch and Opensearch use method override instead of func variables fix mypy default value fix broken tests update schema * relax version pin * rename the base class * rename module * fix import order * do not run the new tests in the old job * remove outdated TODO	2022-10-13 11:53:27 +02:00
tstadel	05a86b9d3d	feat: FAISS in OpenSearch: Support HNSW for cosine (#3217 ) * support cosine similiarity with faiss * update docs * update api docs * fix tests * Revert "update api docs" This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143. * fix api docs * collapse test * rename similairity to space_type mappings * only normalize for faiss * fix merge * fix docs normalization * get rid of List[np.array] * update docs * fix tests and tutorials * fix mypy * fix mypy * fix mypy again * again mypy * blacken * update tutorial 4 docs * fix embeddingretriever * fix faiss * move dense specific logic to DenseRetriever * fix mypy * cosine tests for all documents stores * fix pinecone * add docstring * docstring corrections * update docs * add integration test marker * docstrings update * update docs * fix typo * update docs * fix MockDenseRetriever * run integration tests for all documentstores * fix test_update_embeddings_cosine_similarity * fix faiss tests not running * blacken * make test_cosine_sanity_check integration test * split PR * update docs * manually revert tutorial doc change * Fix embedding type * set integration marker correctly * make BaseDocumentStore.normalize_embedding static * format * fix handling of opensearch_faiss param * fix merge * add DenseRetriever typing * organize imports in conftest.py * organize imports in conftest.py (2) * fix DenseRetriever import * add opensearch-tests-linux	2022-09-23 13:26:49 +02:00
tstadel	4fa9d2d8e7	Fix milvus and faiss tests not running (#3263 ) * fix milvus and faiss tests not running * fix schema manually * fix test_dpr_embedding test for milvus * pip freeze on milvus tests * fix milvus1 tests being executed: fix all_doc_stores order * Revert "pip freeze on milvus tests" This reverts commit 75ebb6f7e507bb8477e87d9e63b4a294f7946cab. * make infer_required_doc_store more robust * don't skip tests without docstore requirements * use markers for docstore tests	2022-09-22 17:46:49 +02:00
Massimiliano Pippi	2b803a265b	run checks on release branches (#3267 )	2022-09-22 16:25:34 +02:00
Massimiliano Pippi	4ddeb7b14b	chore: fix Windows CI (#3222 ) * replicate issue * pin openjdk version * not sure it's needed	2022-09-16 13:08:30 +02:00
Sara Zan	768583d00c	chore: disable Windows ES tests on CI (#3220 ) * disable Windows ES tests * Add comments	2022-09-15 15:18:29 +02:00
Vladimir Blagojevic	20880c9d41	Add 15 min timeout for downloading cached HF models (#3179 )	2022-09-07 08:35:09 -04:00
Massimiliano Pippi	6790eaf7d8	refactor: update package strategy in rest_api (#3148 ) * update packaging * fix author metadata * add newline * add empty readme * fix path to pipeline files * fix pylint job * fix metadata	2022-09-05 16:58:43 +02:00
Daniel Bichuetti	e1f399284f	refactor: update dependencies and remove pins (#3147 ) * refactor: remove azure-core, pydoc and hf-hub pins * fix: remove extra-comma * fix: force minimum version of azure forms recognizer * refactor: allow newer ocr libs * refactor: update more dependencies and container versions * refactor: remove extra comment * docs: pre-commit manual run * refactor: remove unnecessary dependency * tests: update weaviate container image version	2022-09-05 14:30:35 +02:00
Vladimir Blagojevic	be127e5b61	Trigger build failure Slack notify only on main repo (not forks) (#3039 ) Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2022-08-18 06:51:39 -04:00
Massimiliano Pippi	2328097ce0	rename the default branch name (#3045 )	2022-08-16 20:24:58 +02:00
bogdankostic	81a5949103	ci: Increase Weaviate's disk usage + print docker logs (#3026 )	2022-08-11 18:13:43 +02:00
Vladimir Blagojevic	50f7d660e2	Add slack hook for test failures (#2996 )	2022-08-09 08:27:52 -04:00
Massimiliano Pippi	40d07c2038	Enable Opensearch unit tests in Windows CI (#2936 ) * enable Opensearch unit tests under Win * move unit tests into a dedicated job * skip audio tests on missing dependencies * avoid failing test collection when soundfile is not available * Update .github/workflows/tests.yml Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-08-03 19:19:07 +02:00
Vladimir Blagojevic	86d56b4dfe	Add HF model caching for integration tests (#2909 ) * Add HF model caching for integration tests * Remove windows mode caching - not worth it	2022-07-29 18:17:05 +02:00
Massimiliano Pippi	e7627c3f8b	Use opensearch-py in OpenSearchDocumentStore (#2691 ) * add Opensearch extras * let OpenSearchDocumentStore use opensearch-py * Update Documentation & Code Style * fix a bug found after adding tests Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-07-28 10:04:49 +02:00
Zoltan Fedor	adb2b2c312	Add support for BM25 with the Weaviate document store (#2860 ) * Upgrading Weaviate used for testing to 1.14.1 from 1.11.0 This has also brought up an issue with one of the test filtering for value "a". This test has started to fail, as "a" is a default stopword in Weaviate, so I have changed this test to look for value "c" instead of value "a" to get around the stopword issue. * Weaviate client upgrade From v3.3.3 to v3.6.0 * Adding BM25 Retrieval to Weaviate Weaviate now supports BM25 retrieval in experiment mode and with some limitations (like it cannot be combined with filters). This commit adds support for inverted index (BM25) querying against Weaviate. * Running Black on the recent code changes * Update Documentation & Code Style * Fixing linting issues after code changes by black * The BM25 query needs to be in all lowercase for now The BM25 query needs to be provided all lowercase while the functionality is in experimental mode in Weaviate. See https://app.slack.com/client/T0181DYT9KN/C017EG2SL3H/thread/C017EG2SL3H-1658790227.208119 * Fixing method parameter docstring to highlight that they are not supported in Weaviate * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-07-27 10:07:13 +02:00
Sara Zan	2d65c380f1	pre-commit hooks (#2819 ) * Add pre-commit config * update contributing guidelines * try failing the workflow * add pre-commit to the deps * updating uninstall instructions * separate jobs in CI * make tutorials check fail * make black check fail * make openapi check fail * make yaml schema and api docs checks fail * highlight the instructions * Update .pre-commit-config.yaml Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de> * Update CONTRIBUTING.md Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de> * Update CONTRIBUTING.md Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de> * Use black --check * Add images of the CI * title level * feedback Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>	2022-07-26 15:02:15 +02:00
Sara Zan	5119acb260	Raise timeout on integration tests (#2880 )	2022-07-25 06:43:20 -04:00
Sara Zan	48644b23fb	Enable CI on tutorials (#2801 ) * enable ci on tutorials * Disable all path restrictions for safety * actually comment out the paths block * remove comment	2022-07-18 17:59:55 +02:00
Sara Zan	6b39fbd39c	Mocking Pinecone tests (#2778 ) * Integrating the mock into conftest.py * re-enable workflow * delete_all * Update Documentation & Code Style * remove ValueError * Add empty response * wrong condition * return response * revert removal of delete_all * change mock * Update Documentation & Code Style * test for rest api, to revert Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-07-14 20:03:33 +02:00
Massimiliano Pippi	82df677ebf	API tests (#2738 ) * clean up tests and run earlier * use change detection * better naming, skip ES * more cleanup * fix job name * dummy commit to trigger the CI * mock away the PDF converter * make the test compatible with 3.7 * removed leftover * always run the api tests, use a matrix for the OS * refactor all the tests * remove outdated dependency * pylint * new abstract method * adjust for older python versions * rename pipeline file * address PR comments	2022-07-14 15:36:28 +02:00
Malte Pietsch	ba08fc86f5	Add node to use OpenAI's GPT-3 for QA (#2605 ) * first draft of openai node for QA * Update Documentation & Code Style * fix mypy. add node to inits * Update Documentation & Code Style * fix linter * Adapt OpenAIGenerator to completions endpoint * Update Documentation & Code Style * Fix pylint * Fix doc strings * Make use of temperature * Make use of api key in tests * Adapt doc strings Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ZanSara <sarazanzo94@gmail.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-07-08 13:59:27 +02:00
tstadel	2a7c0139f5	double max heap size for elasticsearch in CI (#2756 )	2022-07-05 13:53:32 +02:00
Julian Risch	1781e88802	Upgrade torch to 1.12 (#2741 ) * Upgrade torch to 1.12 * upgrade torch-scatter * add explicit torch-scatter installation * set torch dependency to range >1.9,<1.13	2022-07-01 20:23:32 +02:00
Sara Zan	400d2cdf77	Fix audio tests on CI (#2718 ) * Update Documentation & Code Style * fix huggingface-hub version Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-24 11:36:31 +02:00
Sara Zan	505ababf43	Skip Pinecone tests (#2696 ) * comment out Pinecone tests block * Add comment	2022-06-21 14:49:36 +02:00
Sara Zan	584e046642	`AnswerToSpeech` (#2584 ) * Add new audio answer primitives * Add AnswerToSpeech * Add dependency group * Update Documentation & Code Style * Extract TextToSpeech in a helper class, create DocumentToSpeech and primitives * Add tests * Update Documentation & Code Style * Add ability to compress audio and more tests * Add audio group to test, all and all-gpu * fix pylint * Update Documentation & Code Style * Accidental git tag * Try pleasing mypy * Update Documentation & Code Style * fix pylint * Add warning for missing OS library and support in CI * Try fixing mypy * Update Documentation & Code Style * Add docs, simplify args for audio nodes and add tutorials * Fix mypy * Fix run_batch * Feedback on tutorials * fix mypy and pylint * Fix mypy again * Fix mypy yet again * Fix the ci * Fix dicts merge and install ffmpeg on CI * Make the audio nodes import safe * Trying to increase tolerance in audio test * Fix import paths * fix linter * Update Documentation & Code Style * Add audio libs in unit tests * Update _text_to_speech.py * Update answer_to_speech.py * Use dedicated dataset & update telemetry * Remove and use distilled roberta * Revert special primitives so that the nodes run in indexing * Improve tutorials and fix smaller bugs * Update Documentation & Code Style * Fix serialization issue * Update Documentation & Code Style * Improve tutorial * Update Documentation & Code Style * Update _text_to_speech.py * Minor lg updates * Minor lg updates to tutorial * Making indexing work in tutorials * Update Documentation & Code Style * Improve docstrings * Try to use GPU when available * Update Documentation & Code Style * Fixi mypy and pylint * Try to pass the device correctly * Update Documentation & Code Style * Use type of device * use .cpu() * Improve .ipynb * update apt index to be able to download libsndfile1 * Fix SpeechDocument.from_dict() * Change pip URL Co-authored-by: github-actions[bot] 
<41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-06-15 10:13:18 +02:00

55 Commits