haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-07-21 16:04:09 +00:00

Author	SHA1	Message	Date
Branden Chan	7b15799853	Change slug and title (#3474 )	2022-10-25 16:41:27 +01:00
Stefano Fiorucci	a2d459dbed	fix: warning if doc store similarity function is incompatible with Sentence Transformers model (#3455 ) * check_docstore_similarity_function * remove import	2022-10-25 17:00:35 +02:00
Stefano Fiorucci	54ec13eaf7	refactor: Change `no_answer` attribute (#3411 ) * always run validation * update schemas * no_answer as a property. break things! * forgotten schema * fix * update openapi * removed my unnecessary test * fix sql document store Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-10-25 13:07:00 +02:00
Julian Risch	6a422d588f	fix: disabling telemetry prevents writing config (#3465 ) * fix: disabling telemetry prevents writing config * set user id to empty string if telemetry disabled * Update haystack/telemetry.py * set id to None instead of "" in error case * remove RuntimeError if user id is not set * Revert "remove RuntimeError if user id is not set" This reverts commit c59f06d47216afa7ada6199b03f1b09a2b936c02. Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-10-25 12:11:54 +02:00
Mayank Jobanputra	d48577b4e7	bug: removed duplicated meta "name" field addition to content before embedding in `update_embeddings` workflow (#3368 ) * Removed explicit passage formatting by name field * passing correct input type for embedding the docs * Updated test, updated similarity scores and added results * changed expected input to embed method	2022-10-25 14:52:05 +05:30
Vladimir Blagojevic	1b9586ae40	Add indexing pipeline type (#3461 )	2022-10-24 17:26:15 +02:00
Timo Moeller	9b931bbf66	Fix prompt length computation (#3448 )	2022-10-24 11:59:54 +02:00
Sara Zan	cbf44413d8	feat: add `__contains__` to `Span` (#3446 ) * add __contains__ * add tests	2022-10-21 13:58:17 +02:00
Unai Garay Maestre	e41cb24358	Feat: allow decreasing size of datasets loaded from BEIR (#3392 ) * Adds cropping of dataset in eval beir * Adapts queries to remaining cropped documents * Adds logging warning if num_documents has an invalid value * Adapts to linting suggestions	2022-10-21 13:54:20 +02:00
Branden Chan	03ba07dcb5	docs: Extend utils API docs coverage (#3402 ) * Add more utils modules * Format docstrings * Incorporate reviewer feedback	2022-10-21 12:51:11 +01:00
Massimiliano Pippi	df4d20d32c	fix the readme version to sync (#3417 )	2022-10-20 16:50:36 +02:00
Vladimir Blagojevic	79c6063ac2	feat: send event if number of queries exceeds threshold (#3419 ) Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2022-10-20 16:02:45 +02:00
Branden Chan	3f956c75f4	Add multimodal retrieval to API docs (#3430 )	2022-10-20 15:07:48 +02:00
Stefano Fiorucci	abdcb8124b	update pyworld pin (#3435 )	2022-10-20 12:28:38 +02:00
Stefano Fiorucci	8c1a34494d	refactor: update package strategy in ui (#3396 ) * update ui package: first try * update README * fixes * update schemas * restore schemas * use matrix folder in tests * fix tests * fix schemas * really fix schemas * don't use matrix folder * remove blank line * cleaner pytest command	2022-10-20 12:18:03 +02:00
Stefano Fiorucci	3860bb9966	fix: improve Document `__repr__` (#3385 ) * fix document __repr__ * take the best from 2 approaches * fix schema	2022-10-19 22:32:23 +02:00
Vladimir Blagojevic	8f31228211	feat: Add exponential backoff decorator; apply it to OpenAI requests (#3398 )	2022-10-19 17:47:38 +02:00
Massimiliano Pippi	5335e9e4d9	Add new schema for latest unstable (#3415 ) * add new schema for latest unstable * openapi	2022-10-19 13:21:05 +02:00
Julian Risch	16723bf180	bug: change type of split_by to Literal including None (#3389 ) * change type of split_by * fix mpy and update schema files * change split_by type to Literal * handle ImportError for Literal py<3.8	2022-10-19 10:11:41 +02:00
github-actions[bot]	f4a49f7178	Bump version (#3409 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2022-10-18 18:05:48 +02:00
Ursin Brunner	5fedfb03b0	fix: Fix the error of wrong page numbers when documents contain empty pages. (#3330 ) * Fix the error of wrong page numbers when documents contain empty pages. * Reformat using git hooks. * Use a more descriptive placeholder	2022-10-18 17:51:02 +02:00
Sebastian	51d4fe01c3	fix: Update env variable for model caching timeout (#3405 ) * fix: Update env variable for model caching timeout The environment variable used to set the timeout for the model caching step had a typo in it from the maintainers of `actions/cache@v3`, which is why it has not been working (see comment [here](https://github.com/actions/cache/issues/810#issuecomment-1281895575)). * Removed newline	2022-10-18 17:36:25 +02:00
Branden Chan	cf4642a5f8	[CI] Create Github Workflow that creates a new version branch in Haystack and Readme (#3335 ) * Test readme_integration.yml * Test readme_integration.yml * Test variables * Test variables * Test variables * Test variables * Test commit * Test commit * Test commit * Trigger action * Add v * Trigger action * Trigger action * Trigger action * Trigger action * Update API docs headers * Revert "Update API docs headers" This reverts commit 34e665063f4de29854befe575a795dbfef04415c. * Trigger action * Trigger action * Trigger action * Update release * Update release * Update release * Delete File * Split steps into own files * Edit action names * Start making changes * Start implementing version bump * Implement minor version release * Fix github action * Test action * Test action * Test action * Test action * Test action * Change back to main * Add comments * Remove line * Format docstring * Incorporate reviewer feedback * Fix variable name * Print version.txt * Incorporate Reviewer feedback * Rename variables for clarity * Add fetch * Change branch * Change branch * Change branch * Change branch * Change branch * Revert docstring changes * Incorporate reviewer feedback * Run black Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2022-10-18 17:09:43 +02:00
Sebastian	93817f63b4	feat: Speed up integration tests (nodes) (#3408 ) * Changed summarizer model to a smaller one (2GB to 500MB) to save on space and speed up the tests. * Removed google pegasus from cache	2022-10-18 16:23:57 +02:00
Branden Chan	3bf5d4350f	docs: Add comment about the generation of no-answer samples in FARMReader training (#3404 ) * Add comment about no-answer generation * Add comment about no-answer generation * Fix typo Co-authored-by: Sebastian <sjrl@users.noreply.github.com> * Incorporate reviewer feedback * Incorporate reviewer feedback Co-authored-by: Sebastian <sjrl@users.noreply.github.com>	2022-10-18 14:37:37 +02:00
Sebastian	15a59fd040	feat: Updated EntityExtractor to handle long texts and added better postprocessing (#3154 ) * Remove dependence on HuggingFace TokenClassificationPipeline and group all postprocessing functions under one class * Added copyright notice for HF and deepset to entity file to acknowledge that a lot of the postprocessing parts came from the transformers library. * Fixed text squishing problem. Added additional unit test for it. Co-authored-by: ju-gu <julian.gutsch@deepset.ai>	2022-10-17 21:26:44 +02:00
Unai Garay Maestre	3a2c8ae3c5	bug: Adds better way of checking `query` in BaseRetriever and Pipeline.run() (#3304 ) * changes how query and queries are checked if they have been passed in BaseRetriever * Fixes checking query properly in Pipeline run * Fixes checking query properly in Pipeline run * Adds test for FilterRetriever using run method when query is empty * Adds mock filter retriever and adapts test * Removes old test, adds MockRetriever to test file and test uses document_store * Logs error when query is not of type string with a new test for run batch * Update test/nodes/test_retriever.py * schemas	2022-10-17 19:00:13 +02:00
Sara Zan	101d2bc86c	feat: `MultiModalRetriever` (#2891 ) * Adding Data2VecVision and Data2VecText to the supported models and adapt Tokenizers accordingly * content_types * Splitting classes into respective folders * small changes * Fix EOF * eof * black * API * EOF * whitespace * api * improve multimodal similarity processor * tokenizer -> feature extractor * Making feature vectors come out of the feature extractor in the similarity head * embed_queries is now self-sufficient * couple trivial errors * Implemented separate language model classes for multimodal inference * Document embedding seems to work * removing batch_encode_plus, is deprecated anyway * Realized the base Data2Vec models are not trained on retrieval tasks * Issue with the generated embeddings * Add batching * Try to fit CLIP in * Stub of CLIP integration * Retrieval goes through but returns noise only * Still working on the scores * Introduce temporary adapter for CLIP models * Image retrieval now works with sentence-transformers * Tidying up the code * Refactoring is now functional * Add MPNet to the supported sentence transformers models * Remove unused classes * pylint * docs * docs * Remove the method renaming * mpyp first pass * docs * tutorial * schema * mypy * Move devices setup into get_model * more mypy * mypy * pylint * Move a few params in HaystackModel's init * make feature extractor work with squadprocessor * fix feature_extractor_kwargs forwarding * Forgotten part of the fix * Revert unrelated ES change * Revert unrelated memdocstore changes * comment * Small corrections * mypy and pylint * mypy * typo * mypy * Refactor the call * mypy * Do not make FARMReader use the new FeatureExtractor * mypy * Detach DPR tests from FeatureExtractor too * Detach processor tests too * Add end2end marker * extract end2end feature extractor tests * temporary disable feature extraction tests * Introduce end2end tests for tokenizer tests * pylint * Fix model loading from folder in FeatureExtractor * working on end2end * end2end keeps failing * Restructuring retriever tests * Restructuring retriever tests * remove covert_dataset_to_dataloader * remove comment * Better check sentence-transformers models * Use embed_meta_fields properly * rename passage into document * Embedding dims can't be found * Add check for models that support it * pylint * Split all retriever tests into suites, running mostly on InMemory only * fix mypy * fix tfidf test * fix weaviate tests * Parallelize on every docstore * Fix schema and specify modality in base retriever suite * tests * Add first image tests * remove comment * Revert to simpler tests * Update docs/_src/api/api/primitives.md Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/__init__.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * get_args * mypy * Update haystack/modeling/model/multimodal/__init__.py * Update haystack/modeling/model/multimodal/base.py * Update haystack/modeling/model/multimodal/base.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/sentence_transformers.py * Update haystack/modeling/model/multimodal/sentence_transformers.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/transformers.py * Update haystack/modeling/model/multimodal/transformers.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/transformers.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/nodes/retriever/multimodal/retriever.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * mypy * mypy * removing more ContentTypes * more contentypes * pylint * add to __init__ * revert end2end workflow for now * missing integration markers * Update haystack/nodes/retriever/multimodal/embedder.py Co-authored-by: bogdankostic <bogdankostic@web.de> * review feedback, removing HaystackImageTransformerModel * review feedback part 2 * mypy & pylint * mypy * mypy * fix multimodal docs also for Pinecone * add note on internal constants * Fix pinecone write_documents * schemas * keep support for sentence-transformers only * fix pinecone test * schemas * fix pinecone again * temporarily disable some tests, need to understand if they're still relevant Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-10-17 18:58:35 +02:00
Sebastian	ca2a1e1792	fix: Update how schema is ordered (#3399 ) * Use builtin sort_keys option for json.dump * Order anyof_list which is causing issues	2022-10-17 17:09:32 +02:00
Sara Zan	50f34372e1	fix: stable YAML schema generation (#3388 ) * add key sorting in schema generation * add pre-commit hook * try pre-commit hook * Fixed schemas * trying a simpler version * pylint * ordered dict * reverting to dict * unused import * remove hook	2022-10-14 18:36:47 +02:00
Massimiliano Pippi	7d0f89b6f5	fix: demo won't start through Docker compose (#3337 ) * use new Docker images and add a health check for ES * try * silence streamlit errors * remove CMD override * final touches * leftover * make pylint happy	2022-10-14 18:16:20 +02:00
Vladimir Blagojevic	159cd5a666	feat: Add OpenAIEmbeddingEncoder to EmbeddingRetriever (#3356 )	2022-10-14 15:01:03 +02:00
Vladimir Blagojevic	5ebe3cb33d	fix: QuestionGenerator generates wrong document questions for non-default `num_queries_per_doc` parameter (#3381 )	2022-10-14 12:08:30 +02:00
Stefano Fiorucci	7290196c32	fix: allow same `vector_id` in different indexes for SQL-based Document stores (#3383 ) * fix_multiple_indexes * improve test names	2022-10-14 09:55:56 +02:00
tstadel	ba30971d8d	feat: extract label aggregation (#3363 ) * extract label aggregation * refactoring * reformat * add missing param docstrings * fix comment	2022-10-13 19:09:14 +02:00
Massimiliano Pippi	3b0f00a615	[CI] Use VERSION.txt to sync with Readme (#3367 ) * use VERSION.txt to sync with Readme * add docs * force workflow run * unrelated change * Revert "force workflow run" This reverts commit f0aea59afa57c96f374073465629f893031f727a. * make the steps mutually exclusive	2022-10-13 18:39:23 +02:00
Branden Chan	37bd61a48e	Create minor_version_release.yml (#3338 ) * Create minor_version_release.yml * Incorporate reviewer feedback	2022-10-13 14:32:31 +02:00
Stefano Fiorucci	60f678e120	refactor: remove dead code from FAISSDocumentStore (#3372 )	2022-10-13 13:23:01 +02:00
Massimiliano Pippi	31fa75e9fd	feat: add support for Elasticsearch 7.16.2 (#3318 ) * bump elastic to 7.16.2+ * decouple Elasticsearch and Opensearch use method override instead of func variables fix mypy default value fix broken tests update schema * relax version pin * rename the base class * rename module * fix import order * do not run the new tests in the old job * remove outdated TODO	2022-10-13 11:53:27 +02:00
Sebastian	75641dd024	fix: Added checks for DataParallel and WrappedDataParallel (#3366 ) * Added checks for DataParallel and WrappedDataParallel * Update isinstance checks according to pylint recommendation * Using isinstance over types * Added test for dpr training	2022-10-13 08:05:56 +02:00
Massimiliano Pippi	db6e5754cd	add deprecation notice to old dockerfiles (#3317 )	2022-10-11 16:10:57 +02:00
hsm207	c2537dfc28	Update weaviate schema doc link (#3351 ) * Update weaviate schema doc * Update haystack/document_stores/weaviate.py Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2022-10-11 15:30:20 +02:00
Massimiliano Pippi	8ddb6d7821	feat: add multi-platform Docker images (#3354 ) * add arm platform to the build * add a note about multi-platforms build * test on current branch * setup qemu on Github actions * better naming * Revert "test on current branch" This reverts commit b0e5ea77b46e3e0bafd579c95e434c6a3c8ef84f.	2022-10-11 12:29:33 +02:00
Malte Pietsch	fb02b61e90	Update README.md (#3247 )	2022-10-11 10:43:17 +02:00
tstadel	7fe5003c97	fix: eval() with `add_isolated_node_eval=True` breaks if no node supports it (#3347 ) * fix isolated eval for pipelines without a node supporting isolated mode * reformat * add test	2022-10-10 20:48:13 +02:00
bogdankostic	84aff5e2b3	fix: Allow less restrictive values for parameters in Pipeline configurations (#3345 ) * fix: Allow arbitrary values for parameters in Pipeline configurations * Add test * Adapt expected error message in tests * Fix bug * Fix bug on checking JSON * Remove test cases that previously tested if error was thrown * Change encoding in test * Restrict possible values * Re-add tests * Re-add tests * Add value flag to list elements	2022-10-10 13:08:45 +02:00
JacdDev	797c20c966	feat: Adding filters param to MostSimilarDocumentsPipeline run and run_batch (#3301 ) * Adding filters param to MostSimilarDocumentsPipeline run and run_batch * Adding index param to MostSimilarDocumentsPipeline run and run_batch * Adding index param documentation to MostSimilarDocumentsPipeline run and run_batch * Updated index param documentation to MostSimilarDocumentsPipeline run and run_batch. Updated type: ignore in run_batch * Adding filters param to MostSimilarDocumentsPipeline run and run_batch * Adding index param to MostSimilarDocumentsPipeline run and run_batch * Adding index param documentation to MostSimilarDocumentsPipeline run and run_batch * Updated index param documentation to MostSimilarDocumentsPipeline run and run_batch. Updated type: ignore in run_batch	2022-10-10 10:22:14 +02:00
tstadel	b84a6b1716	fix: opensearch script score with filters (#3321 ) * fix opensearch script score filters * add comment * add integration test * update schema	2022-10-06 15:41:29 +02:00
Vladimir Blagojevic	6cb4e93965	refactor: remove Inferencer multiprocessing (#3283 )	2022-10-04 14:08:23 +02:00
Massimiliano Pippi	b49bce97aa	remove test step (#3278 )	2022-10-04 11:34:43 +02:00

3803 Commits