* Use the `truncate` option of the Cohere request instead of the GPT2 tokenizer to truncate texts
* Update the max batch size for Cohere, which is 96
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
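A minimal sketch of the request-side truncation and batching, assuming the v4-era `cohere` Python SDK; the model name and API-key placeholder are illustrative:

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

COHERE_MAX_BATCH_SIZE = 96  # the embed endpoint accepts at most 96 texts per call

def embed_texts(texts: list[str]) -> list[list[float]]:
    embeddings: list[list[float]] = []
    for i in range(0, len(texts), COHERE_MAX_BATCH_SIZE):
        batch = texts[i : i + COHERE_MAX_BATCH_SIZE]
        # truncate="END" lets the API cut off over-long inputs server-side,
        # so no local GPT2 tokenizer is needed
        response = co.embed(texts=batch, model="embed-english-v2.0", truncate="END")
        embeddings.extend(response.embeddings)
    return embeddings
```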
* Adding condition to `pinecone` object.
While you can currently assign any value to `PineconeDocumentStore`'s `pinecone_index` parameter, an additional condition is needed to prevent invalid objects from being passed.
* Added a test and changed the code to make sure the `pinecone_index` variable is an instance of the correct class
* fixed Black formatting error
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
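A sketch of the kind of guard this adds, assuming `pinecone-client` v2, where `pinecone.Index` is the index class; the error message is illustrative:

```python
import pinecone

def validate_pinecone_index(pinecone_index):
    # Reject anything that is not a pinecone.Index instance (e.g. a plain
    # string passed by mistake) before the document store starts using it.
    if not isinstance(pinecone_index, pinecone.Index):
        raise ValueError(
            f"`pinecone_index` must be a `pinecone.Index` object, "
            f"got {type(pinecone_index).__name__} instead."
        )
    return pinecone_index
```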
* add ignore statements to each failing line in haystack/
* simplify workflow
* few typos
* mypy cache directory missing
* mypy cache directory missing
* install types from Haystack only
* install types from rest_api too
* mypy vs literal
* install types at check time
* add mypy cache to python cache
* fix version condition
* fix version condition
* try running mypy only on affected files
* try using explicit hashes
* try another approach
* filter python files
* typo
* quotes
* use action
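A rough Python sketch of the "filter Python files, run mypy only on affected files" idea behind the workflow commits above; the base branch `origin/main` is an assumption:

```python
import subprocess

# Files changed relative to the base branch (origin/main is assumed here)
diff = subprocess.run(
    ["git", "diff", "--name-only", "origin/main...HEAD"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

# Filter down to Python files so mypy is not invoked on docs, YAML, etc.
py_files = [path for path in diff if path.endswith(".py")]

if py_files:
    subprocess.run(["mypy", *py_files], check=True)
```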
* feat: add HA support for Weaviate
Adding the `replicationConfig => factor` parameter to the Weaviate class at class-creation time, allowing the user to have Haystack create a Weaviate "Class" with a replication factor above 1.
This enables using Weaviate in an HA (High Availability) fashion, where the created class is stored on multiple Weaviate nodes, increasing Weaviate's throughput and ensuring high availability.
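A minimal sketch of such a class definition, assuming the `weaviate-client` v3 API; the class name, property, and factor value are illustrative:

```python
import weaviate

client = weaviate.Client("http://localhost:8080")

class_obj = {
    "class": "Document",
    "properties": [{"name": "content", "dataType": ["text"]}],
    # A factor above 1 stores the class on multiple Weaviate nodes
    "replicationConfig": {"factor": 3},
}
client.schema.create_class(class_obj)
```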
* Trying out a recommendation from @masci to fix the CI issue
* enable logging-fstring-interpolation
* remove logging-fstring-interpolation from exclusion list
* remove implicit string interpolations added by black
* remove from rest_api too
* fix % sign
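For context, the pattern that pylint's `logging-fstring-interpolation` check flags, and the %-style replacement used instead:

```python
import logging

logger = logging.getLogger(__name__)
n_docs = 42

# Flagged: the f-string is formatted eagerly, even if INFO records are filtered out
logger.info(f"Indexed {n_docs} documents")

# Preferred: lazy %-style formatting, evaluated only when the record is emitted
logger.info("Indexed %s documents", n_docs)
```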
* fix crawler and try to run CI
* more compact expression
* try to fix
* improve naming regex
* revert regex
* make test_url compatible with Windows
* better conditional expression
* Adding `model.eval()` calls to prediction functions in table reader
* Add unit test to check that inference-time prediction still works when the model is set in train mode.
* Add `table = table.astype(str)` to make sure cells are converted to strings to be compatible with the TableReader
* Turn more ints into strings
* Make sure answer text is always a string.
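A small sketch of the cell conversion, assuming Haystack v1's `Document` wrapper for tables; the sample data is illustrative:

```python
import pandas as pd
from haystack import Document

table = pd.DataFrame({"city": ["Paris", "Berlin"], "population": [2161000, 3645000]})
# The TableReader expects every cell to be a string, so cast the whole frame first
table = table.astype(str)
document = Document(content=table, content_type="table")
```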
* Started making changes to use native PyTorch AMP
* Updated `compute_loss` functions to use `torch.cuda.amp.autocast`
* Updating docstrings
* Add `use_amp` to `trainer_checkpoint`
* Removed mentions of apex and started to add the necessary warnings
* Removing unused instances of the `use_amp` variable
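A minimal, self-contained sketch of the native AMP pattern these commits move to, i.e. `torch.cuda.amp.autocast` for the forward pass plus a `GradScaler`; the toy model and data are illustrative, and a CUDA device is assumed:

```python
import torch
from torch import nn

model = nn.Linear(8, 2).cuda()  # assumes a CUDA device is available
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
use_amp = True  # mirrors the trainer's `use_amp` flag
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

for _ in range(10):
    inputs = torch.randn(4, 8, device="cuda")
    labels = torch.randint(0, 2, (4,), device="cuda")
    optimizer.zero_grad()
    # Forward pass and loss computation run in mixed precision under autocast
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = loss_fn(model(inputs), labels)
    # GradScaler guards against fp16 gradient underflow during backward
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```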
* Added fast training test for FARMReader. Needed to add `max_query_length` as a parameter in `FARMReader.__init__` and `FARMReader.train`
* Make `max_query_length` optional in `FARMReader.train`
* Update lg
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
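A sketch of how the new parameter might be used; the model name, paths, and filenames are placeholders:

```python
from haystack.nodes import FARMReader

# max_query_length can now be set on the reader and overridden at training time
reader = FARMReader(model_name_or_path="deepset/tinyroberta-squad2", max_query_length=32)
reader.train(
    data_dir="data/squad",        # placeholder path
    train_filename="train.json",  # placeholder filename
    n_epochs=1,
    max_query_length=32,          # optional override in train()
)
```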
* Adjust max token size for OpenAI Ada-v2 embeddings
* Added requested changes and corrected old seq len
Apparently the limit for the older models is 2046 and not 2048; I included this change directly.
See https://beta.openai.com/docs/guides/embeddings/what-are-embeddings to check.
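A sketch of the corresponding truncation logic, assuming the `tiktoken` tokenizer; the helper name is hypothetical, and 8191 is OpenAI's documented input-token limit for Ada-v2:

```python
import tiktoken

def truncate_for_openai(text: str, model: str = "text-embedding-ada-002") -> list[int]:
    # Ada-v2 accepts up to 8191 input tokens; the older models max out at 2046
    max_seq_len = 8191 if model.endswith("-002") else 2046
    encoding = tiktoken.encoding_for_model(model)
    return encoding.encode(text)[:max_seq_len]
```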
Previously, if you set the `IMAGE_NAME` variable, the base image would use that name, but the API image still used a hardcoded `deepset/haystack` image name.
* feat: Change `docker-compose.yml` file
* Add `volumes` to read from the local `/pipelines` folder
* Change the `PIPELINE_YAML_PATH` value and refer to the local `pipelines.haystack-pipeline.yml`
* Change the Elasticsearch image
* Fix volume
* Update readme to direct users to the new demos repository