haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-07-20 07:21:09 +00:00

Author	SHA1	Message	Date
Sebastian	e84fae2894	Migrating to use native Pytorch AMP (#2827) * Started making changes to use native Pytorch AMP * Updated compute_loss functions to use torch.cuda.amp.autocast * Updating docstrings * Add use_amp to trainer_checkpoint * Removed mentions of apex and started to add the necessary warnings * Removing unused instances of use_amp variable * Added fast training test for FARMReader. Needed to add max_query_length as a parameter in FARMReader.__init__ and FARMReader.train * Make max_query_length optional in FARMReader.train * Update lg Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-01-05 09:14:28 +01:00
Leo	35e9ff26cc	fix: adjust max token size for openai ADA-v2 embeddings (#3793) * Adjust max token size for openai ADA-v2 embeddings * Added requested changes and corrected old seq len Apparently the limit for the older models is 2046 and not 2048, I included this change directly. See (https://beta.openai.com/docs/guides/embeddings/what-are-embeddings) to check.	2023-01-04 16:25:32 +01:00
Julian Risch	a2c160e7d8	bug: skip empty documents in reader (#3773) * skip empty documents * test eval_batch and account for tables	2023-01-03 15:50:14 +01:00
Bhoumik Shah	43328d2744	fix: Fixing launch_milvus by cd'ing to milvus_dir (#3795) Co-authored-by: Bhoumik Shah <bhoumis@amazon.com>	2023-01-03 14:08:47 +01:00
Fabian	e53cc2bc3f	fix(docker): Use IMAGE_NAME in api image (#3786) If you set the IMAGE_NAME variable, then the base image will use that name, but the api image would previously use a hardcoded `deepset/haystack` image name.	2023-01-03 12:26:26 +01:00
Bilge Yücel	434beebfb1	feat: Change `docker-compose.yml` file (#3673) * feat: Change `docker-compose.yml` file * Add `volumes` to read from the local `/pipelines` folder * Change the `PIPELINE_YAML_PATH` value and refer to the local `pipelines.haystack-pipeline.yml` * Change the elasticsearch image * Fix volume * Update readme to direct users to the new demos repository	2023-01-03 11:49:12 +03:00
Julian Risch	b155297a06	feat: change PipelineConfigError to DocumentStoreError with more details (#3783)	2023-01-02 19:40:45 +01:00
Massimiliano Pippi	19c7725319	feat: utility function to explicitly invoke JSON schema generation (#3798) * explicitly cache the JSON schema * fix import path * move to final	2023-01-02 17:06:24 +01:00
Vladimir Blagojevic	bebd6b26ec	Improve robustness of PromptNode unit tests (#3747)	2023-01-02 16:28:56 +01:00
Massimiliano Pippi	c16bbee046	pin protobuf version (#3789)	2022-12-30 21:39:01 +05:30
Bilge Yücel	ddba75021a	fix: add additional settings to OpenAPI schema (#3788) * "proxy-enabled": disable CORS proxy * "samples-languages": display two languages initially	2022-12-30 16:10:37 +03:00
Vladimir Blagojevic	19e9b06b4e	feat: Bump python to 3.10 for gpu docker image, use nvidia/cuda (#3701) * Update pytorch base image * Small corrections * Revert back to load_schema() call * reverted to import haystack for schema generation Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>	2022-12-30 16:04:27 +05:30
Sebastian	ae98961b74	Changed opening of files to use with open to make sure files are explicitly closed outside of the with context. (#3787)	2022-12-29 17:59:10 +01:00
bogdankostic	36cfd41713	Add newline when generating OpenAPI specs (#3782)	2022-12-29 17:55:43 +01:00
Ivan Lopez	3e90b5f29c	fix: Trigger pipeline schema update on tagged releases (#3752) * ci: trigger schema update after docker image release * fix: use HAYSTACK_BOT_TOKEN secret in pipeline_schema workflow	2022-12-29 14:59:58 +01:00
bogdankostic	594d2a10f8	fix: Fix `predict_batch` in `TransformersReader` for single nested Document list (#3748) * Fix restoring of list structure * Add tests	2022-12-29 11:48:18 +01:00
Stefano Fiorucci	136928714c	refactor: remove deprecated parameters from `Summarizer` (#3740) * remove deprecated parameters * remove deprecation/removal test	2022-12-29 15:37:47 +05:30
Agnieszka Marzec	b8fff837b4	docs: Add info where the feedback is stored (#3772) * Add info where the feedback is stored * Fix misplaced line breaks * Generate OpenAPI Specs * Generate OpenAPI Specs * Apply black * Generate OpenAPI specs * Add missing whitespace Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-12-28 14:46:26 +01:00
Bilge Yücel	86ade4817e	bug: fix the docs rest api reference url (#3775) * bug: fix the docs rest api reference url * revert openapi json changes * remove last line on json files * Add explanation about `servers` and remove `servers` parameter from FastAPI * generate openapi schema without empty end line	2022-12-28 12:30:58 +03:00
Vladimir Blagojevic	890e2bf0f5	feat: Run commands inside docker container as a non root user (#3702)	2022-12-27 21:36:42 +01:00
Julian Risch	03619d2e00	change default sklearn models to new ones (#3777)	2022-12-28 01:37:39 +05:30
tstadel	6c067b2b4f	feat: make `score_script` first class citizen via `knn_engine` param (#3284) * OpenSearchDocumentStore: make score_script accessible via knn_engine * blacken * fix tests * fix format * fix naming of 'score_script' consistently * fix tests * fix test * fix ef_search tests * always validate index * improve clone_embedding_field * fix pylint * reformat * remove port * update tests * set no_implicit_optional = false * fix myp * fix test * refactorings * reformat * fix and refactor tests * better tests * create search_field mappings * remove no_implicit_optional = false * skip validation for custom mapping * format * Apply suggestions from docs code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply tougher suggestions from code review * fix messages * fix typos * update tests * Update haystack/document_stores/opensearch.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * fix tests * fix ef_search validation * add test for ef_search nmslib * fix assert_not_called Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-12-27 15:24:31 +01:00
Mayank Jobanputra	76a16807d5	fix: Fixed local reader model loading (#3663) * Fixed local loading issue	2022-12-24 03:46:36 +05:30
Massimiliano Pippi	450c3d4484	fix: build `pdftotext` from sources (#3746) * build pdftotext from sources * trigger the build on my own PR - to be reverted * trigger the build on my own PR - to be reverted * Update docker_release.yml	2022-12-22 18:37:36 +01:00
Agnieszka Marzec	367c63ef1d	Update readme (#3744)	2022-12-22 15:53:48 +01:00
Massimiliano Pippi	2904587d4f	proposal: Create a dedicated Github repository for Haystack demos (#3695) * first draft * add PR number and motivations * mention HSH * review feedback * Update 3695-demo-repository.md	2022-12-22 10:09:46 +01:00
Tobias Wochinger	33c480286a	ci: add license compliance check (#3221) * ci: add license compliance check * ci: run check always for testing purposes * revamp workflows * temporary remove path directive * triggering ci * check rest api and ui too * avoid cache to make sure env is clean * add shield on readme * ci: trigger CI to get latest scan Co-authored-by: ZanSara <sarazanzo94@gmail.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-12-22 10:08:26 +01:00
Tuana Celik	fe5e0164e8	chore: adding template for prompt node (#3738)	2022-12-21 20:13:57 +01:00
bogdankostic	e266cf6e29	fix: Make `InferenceProcessor` thread safe (#3709) * Make TextClassificationProcessor thread-safe by removing self.baskets * Add print statement for debugging * Remove print statement for debugging * Fix mypy	2022-12-21 18:08:41 +01:00
Sebastian	756e0114e6	refactor: Remove duplicate code in TableReader (#3708) * Refactor table reader to use util functions to reduce code duplication. * Expanding the tests for the table reader * Adding types * Updating tests to work for RCIReader * Fix bug in RCIReader. Saving the wrong queries list. * Update _flatten_inputs to not change input variable * Remove duplicate code	2022-12-21 14:33:19 +01:00
bogdankostic	12c264603e	fix: Fix number of concurrent requests in RequestLimiter (#3705)	2022-12-21 11:40:33 +01:00
Stefano Fiorucci	82ad408a74	refactor: remove unused code in `TfidfRetriever` (#3733)	2022-12-20 17:51:46 +01:00
Vladimir Blagojevic	9ebf164cfd	feat: Expand LLM support with PromptModel, PromptNode, and PromptTemplate (#3667) Co-authored-by: ZanSara <sarazanzo94@gmail.com>	2022-12-20 11:21:26 +01:00
Stefano Fiorucci	559f6e0569	better compatibility with different versions of sklearn (#3732)	2022-12-20 09:59:36 +01:00
Zoltan Fedor	e143f7cc36	Fixing broken BM25 support with Weaviate - fixes #3720 (#3723) * Fixing broken BM25 support with Weaviate - fixes #3720 Unfortunately the BM25 support with Weaviate got broken with Haystack v1.11.0+, which is getting fixed with this commit. Please see more under issue #3720. * Fixing mypy issue - method signature wasn't matching the base class * Mypy related test fix Mypy forced me to set the signature of the `query` method of the Weaviate document store to the same as its parent, the `KeywordDocumentStore`, where the `query` parame is `Optional`, but has NO default value, so it must be provided (as None) at runtime. I am not quite sure why the abstract method's `query` param was set without a default value while its type is `Optional`, but I didn't want to change that, so instead I have changed the Weaviate tests. * Adding a note regarding an upcomming fix in Weaviate v1.17.0 * Apply suggestions from code review * revert * [EMPTY] Re-trigger CI	2022-12-19 17:24:46 +01:00
Vladimir Blagojevic	56803e5465	feat: Enable text-embedding-ada-002 for EmbeddingRetriever (#3721) * Enable text-embedding-ada-002 for EmbeddingRetriever * Easier to understand code, more unit tests	2022-12-19 17:06:48 +01:00
Massimiliano Pippi	8edfd8978e	Update the proposals process (#3718) * update the proposals process * add stalebot to manage proposals lifecycle * typo * Update 0000-template.md * clarify PR labelling staying away from implementation details	2022-12-19 14:35:07 +01:00
Sebastian	d7fabb569b	feat: Use torch.inference_mode() for TableQA (#3731) * Update to make inference_mode work in TableQA * Update variable names * Added torch.inference_mode() for the RCIReader model forward passes	2022-12-19 13:07:07 +01:00
Stefano Fiorucci	5b9c661155	feat: add `index` parameter to `TfidfRetriever` (#3666) * first draft to add index param to tfidf * better mypy handling * Revert "better mypy handling" This reverts commit 91a22516320f9dcbeae53827ec69f9dc51e1785c. * new check in auto_fit * new check also in retrieve * better dict typings * new test and improvements to other test * remove unnecessary lambda * improve test * remove newline from openapi json * fix test * language fix Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * language fix 2 Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * language fix 3 Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * language fix 4 Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * language fix 5 Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * language fix 6 Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * explicit index value handling * fix test * better error messages Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-12-19 12:07:49 +01:00
Agnieszka Marzec	a1d8557c80	Update the readme action version (#3726) * Update the readme action version Updated the rdme action version to the latest one. * Update the version	2022-12-19 10:23:05 +01:00
Zoltan Fedor	3990697869	Fixing the `query_batch` method of the deepsetcloud document store - … (#3724) * Fixing the `query_batch` method of the deepsetcloud document store - fixes #3722 * Trigger Build * Trigger Build * Trigger CI Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>	2022-12-19 09:57:26 +01:00
Julian Risch	adde194b04	build: upgrade torch and let transformers pick the version (#3727) * test torch 1.13.1 release * let transformers handle torch version	2022-12-16 21:33:01 +05:30
Vladimir Blagojevic	42926596e4	Update cohere embedding models (#3704)	2022-12-16 16:49:59 +01:00
Sebastian	4afdbc33b2	fix: Removed overlooked torch scatter references (#3719) * Removed torch scatter references * Add back /	2022-12-16 10:36:19 +01:00
Vladimir Blagojevic	c69222faf4	Add PromptNode proposal (#3665)	2022-12-16 10:27:58 +01:00
Agnieszka Marzec	a23f425877	Fix lg (#3725)	2022-12-16 09:43:22 +01:00
Sebastian	54bf7ad343	Remove && \ from end of line (#3710)	2022-12-13 21:29:18 +05:30
Sebastian	d0f786af9f	feat: Bump transformers version to remove torch scatter dependency (#3703) * Bump transformers version so we can remove torch scatter dependency * manual re-merge Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>	2022-12-13 18:33:07 +05:30
Sara Zan	f24cbdbb5d	remove beir from the base GPU image (#3692)	2022-12-13 11:11:58 +01:00
Stefano Fiorucci	e1401f79b6	refactor: improve Multilabel design (#3658) * first try and new test * fix test * fix unused import * remove comments * no more dataclass * add __eq__ and extend test * better design from review * Update schema.py * fix black * fix openapi * fix openapi 2 * new try to fix openapi * remove newline from openapi json	2022-12-13 10:45:56 +01:00

3803 Commits