3803 Commits

Author SHA1 Message Date
Sebastian
e84fae2894
Migrating to use native PyTorch AMP (#2827)
* Started making changes to use native PyTorch AMP

* Updated compute_loss functions to use torch.cuda.amp.autocast

* Updating docstrings

* Add use_amp to trainer_checkpoint

* Removed mentions of apex and started to add the necessary warnings

* Removing unused instances of use_amp variable

* Added fast training test for FARMReader. Needed to add max_query_length as a parameter in FARMReader.__init__ and FARMReader.train

* Make max_query_length optional in FARMReader.train

* Update lg

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-01-05 09:14:28 +01:00
Leo
35e9ff26cc
fix: adjust max token size for openai ADA-v2 embeddings (#3793)
* Adjust max token size for openai ADA-v2 embeddings

* Added requested changes and corrected old seq len

Apparently the limit for the older models is 2046, not 2048; I included this change directly.
See https://beta.openai.com/docs/guides/embeddings/what-are-embeddings to check.
2023-01-04 16:25:32 +01:00
Julian Risch
a2c160e7d8
bug: skip empty documents in reader (#3773)
* skip empty documents

* test eval_batch and account for tables
2023-01-03 15:50:14 +01:00
Bhoumik Shah
43328d2744
fix: Fixing launch_milvus by cd'ing to milvus_dir (#3795)
Co-authored-by: Bhoumik Shah <bhoumis@amazon.com>
2023-01-03 14:08:47 +01:00
Fabian
e53cc2bc3f
fix(docker): Use IMAGE_NAME in api image (#3786)
If you set the IMAGE_NAME variable, then the base image will use that name,
but the api image would previously use a hardcoded `deepset/haystack` image name.
2023-01-03 12:26:26 +01:00
Bilge Yücel
434beebfb1
feat: Change docker-compose.yml file (#3673)
* feat: Change `docker-compose.yml` file

* Add `volumes` to read from the local `/pipelines` folder
* Change the `PIPELINE_YAML_PATH` value and refer to the local `pipelines.haystack-pipeline.yml`
* Change the elasticsearch image

* Fix volume

* Update readme to direct users to the new demos repository
2023-01-03 11:49:12 +03:00
Julian Risch
b155297a06
feat: change PipelineConfigError to DocumentStoreError with more details (#3783)
2023-01-02 19:40:45 +01:00
Massimiliano Pippi
19c7725319
feat: utility function to explicitly invoke JSON schema generation (#3798)
* explicitly cache the JSON schema

* fix import path

* move to final
2023-01-02 17:06:24 +01:00
Vladimir Blagojevic
bebd6b26ec
Improve robustness of PromptNode unit tests (#3747)
2023-01-02 16:28:56 +01:00
Massimiliano Pippi
c16bbee046
pin protobuf version (#3789)
2022-12-30 21:39:01 +05:30
Bilge Yücel
ddba75021a
fix: add additional settings to OpenAPI schema (#3788)
* "proxy-enabled": disable CORS proxy
* "samples-languages": display two languages initially
2022-12-30 16:10:37 +03:00
Vladimir Blagojevic
19e9b06b4e
feat: Bump python to 3.10 for gpu docker image, use nvidia/cuda (#3701)
* Update pytorch base image

* Small corrections

* Revert back to load_schema() call

* reverted to import haystack for schema generation

Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2022-12-30 16:04:27 +05:30
Sebastian
ae98961b74
Changed opening of files to use `with open` so that files are explicitly closed when the with context exits. (#3787)
2022-12-29 17:59:10 +01:00
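The behaviour this commit relies on can be illustrated with a minimal sketch. The helper `read_text` and the file name `sample.txt` are made up for illustration and are not taken from the Haystack code base.

```python
import os
import tempfile
from typing import Tuple


def read_text(path: str) -> Tuple[str, bool]:
    """Read a file and report whether its handle is closed afterwards."""
    # The with statement closes the file as soon as the block exits,
    # even if an exception is raised inside it.
    with open(path, encoding="utf-8") as f:
        content = f.read()
    return content, f.closed


# Usage: write a throwaway file, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("hello")
    content, closed = read_text(sample_path)
```

Here `closed` reports the state of the handle after the with block has ended, which is the guarantee a bare `open()` call without a matching `close()` does not give.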
bogdankostic
36cfd41713
Add newline when generating OpenAPI specs (#3782)
2022-12-29 17:55:43 +01:00
Ivan Lopez
3e90b5f29c
fix: Trigger pipeline schema update on tagged releases (#3752)
* ci: trigger schema update after docker image release

* fix: use HAYSTACK_BOT_TOKEN secret in pipeline_schema workflow
2022-12-29 14:59:58 +01:00
bogdankostic
594d2a10f8
fix: Fix predict_batch in TransformersReader for single nested Document list (#3748)
* Fix restoring of list structure

* Add tests
2022-12-29 11:48:18 +01:00
Stefano Fiorucci
136928714c
refactor: remove deprecated parameters from Summarizer (#3740)
* remove deprecated parameters

* remove deprecation/removal test
2022-12-29 15:37:47 +05:30
Agnieszka Marzec
b8fff837b4
docs: Add info where the feedback is stored (#3772)
* Add info where the feedback is stored

* Fix misplaced line breaks

* Generate OpenAPI Specs

* Generate OpenAPI Specs

* Apply black

* Generate OpenAPI specs

* Add missing whitespace

Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-12-28 14:46:26 +01:00
Bilge Yücel
86ade4817e
bug: fix the docs rest api reference url (#3775)
* bug: fix the docs rest api reference url

* revert openapi json changes

* remove last line on json files

* Add explanation about `servers` and remove `servers` parameter from FastAPI

* generate openapi schema without empty end line
2022-12-28 12:30:58 +03:00
Vladimir Blagojevic
890e2bf0f5
feat: Run commands inside docker container as a non root user (#3702)
2022-12-27 21:36:42 +01:00
Julian Risch
03619d2e00
change default sklearn models to new ones (#3777)
2022-12-28 01:37:39 +05:30
tstadel
6c067b2b4f
feat: make score_script first class citizen via knn_engine param (#3284)
* OpenSearchDocumentStore: make score_script accessible via knn_engine

* blacken

* fix tests

* fix format

* fix naming of 'score_script' consistently

* fix tests

* fix test

* fix ef_search tests

* always validate index

* improve clone_embedding_field

* fix pylint

* reformat

* remove port

* update tests

* set no_implicit_optional = false

* fix myp

* fix test

* refactorings

* reformat

* fix and refactor tests

* better tests

* create search_field mappings

* remove no_implicit_optional = false

* skip validation for custom mapping

* format

* Apply suggestions from docs code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply tougher suggestions from code review

* fix messages

* fix typos

* update tests

* Update haystack/document_stores/opensearch.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* fix tests

* fix ef_search validation

* add test for ef_search nmslib

* fix assert_not_called

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-12-27 15:24:31 +01:00
Mayank Jobanputra
76a16807d5
fix: Fixed local reader model loading (#3663)
* Fixed local loading issue
2022-12-24 03:46:36 +05:30
Massimiliano Pippi
450c3d4484
fix: build pdftotext from sources (#3746)
* build pdftotext from sources

* trigger the build on my own PR - to be reverted

* trigger the build on my own PR - to be reverted

* Update docker_release.yml
2022-12-22 18:37:36 +01:00
Agnieszka Marzec
367c63ef1d
Update readme (#3744)
2022-12-22 15:53:48 +01:00
Massimiliano Pippi
2904587d4f
proposal: Create a dedicated Github repository for Haystack demos (#3695)
* first draft

* add PR number and motivations

* mention HSH

* review feedback

* Update 3695-demo-repository.md
2022-12-22 10:09:46 +01:00
Tobias Wochinger
33c480286a
ci: add license compliance check (#3221)
* ci: add license compliance check

* ci: run check always for testing purposes

* revamp workflows

* temporarily remove path directive

* triggering ci

* check rest api and ui too

* avoid cache to make sure env is clean

* add shield on readme

* ci: trigger CI to get latest scan

Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-12-22 10:08:26 +01:00
Tuana Celik
fe5e0164e8
chore: adding template for prompt node (#3738)
2022-12-21 20:13:57 +01:00
bogdankostic
e266cf6e29
fix: Make InferenceProcessor thread safe (#3709)
* Make TextClassificationProcessor thread-safe by removing self.baskets

* Add print statement for debugging

* Remove print statement for debugging

* Fix mypy
2022-12-21 18:08:41 +01:00
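The idea behind the thread-safety fix above (e266cf6e29), dropping per-call state stored on the instance, can be sketched as follows. This is a minimal illustration, not the actual processor: `StatelessProcessorSketch`, `to_baskets`, and the basket layout are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


class StatelessProcessorSketch:
    """Hypothetical processor; names are illustrative, not Haystack's API."""

    def to_baskets(self, texts: List[str]) -> List[dict]:
        # Baskets live in a local variable instead of on self, so
        # concurrent calls cannot overwrite each other's intermediate results.
        baskets = [{"id": i, "text": t.lower()} for i, t in enumerate(texts)]
        return baskets


processor = StatelessProcessorSketch()
batches = [[f"Doc {n}-{i}" for i in range(3)] for n in range(8)]

# Share one processor instance across several worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(processor.to_baskets, batches))
```

Because nothing is written to the shared instance, each entry in `results` depends only on the batch passed to that call.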
Sebastian
756e0114e6
refactor: Remove duplicate code in TableReader (#3708)
* Refactor table reader to use util functions to reduce code duplication.

* Expanding the tests for the table reader

* Adding types

* Updating tests to work for RCIReader

* Fix bug in RCIReader. Saving the wrong queries list.

* Update _flatten_inputs to not change input variable

* Remove duplicate code
2022-12-21 14:33:19 +01:00
bogdankostic
12c264603e
fix: Fix number of concurrent requests in RequestLimiter (#3705)
2022-12-21 11:40:33 +01:00
Stefano Fiorucci
82ad408a74
refactor: remove unused code in TfidfRetriever (#3733)
2022-12-20 17:51:46 +01:00
Vladimir Blagojevic
9ebf164cfd
feat: Expand LLM support with PromptModel, PromptNode, and PromptTemplate (#3667)
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-12-20 11:21:26 +01:00
Stefano Fiorucci
559f6e0569
better compatibility with different versions of sklearn (#3732)
2022-12-20 09:59:36 +01:00
Zoltan Fedor
e143f7cc36
Fixing broken BM25 support with Weaviate - fixes #3720 (#3723)
* Fixing broken BM25 support with Weaviate - fixes #3720

Unfortunately, BM25 support with Weaviate broke in Haystack v1.11.0+; this commit fixes it.

Please see more under issue #3720.

* Fixing mypy issue - method signature wasn't matching the base class

* Mypy related test fix

Mypy forced me to set the signature of the `query` method of the Weaviate document store to match its parent, `KeywordDocumentStore`, where the `query` parameter is `Optional` but has NO default value, so it must be provided (as None) at runtime.
I am not quite sure why the abstract method's `query` param was set without a default value while its type is `Optional`, but I didn't want to change that, so I changed the Weaviate tests instead.

* Adding a note regarding an upcoming fix in Weaviate v1.17.0

* Apply suggestions from code review

* revert

* [EMPTY] Re-trigger CI
2022-12-19 17:24:46 +01:00
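The mypy note in the Weaviate fix above (e143f7cc36) turns on a general Python point: `Optional[...]` only widens the accepted type and does not make an argument optional to pass. A minimal sketch, using hypothetical stand-in classes rather than the real `KeywordDocumentStore` or Weaviate document store:

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class KeywordStoreSketch(ABC):
    """Hypothetical stand-in for a base class; not Haystack's actual code."""

    @abstractmethod
    def query(self, query: Optional[str], top_k: int = 10) -> List[str]:
        # Optional[str] only allows None as a value.
        # With no "= None" default, the argument is still required.
        ...


class WeaviateStoreSketch(KeywordStoreSketch):
    def query(self, query: Optional[str], top_k: int = 10) -> List[str]:
        if query is None:
            return ["all documents"][:top_k]
        return [f"match for {query!r}"][:top_k]


store = WeaviateStoreSketch()
explicit_none = store.query(None)  # fine: None is passed explicitly

try:
    store.query()  # the required positional argument is missing
    omitted_error = None
except TypeError as err:
    omitted_error = err
```

Calling `store.query()` with no argument raises `TypeError`, which is why tests written against such a signature have to pass `None` explicitly.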
Vladimir Blagojevic
56803e5465
feat: Enable text-embedding-ada-002 for EmbeddingRetriever (#3721)
* Enable text-embedding-ada-002 for EmbeddingRetriever

* Easier to understand code, more unit tests
2022-12-19 17:06:48 +01:00
Massimiliano Pippi
8edfd8978e
Update the proposals process (#3718)
* update the proposals process

* add stalebot to manage proposals lifecycle

* typo

* Update 0000-template.md

* clarify PR labelling staying away from implementation details
2022-12-19 14:35:07 +01:00
Sebastian
d7fabb569b
feat: Use torch.inference_mode() for TableQA (#3731)
* Update to make inference_mode work in TableQA

* Update variable names

* Added torch.inference_mode() for the RCIReader model forward passes
2022-12-19 13:07:07 +01:00
Stefano Fiorucci
5b9c661155
feat: add index parameter to TfidfRetriever (#3666)
* first draft to add index param to tfidf

* better mypy handling

* Revert "better mypy handling"

This reverts commit 91a22516320f9dcbeae53827ec69f9dc51e1785c.

* new check in auto_fit

* new check also in retrieve

* better dict typings

* new test and improvements to other test

* remove unnecessary lambda

* improve test

* remove newline from openapi json

* fix test

* language fix

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* language fix 2

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* language fix 3

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* language fix 4

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* language fix 5

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* language fix 6

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* explicit index value handling

* fix test

* better error messages

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-12-19 12:07:49 +01:00
Agnieszka Marzec
a1d8557c80
Update the readme action version (#3726)
* Update the readme action version

Updated the rdme action version to the latest one.

* Update the version
2022-12-19 10:23:05 +01:00
Zoltan Fedor
3990697869
Fixing the query_batch method of the deepsetcloud document store - … (#3724)
* Fixing the `query_batch` method of the deepsetcloud document store - fixes #3722

* Trigger Build

* Trigger Build

* Trigger CI

Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
2022-12-19 09:57:26 +01:00
Julian Risch
adde194b04
build: upgrade torch and let transformers pick the version (#3727)
* test torch 1.13.1 release

* let transformers handle torch version
2022-12-16 21:33:01 +05:30
Vladimir Blagojevic
42926596e4
Update cohere embedding models (#3704)
2022-12-16 16:49:59 +01:00
Sebastian
4afdbc33b2
fix: Removed overlooked torch scatter references (#3719)
* Removed torch scatter references

* Add back /
2022-12-16 10:36:19 +01:00
Vladimir Blagojevic
c69222faf4
Add PromptNode proposal (#3665)
2022-12-16 10:27:58 +01:00
Agnieszka Marzec
a23f425877
Fix lg (#3725)
2022-12-16 09:43:22 +01:00
Sebastian
54bf7ad343
Remove && \ from end of line (#3710)
2022-12-13 21:29:18 +05:30
Sebastian
d0f786af9f
feat: Bump transformers version to remove torch scatter dependency (#3703)
* Bump transformers version so we can remove torch scatter dependency

* manual re-merge

Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2022-12-13 18:33:07 +05:30
Sara Zan
f24cbdbb5d
remove beir from the base GPU image (#3692)
2022-12-13 11:11:58 +01:00
Stefano Fiorucci
e1401f79b6
refactor: improve Multilabel design (#3658)
* first try and new test

* fix test

* fix unused import

* remove comments

* no more dataclass

* add __eq__ and extend test

* better design from review

* Update schema.py

* fix black

* fix openapi

* fix openapi 2

* new try to fix openapi

* remove newline from openapi json
2022-12-13 10:45:56 +01:00