3803 Commits

Author SHA1 Message Date
Silvano Cerza
53b77dda6c
Move tests for write_documents from DocumentStoreBaseTests to separate class (#6334) 2023-11-17 19:25:16 +01:00
Silvano Cerza
326f51df9d
Move tests for count_document from DocumentStoreBaseTests to separate class (#6332) 2023-11-17 18:00:11 +01:00
Stefano Fiorucci
68be0d7f2c
refactor: improve Document representation (#6333)
* new repr

* reno
2023-11-17 17:49:00 +01:00
Silvano Cerza
5184481e50
refactor: Remove unnecessary method to compare list of Documents in DocumentStoreBaseTests (#6324)
* Change Document.__eq__ to compare all fields

* Remove unnecessary method to compare list of Documents in DocumentStoreBaseTests
2023-11-17 17:03:16 +01:00
ZanSara
e888852aec
Standardize TextFileToDocument (#6232)
* simplify textfiletodocument

* fix error handling and tests

* stray print

* reno

* streams->sources

* reno

* feedback

* test

* fix tests
2023-11-17 15:39:39 +01:00
Silvano Cerza
c26a932423
Change preview tests to run all tests except integration ones (#6325) 2023-11-17 15:33:43 +01:00
ZanSara
dfc1d452bb
feat: upgrade canals to 0.10.1 (#6309)
* upgrade canals

* reno

* trigger preview e2e

* bump canals

* fix decorator

* fix test

* test factory

* tests inmemory

* tests writer

* test audio

* tests builders

* tests caching

* tests embedders

* tests converters

* tests generators

* tests rankers

* tests retrievers

* fix pipeline and telemetry tests

* remove trigger
2023-11-17 14:46:23 +01:00
Vladimir Blagojevic
21bcfe76fb
Convert function call JSON payload to str (#6277) 2023-11-17 14:45:15 +01:00
Stefano Fiorucci
dd6e35d675
build: upgrade to transformers==4.35.2 (#6322)
* upgrade transformers to 4.35.2

* reno
2023-11-17 10:12:34 +01:00
Julian Risch
1c85e44156
test: Add langdetect installation to e2e tests (#6327)
* Add langdetect installation to e2e tests

* compare doc content and id only
2023-11-17 10:12:05 +01:00
Silvano Cerza
6dda6e5b2d
Change Document.__eq__ to compare all fields (#6323) 2023-11-16 17:17:43 +01:00
Massimiliano Pippi
ff3165b8b8
fix: fix un-flattening of metadata (#6318)
* fix un-flattening of metadata

* test should pass

* add relnote

* change policy: raise an error if both meta and keys are passed

* Update document.py

* support python 3.8

* adjust wording in the error message
2023-11-16 17:10:53 +01:00
Julian Risch
34ecff1d19
build: Upgrade openai-whisper and re-introduce audio extra (#6319)
* upgrade openai-whisper and re-introduce audio extra

* add audio extra to
2023-11-16 15:04:50 +01:00
Julian Risch
8b092a90c0
test: Add MetadataRouter to preprocessing pipeline in e2e test (#6321)
* add MetadataRouter to preprocessing pipeline

* replace mimetype check with language check
2023-11-16 11:22:37 +01:00
x110
c4cfe6cb90
fix: Load additional fields from SQUAD-format file to meta field for labels #5978 (#6301)
* Load additional fields from SQUAD-format file to meta field for labels

* added a test function

* rewritten test using pytest

* added release notes

* improve release note

* clean up test

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-11-16 10:44:51 +01:00
Stefano Fiorucci
c691412652
chore: improve the error messages of LazyImport (#6316)
* improve lazy import error messages

* revert changes to adaptive_model
2023-11-16 09:02:12 +01:00
Agnieszka Marzec
414cbcfd92
Update docstrings (#6297) 2023-11-15 17:54:10 +01:00
Stefano Fiorucci
f74f034549
fix pydoc config (#6313) 2023-11-15 15:32:29 +01:00
Vivek Silimkhan
f998bf4a4f
feat: add Amazon Bedrock support (#6226)
* Add Bedrock

* Update supported models for Bedrock

* Fix supports and add extract response in Bedrock

* fix errors imports

* improve and refactor supports

* fix install

* fix mypy

* fix pylint

* fix existing tests

* Added Anthropic Bedrock

* fix tests

* fix sagemaker tests

* add default prompt handler, constructor and supports tests

* more tests

* invoke refactoring

* refactor model_kwargs

* fix mypy

* lstrip responses

* Add streaming support

* bump boto3 version

* add class docstrings, better exception names

* fix layer name

* add tests for anthropic and cohere model adapters

* update cohere params

* update ai21 args and add tests

* support cohere command light model

* add titan tests

* better class names

* support meta llama 2 model

* fix streaming support

* more future-proof model adapter selection

* fix import

* fix mypy

* fix pylint for preview

* add tests for streaming

* add release notes

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* fix format

* fix tests after msg changes

* fix streaming for cohere

---------

Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
Co-authored-by: tstadel <thomas.stadelmann@deepset.ai>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2023-11-15 13:26:29 +01:00
Julian Risch
08ec492039
refactor!: Remove routing from DocumentLanguageClassifier and rename TextLanguageClassifier (#6307)
* remove routing from DocumentLanguageClassifier

* fix MetadataRouter typo
2023-11-15 13:10:07 +01:00
Julian Risch
5295b40def
docs: Reader returns top_k+1 answers if no_answer is enabled 2023-11-15 10:20:21 +01:00
Julian Risch
807cd6d139
chore: Add MetaFieldRanker to rankers __init__.py 2023-11-14 13:53:58 +01:00
Ashwin Mathur
4e4d5eb3e2
feat!: Remove unused query parameter from MetaFieldRanker (#6300)
* Remove unused query parameter from MetaFieldRanker

* Add release notes
2023-11-14 12:33:38 +01:00
Daria Fokina
34136382c1
docs: 2.0 API reference (#6262)
* docs: 2.0 API reference

* add builders and generators

* classifiers file path
2023-11-14 10:12:28 +01:00
Tuana Çelik
b8fdb880f9
Update docstring in html.py (#6279)
The explanation of 'sources' is inadequate especially because this is probably going to be most used with `LinkContentFetcher` that returns `List[ByteStream]`
2023-11-13 12:32:25 +01:00
Stefano Fiorucci
f708cf6056
refactor!: set scale_score default value to False (#6276)
* set default scale_score to False

* release note
2023-11-13 11:59:18 +01:00
Silvano Cerza
8e7ce208fc
Fix Document init when passing non existing fields (#6286)
* Fix Document init when passing non existing fields

* Update releasenotes/notes/fix-document-init-09c1cbb14202be7d.yaml

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Fix linting

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-11-13 11:42:42 +01:00
Tuana Çelik
bf637e9c7e
Update transformers_similarity.py (#6280)
Fixing the Document examples
2023-11-13 09:35:00 +01:00
Stefano Fiorucci
92a8704de4
mypy ignore specific errors (#6278) 2023-11-10 18:10:38 +01:00
Massimiliano Pippi
1b63cfc8b3
fix: make types work without installing pypdf (#6269)
* make types work without installing pypdf

* make pylint happy, keep pyright happy, hope mypy doesn't care
2023-11-09 20:02:22 +01:00
Vladimir Blagojevic
b4d8d1c904
feat: Add custom conversion callable to PyPDFToDocument - Haystack 2.x (#6258)
* Allow user specified converter hook

* Add a release note

* More unit tests

* PR review - Massi, use protocol as converter
2023-11-09 17:35:33 +01:00
Agnieszka Marzec
1046bebbe0
Docs: Update docstrings lg (#6260)
* Update docstrings lg

* Update test_in_memory_bm25_retriever.py

* Update test_in_memory_embedding_retriever.py

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-11-09 17:34:52 +01:00
Tuana Çelik
3be6ec7840
Update openai.py (#6263) 2023-11-09 17:33:35 +01:00
Silvano Cerza
73e2843cf1
Fix deprecation warning when calling Document.from_dict() (#6267) 2023-11-09 16:50:06 +01:00
Stefano Fiorucci
f95937b0ce
chore: move HuggingFaceLocalGenerator to the generators directory (#6264)
* move HuggingFaceLocalGenerator to right directory

* fix tests
2023-11-09 15:59:23 +01:00
Stefano Fiorucci
2b3c77e41d
fix: make JoinDocuments correctly handle duplicate documents w null scores (#6261)
* fix error with null values

* release note

* simplify
2023-11-09 14:28:56 +01:00
Domenico
676da681d0
feat: MetaField Ranker (#6189)
* proposal: meta field ranker

* Apply suggestions from code review

Co-authored-by: ZanSara <sarazanzo94@gmail.com>

* update proposal filename

* feat: add metafield ranker

* fix docstrings

* remove proposal file from pr

* add release notes

* update code according to new Document class

* separate loops for each ranking mode in __merge_scores

* change error type in init and new tests for linear score warning

* docstring upd

---------

Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-11-09 12:20:41 +01:00
Sebastian Husch Lee
71d0d92ea2
feat: Add model_kwargs to ExtractiveReader to impact model loading (#6257)
* Add ability to pass model_kwargs to AutoModelForQuestionAnswering

* Add testing for new model_kwargs

* Add spacing

* Add release notes

* Update haystack/preview/components/readers/extractive.py

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>

* Make changes suggested by Stefano

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-11-09 11:25:22 +01:00
Vladimir Blagojevic
cd429a73cd
feat: Add GPTChatGenerator to Haystack 2.x (#6212)
* Add GPTChatGenerator

* Apply lessons from previous PR

* PR review - Stefano
2023-11-09 10:45:41 +01:00
Daria Fokina
08e211f9d6
docs: fix whisper_local indentations docstrings (#6209)
* whisper_local indentations

* Update whisper_local.py

* fix param docstrings

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-11-08 18:15:39 +01:00
Stefano Fiorucci
72cbf3ee0b
fix: replace haystack.lazy_imports with haystack.preview.lazy_imports (#6255)
* lazy import transformers in tgi

* fix pylint

* fix wrong import
2023-11-08 17:33:07 +01:00
Massimiliano Pippi
f019896335
ci: Generate release notes in a Github workflow (#6211)
* first try

* Update config.yaml

* Update github_release.yml

* set the rc0 tag more explicitly
2023-11-08 12:29:37 +01:00
jambudipa
2f118e857c
feat: add tokenization details for gpt-4-1106-preview (#6250)
* feat: add tokenization details for gpt-4-1106-preview

* update max_tokens value

* reno

---------

Co-authored-by: jambudipa <mark.norgate@ext.ons.gov.uk>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-11-08 12:04:08 +01:00
Massimiliano Pippi
58e357148e
ci: tag when branching off for a release (#6206)
* tag when branching off

* change minor bump workflow

* Update .github/workflows/minor_version_release.yml

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update minor_version_release.yml

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-11-08 11:06:45 +01:00
Silvano Cerza
bf884094d1
refactor: Change Document.blob type and remove mime_type field (#6249)
* Change Document.blob type and remove mime_type field

* Add release notes

* Remove mime_type from Document docstring
2023-11-08 10:35:17 +01:00
Stefano Fiorucci
e2881e2ad3
fix: lazy import transformers in TGI Generators (#6252)
* lazy import transformers in tgi

* fix pylint
2023-11-08 00:09:42 +01:00
Vladimir Blagojevic
5497ca2a45
feat: Adapt GPTGenerator to use str input/output format in Haystack 2.x (#6214)
* Adapt GPTGenerator to string input/output

* Finishing touches

* punctuation upd

* PR feedback

* Small naming fixes

* Update haystack/preview/components/generators/openai.py

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>

* Update class pydoc with a printed response

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-11-07 18:00:43 +01:00
Silvano Cerza
6c5bfe3da4
Update README links and badges (#6248) 2023-11-07 16:53:17 +01:00
Stefano Fiorucci
9b76acb165
pin openai<1 (#6244) 2023-11-06 18:11:41 +01:00
Stefano Fiorucci
982ac3df01
fix: fix failing e2e test (after moving classifiers) (#6243)
* mv classifiers

* release note

* fix e2e test
2023-11-06 17:08:20 +01:00