Silvano Cerza
76165d024f
Fix corner cases and error handling with filters conversion ( #6376 )
2023-11-21 18:22:48 +01:00
Stefano Fiorucci
456902235a
feat: make DocumentWriter return the actual number of documents written ( #6366 )
...
* make DocumentWriter return the actual number of documents written
* add/improve tests
2023-11-21 15:54:25 +01:00
Daniel Fleischer
0cef17ac13
feat: embedding instructions for dense retrieval ( #6372 )
...
* Embedding instructions in EmbeddingRetriever
Query and documents embeddings are prefixed with instructions, useful
for retrievers finetuned on specific tasks, such as Q&A.
* Tests
Checking vectors 0th component vs. reference, using different stores.
* Normalizing vectors
* Release notes
2023-11-21 12:56:40 +01:00
Silvano Cerza
a7f742fdbd
refactor: Rename docstore fixture to document_store ( #6360 )
...
* Prevent pytest_generate_tests from polluting preview tests
* Rename docstore fixture to document_store
2023-11-20 17:41:48 +01:00
Silvano Cerza
83c245db74
feat: Implement function to convert legacy filters to new style ( #6314 )
...
* Implement function to convert legacy filters to new style
* Reduce return statements in conversion to fix linting
* Move convert function in different module
* Fix typos in docstrings
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
---------
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2023-11-20 13:00:05 +01:00
Agnieszka Marzec
497299c27a
Docs: Update Rankers docstrings and messages ( #6296 )
...
* Update docstrings and messages
* Fix tests
* Fix formatting
* Update haystack/preview/components/rankers/meta_field.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Fix tests
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-11-20 12:24:01 +01:00
Julian Risch
4ef2a680bb
feat: Add DocumentJoiner component 2.0 ( #6105 )
...
* draft DocumentJoiner
* implement merge and rrf
* draft end-to-end test with DocumentJoiner in hybrid doc search pipeline
* adjust for variadics Canals PR #122
* fix text_embedder input
* adapt to the new Document class
* adapt to new doc id
* specify documents input as Variadic in run method
* compare doc ids instead of full docs
* rename text_file_converter input to sources
* update docstring
* Update haystack/preview/components/routers/document_joiner.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from docstring review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* capitalize Documents and Retrievers in docstrings
* fix log message in test
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2023-11-20 10:56:56 +01:00
ZanSara
e905066458
feat: make InMemoryDocumentStore return the number of docs actually written ( #6274 )
...
* make InMemoryDocumentStore return the number of documents actually written
* add fixme
* reno
* add missing continue
2023-11-20 10:03:22 +01:00
ZanSara
e888852aec
Standardize TextFileToDocument ( #6232 )
...
* simplify textfiletodocument
* fix error handling and tests
* stray print
* reno
* streams->sources
* reno
* feedback
* test
* fix tests
2023-11-17 15:39:39 +01:00
Silvano Cerza
c26a932423
Change preview tests to run all tests except integration ones ( #6325 )
2023-11-17 15:33:43 +01:00
ZanSara
dfc1d452bb
feat: upgrade canals to 0.10.1 ( #6309 )
...
* upgrade canals
* reno
* trigger preview e2e
* bump canals
* fix decorator
* fix test
* test factory
* tests inmemory
* tests writer
* test audio
* tests builders
* tests caching
* tests embedders
* tests converters
* tests generators
* tests rankers
* tests retrievers
* fix pipeline and telemetry tests
* remove trigger
2023-11-17 14:46:23 +01:00
Silvano Cerza
6dda6e5b2d
Change Document.__eq__ to compare all fields ( #6323 )
2023-11-16 17:17:43 +01:00
Massimiliano Pippi
ff3165b8b8
fix: fix un-flattening of metadata ( #6318 )
...
* fix un-flattening of metadata
* test should pass
* add relnote
* change policy: raise an error if both meta and keys are passed
* Update document.py
* support python 3.8
* adjust wording in the error message
2023-11-16 17:10:53 +01:00
x110
c4cfe6cb90
fix: Load additional fields from SQUAD-format file to meta field for labels #5978 ( #6301 )
...
* Load additional fields from SQUAD-format file to meta field for labels
* added a test function
* rewritten test using pytest
* added release notes
* improve release note
* clean up test
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-11-16 10:44:51 +01:00
Vivek Silimkhan
f998bf4a4f
feat: add Amazon Bedrock support ( #6226 )
...
* Add Bedrock
* Update supported models for Bedrock
* Fix supports and add extract response in Bedrock
* fix errors imports
* improve and refactor supports
* fix install
* fix mypy
* fix pylint
* fix existing tests
* Added Anthropic Bedrock
* fix tests
* fix sagemaker tests
* add default prompt handler, constructor and supports tests
* more tests
* invoke refactoring
* refactor model_kwargs
* fix mypy
* lstrip responses
* Add streaming support
* bump boto3 version
* add class docstrings, better exception names
* fix layer name
* add tests for anthropic and cohere model adapters
* update cohere params
* update ai21 args and add tests
* support cohere command light model
* add titan tests
* better class names
* support meta llama 2 model
* fix streaming support
* more future-proof model adapter selection
* fix import
* fix mypy
* fix pylint for preview
* add tests for streaming
* add release notes
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* fix format
* fix tests after msg changes
* fix streaming for cohere
---------
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
Co-authored-by: tstadel <thomas.stadelmann@deepset.ai>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2023-11-15 13:26:29 +01:00
Julian Risch
08ec492039
refactor!: Remove routing from DocumentLanguageClassifier and rename TextLanguageClassifier ( #6307 )
...
* remove routing from DocumentLanguageClassifier
* fix MetadataRouter typo
2023-11-15 13:10:07 +01:00
Ashwin Mathur
4e4d5eb3e2
feat!: Remove unused query parameter from MetaFieldRanker ( #6300 )
...
* Remove unused query parameter from MetaFieldRanker
* Add release notes
2023-11-14 12:33:38 +01:00
Stefano Fiorucci
f708cf6056
refactor!: set scale_score default value to False ( #6276 )
...
* set default scale_score to False
* release note
2023-11-13 11:59:18 +01:00
Silvano Cerza
8e7ce208fc
Fix Document init when passing non existing fields ( #6286 )
...
* Fix Document init when passing non existing fields
* Update releasenotes/notes/fix-document-init-09c1cbb14202be7d.yaml
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* Fix linting
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-11-13 11:42:42 +01:00
Vladimir Blagojevic
b4d8d1c904
feat: Add custom conversion callable to PyPDFToDocument - Haystack 2.x ( #6258 )
...
* Allow user specified converter hook
* Add a release note
* More unit tests
* PR review - Massi, use protocol as converter
2023-11-09 17:35:33 +01:00
Agnieszka Marzec
1046bebbe0
Docs: Update docstrings lg ( #6260 )
...
* Update docstrings lg
* Update test_in_memory_bm25_retriever.py
* Update test_in_memory_embedding_retriever.py
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-11-09 17:34:52 +01:00
Stefano Fiorucci
f95937b0ce
chore: move HuggingFaceLocalGenerator to the generators directory ( #6264 )
...
* move HuggingFaceLocalGenerator to right directory
* fix tests
2023-11-09 15:59:23 +01:00
Stefano Fiorucci
2b3c77e41d
fix: make JoinDocuments correctly handle duplicate documents with null scores ( #6261 )
...
* fix error with null values
* release note
* simplify
2023-11-09 14:28:56 +01:00
Domenico
676da681d0
feat: MetaField Ranker ( #6189 )
...
* proposal: meta field ranker
* Apply suggestions from code review
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
* update proposal filename
* feat: add metafield ranker
* fix docstrings
* remove proposal file from pr
* add release notes
* update code according to new Document class
* separate loops for each ranking mode in __merge_scores
* change error type in init and new tests for linear score warning
* docstring upd
---------
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-11-09 12:20:41 +01:00
Sebastian Husch Lee
71d0d92ea2
feat: Add model_kwargs to ExtractiveReader to impact model loading ( #6257 )
...
* Add ability to pass model_kwargs to AutoModelForQuestionAnswering
* Add testing for new model_kwargs
* Add spacing
* Add release notes
* Update haystack/preview/components/readers/extractive.py
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* Make changes suggested by Stefano
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-11-09 11:25:22 +01:00
Vladimir Blagojevic
cd429a73cd
feat: Add GPTChatGenerator to Haystack 2.x ( #6212 )
...
* Add GPTChatGenerator
* Apply lessons from previous PR
* PR review - Stefano
2023-11-09 10:45:41 +01:00
Silvano Cerza
bf884094d1
refactor: Change Document.blob type and remove mime_type field ( #6249 )
...
* Change Document.blob type and remove mime_type field
* Add release notes
* Remove mime_type from Document docstring
2023-11-08 10:35:17 +01:00
Vladimir Blagojevic
5497ca2a45
feat: Adapt GPTGenerator to use str input/output format in Haystack 2.x ( #6214 )
...
* Adapt GPTGenerator to string input/output
* Finishing touches
* punctuation upd
* PR feedback
* Small naming fixes
* Update haystack/preview/components/generators/openai.py
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* Update class pydoc with a printed response
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-11-07 18:00:43 +01:00
Stefano Fiorucci
fb96aef4dd
refactor!: move classifiers to an appropriate directory/package ( #6240 )
...
* mv classifiers
* release note
2023-11-06 12:00:01 +01:00
Vladimir Blagojevic
d7e1833c40
feat: Add HuggingFaceTGIChatGenerator Haystack 2.x component ( #6199 )
...
* Add ChatHuggingFaceTGIGenerator
* Add release note
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-11-06 09:48:45 +01:00
Stefano Fiorucci
063d27c522
refactor!: rename TextDocumentSplitter to DocumentSplitter ( #6223 )
...
* rename TextDocumentSplitter to DocumentSplitter
* reno
* fix init
2023-11-03 11:33:20 +01:00
Vladimir Blagojevic
6e2dbdc320
feat: Add HuggingFaceTGIGenerator Haystack 2.x component ( #6205 )
...
* Add HuggingFaceTGIGenerator
* PR review
* PR feedback from Stefano
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-11-02 19:35:16 +01:00
Stefano Fiorucci
8511b8cd79
feat: HuggingFaceLocalGenerator - allow passing generation_kwargs in run method ( #6220 )
...
* allow custom generation_kwargs in run
* reno
* make pylint ignore too-many-public-methods
2023-11-02 15:29:38 +01:00
Ashwin Mathur
6bf0b9dc7c
feat: Add MarkdownToTextDocument (v2) ( #6159 )
...
* Add MarkdownToTextDocument
* Add release notes
* Update GitHub workflows
* Update GitHub workflows
* Refactor code with minimal dependencies
* Update docstrings
* Apply suggestions from code review
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
* Update document with content and meta for backward compatibility
* Refactor Document Class for Backward Compatibility
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* Update tests
* Improve test assertions
---------
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-10-31 18:28:13 +01:00
Julian Risch
29b1fefaa4
feat: Add DocumentLanguageClassifier 2.0 ( #6037 )
...
* add DocumentLanguageClassifier and tests
* reno
* fix import, rename DocumentCleaner
* mark example usage as python code
* add assertions to e2e test
* use deserialized document_store
* Apply suggestions from code review
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* remove from/to_dict
* use renamed InMemoryDocumentStore
* adapt to Document refactoring
* improve docstring
* fix test for new Document
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2023-10-31 15:35:05 +01:00
Silvano Cerza
7287657f0e
refactor: Rename Document's text field to content ( #6181 )
...
* Rework Document serialisation
Make Document backward compatible
Fix InMemoryDocumentStore filters
Fix InMemoryDocumentStore.bm25_retrieval
Add release notes
Fix pylint failures
Enhance Document kwargs handling and docstrings
Rename Document's text field to content
Fix e2e tests
Fix SimilarityRanker tests
Fix typo in release notes
Rename Document's metadata field to meta (#6183 )
* fix bugs
* make linters happy
* fix
* more fix
* match regex
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-10-31 12:44:04 +01:00
Vladimir Blagojevic
c51aa1ee8d
feat: Add general and HF util methods ( #6200 )
...
* Add general and hf util methods
2023-10-31 11:13:11 +01:00
Silvano Cerza
76d5142bb8
Refactor: Document serialization and backward compatibility ( #6180 )
...
* Rework Document serialisation
* Make Document backward compatible
* Fix InMemoryDocumentStore filters
* Fix InMemoryDocumentStore.bm25_retrieval
* Add release notes
* Fix pylint failures
* Enhance Document kwargs handling and docstrings
* cosmetics
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-10-30 17:03:06 +01:00
Massimiliano Pippi
789e524de3
remove leftovers from 1.18 ( #6196 )
2023-10-30 11:25:54 +01:00
Vladimir Blagojevic
f76fc04ed0
feat: Add StreamingChunk dataclass to Haystack 2.x ( #6174 )
...
* Add StreamingChunk
* Add release note
* Use default value init for metadata, turn off hashing
* Add unit tests
2023-10-26 17:42:52 +02:00
Vladimir Blagojevic
bb295d29ee
Fix failing test ( #6176 )
2023-10-26 17:22:24 +02:00
Ashwin Mathur
5f35e7d04a
refactor: Migrate RemoteWhisperTranscriber to OpenAI SDK. ( #6149 )
...
* Migrate RemoteWhisperTranscriber to OpenAI SDK
* Migrate RemoteWhisperTranscriber to OpenAI SDK
* Remove unnecessary imports
* Add release notes
* Fix api_key serialization
* Fix linting
* Apply suggestions from code review
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
* Add additional tests for api_key
* Adapt .run() to take ByteStream inputs
* Update docstrings
* Rework implementation to use io.BytesIO
* Update error message
* Add default file name
---------
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2023-10-26 16:25:23 +02:00
Stefano Fiorucci
1f4ed3cc03
refactor!: rename SimilarityRanker to TransformersSimilarityRanker ( #6100 )
...
* rename
* release note
* Update haystack/preview/components/rankers/transformers_similarity.py
Co-authored-by: Domenico <domenico.cinque98@gmail.com>
* Update haystack/preview/components/rankers/transformers_similarity.py
Co-authored-by: Domenico <domenico.cinque98@gmail.com>
* fix test
---------
Co-authored-by: Domenico <domenico.cinque98@gmail.com>
2023-10-24 19:45:16 +02:00
Grant Williams
1cf70d3dce
build: Upgrade transformers to the latest version 4.34.1 ( #5994 )
...
* Upgrade transformers to the latest version 4.34.0 so that Haystack can support the new Mistral, Nougat, and other models.
* update release notes
* updated missing lazy import
* Update .github workflows imports
* bump more versions in .github workflows
* revert import sorting
* Update to catch runtime errors to match haystack_hub changes
* add language parameter value to whisper test
* bump transformers version in linting preview workflow
* bump transformers version in linting preview workflow
* bump version to v4.34.1
* resolve mypy issue with reused variables
* install openai-whisper without dependencies
* remove audio extra, update whisper install instructions
* remove audio extra, update whisper install instructions
* keep audio extra but add version
* keep audio extra with no constraints
* remove audio extra
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-10-24 19:13:12 +02:00
Vladimir Blagojevic
b9b7d7666d
feat: Add dynamic per-user ChatMessage templating support ( #6161 )
...
* Add dynamic per-user ChatMessage templating support
* Add unit tests for dynamic templating
* Update add-dynamic-per-message-templating-908468226c5e3d45.yaml
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Proper init ValueError raising, unit tests
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-10-24 16:50:45 +02:00
Massimiliano Pippi
dd24210908
feat: add pipeline Yaml marshaller ( #6137 )
...
* add marshaller
* release notes
* add docstrings and missing tests
2023-10-23 19:02:59 +02:00
Silvano Cerza
31fb5b84e7
feature: Add mime_type field to ByteStream ( #6154 )
...
* Add mime_type field to ByteStream
* Add release notes
* Update tests
2023-10-23 16:13:40 +02:00
Vladimir Blagojevic
dcc7e63dc9
feat: Add ChatMessage class to Haystack 2.0 ( #6144 )
...
* Add ChatMessage and ChatRole
2023-10-23 16:08:05 +02:00
Silvano Cerza
ae812617fd
Remove Document.array field ( #6139 )
2023-10-23 13:01:15 +02:00
Stefano Fiorucci
047e79f256
refactor: better API keys handling in GPTGenerator ( #6103 )
...
* refactor: do not serialize API keys
* release note
* check if api key is set in the module client
* make tests more robust
* better tests
2023-10-23 12:53:52 +02:00