bogdankostic
728383a149
fix: Make TransformersSimilarityRanker run with single document list ( #6503 )
...
* Make `TransformersSimilarityRanker` run with single document list
* Add release note
* Remove unused import in test
2023-12-08 16:18:46 +01:00
Ashwin Mathur
2767cd2f01
Fix usage examples ( #6507 )
2023-12-07 14:01:32 +01:00
Bijay Gurung
c5342d1110
fix: Prevent invalid answer from being selected in ExtractiveReader ( #6460 )
...
* Fix invalid answer being selected issue on ExtractiveReader
* Rename variables to not shadow arguments
2023-12-06 09:49:02 +01:00
Vladimir Blagojevic
008a322023
feat: Add Indexing Pipeline ( #6424 )
...
* Add build_indexing_pipeline utils function
* Pylint fixes
* Move into another package to avoid circular deps
* Revert change
* Revert haystack/utils/__init__.py change
* Add example
* Use DocumentStore type, remove typing checks
2023-12-04 16:08:53 +01:00
ZanSara
a38f871dbd
feat: Add RAG pipeline ( #6461 )
...
* add rag pipeline
* Update examples/getting_started/rag.py
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-12-04 15:25:29 +01:00
Stefano Fiorucci
4912f7cb58
refactor!: improve the deserialization logic for components that use a Document Store ( #6466 )
...
* improve deserialization
* rm ds decorator
* improve tests
* fix pylint
* rm decorator from module init
* rm decorator
* rm decorator from factory
* fix tests
* release note
* rm print
2023-12-04 15:17:28 +01:00
Vladimir Blagojevic
b9bf83bbef
feat: Allow flat dictionary Pipeline.run() inputs ( #6413 )
...
* Initial implementation, release note, update API and unit test
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-30 14:37:55 +01:00
Silvano Cerza
831d0611d9
feat: Change default DuplicatePolicy in DocumentStore.write_documents() ( #6438 )
...
* Change default DuplicatePolicy in DocumentStore.write_documents()
* Add release notes
2023-11-28 12:30:17 +01:00
Massimiliano Pippi
00e1dd6eb8
chore: rearrange the core package, move tests and clean up ( #6427 )
...
* rearrange code
* fix tests
* relnote
* merge test modules
* remove extra
* rearrange draw tests
* forgot
* remove unused import
2023-11-28 09:58:56 +01:00
Silvano Cerza
9a7fd6f2ce
refactor: Add new filters tests for Document Store testing ( #6428 )
...
* Add new filters tests for Document Store testing
* Add release notes
2023-11-28 09:57:08 +01:00
Silvano Cerza
fd16ec63cb
refactor: Add support for new filters declaration ( #6397 )
...
* Rework filter logic for InMemoryDocumentStore to support new filters declaration
* Fix legacy filters tests
* Simplify logic and handle dates comparison
* Rework MetadataRouter to support new filters
* Update docstrings
* Add release notes
* Fix linting
* Avoid duplicating filters specifications
* Handle corner case
* Simplify docstring
* Fix filters logic and tests
* Fix Document Store testing legacy filters tests
2023-11-24 11:22:46 +01:00
SebastjanPrachovskij
28c2b09d90
Add SearchApi integration for websearch ( #6400 )
2023-11-24 11:18:43 +01:00
pandasar13
edb40b6c1b
refactor: add batch_size to FAISS __init__ ( #6401 )
...
* refactor: add batch_size to FAISS __init__
* refactor: add batch_size to FAISS __init__
* add release note to refactor: add batch_size to FAISS __init__
* fix release note
* add batch_size to docstrings
---------
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2023-11-23 17:27:24 +01:00
ZanSara
4ec6a60a76
feat: CohereGenerator ( #6395 )
...
* added CohereGenerator with unit tests
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* 1. added releasenote
2. removed commented files in test-cohere_generators
3. removed unused imports
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* 1. move client creation to __init__
2. remove dict casting of metadata in run
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* few fixes
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* add cohere to git workflows
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* 1. CohereGenerator as top level import in generators
2. small change in doc string
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* 1. corrected git workflow files for cohere import
2. changed api key env var from CO_API_KEY to COHERE_API_KEY
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* added cohere in missed out workflow installs
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* 1. Removed default_streaming_callback from cohere.py and added in test.
2. Added kwargs doc strings for CohereGenerator
3. removed type hints for metadata and replies
4. use COHERE_API_URL instead of hard coded URL.
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* Update haystack/preview/components/generators/cohere/cohere.py
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
* Update haystack/preview/components/generators/cohere/cohere.py
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
* Update haystack/preview/components/generators/cohere/cohere.py
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
* Update haystack/preview/components/generators/cohere/cohere.py
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
* Update haystack/preview/components/generators/cohere/cohere.py
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
* move out of folder
* black
* fix tests
* feedback
* black
* remove api key from tests
* read api key from env var if missing
* typo
* black
* missing import
---------
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
Co-authored-by: sunilkumardash9 <sunilkumardash9@gmail.com>
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
2023-11-23 17:21:07 +01:00
jlonge4
c44e2cf49b
feat: add microsoft pptx file converter ( #6399 )
...
* Create pptx.py
* feat: pptx converter import __init__.py
* feat: add pptx import __init__.py
* feat: add python-pptx dependency
* feat: add sample pptx for testing
* feat: add pptx file-converter test
* feat: release note pptx-file-converter-3e494d2747637eb2.yaml
* feat: Update releasenotes/notes/pptx-file-converter-3e494d2747637eb2.yaml
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* feat: refactor haystack/nodes/file_converter/pptx.py
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* fix imports
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-11-23 16:46:41 +01:00
Stefano Fiorucci
b0b514778d
fix!: make PyPDFToDocument JSON-serializable ( #6396 )
...
* add registry
* release note
* add checks
* rm superfluous check
* fix typo
* rm print :-)
2023-11-23 15:37:20 +01:00
Ben Heckmann
a492771b4d
feat: PreProcessor split by token (tiktoken & Hugging Face) ( #5276 )
...
* #4983 implemented split by token for tiktoken tokenizer
* #4983 added unit test for tiktoken splitting
* #4983 implemented and added a test for splitting documents with HuggingFace tokenizer
* #4983 added support for passing HF model names (instead of objects) and added an example to the HF token splitting test
* mocked HTTP model loading in unit tests, fixed pylint error
* fix lossy tokenizers splitting, use LazyImport, ignore UnicodeEncodeError for tiktoken
* reno
* rename reno file
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-11-23 12:26:37 +01:00
Vladimir Blagojevic
e04a1f16bb
feat: Add DynamicPromptBuilder to Haystack 2.x ( #6328 )
...
* Add DynamicPromptBuilder
* Improve pydocs, add unit tests
* Add release note
* Make expected_runtime_variables optional
* Add pydocs usage example
* Add more pydocs
* Remove test markers
* Update type in unit test
* Update after canals upgrade
* add to api ref
* docstrings updates
* Update test/preview/components/builders/test_dynamic_prompt_builder.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update haystack/preview/components/builders/dynamic_prompt_builder.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Deparametrize init test
* Rename expected_runtime_variables to runtime_variables
* Rephrase docstring so meaning is clearer
---------
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-23 11:41:57 +01:00
Vladimir Blagojevic
b557f3035e
feat: Add ConditionalRouter Haystack 2.x component ( #6147 )
...
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-11-23 10:28:08 +01:00
Stefano Fiorucci
e91f7a8a4d
refactor!: improve the public interface of Generators ( #6374 )
...
* merge lazy import blocks
* refactor generators
* release note
* revert unrelated changes
2023-11-22 10:40:48 +01:00
ZanSara
b751978d65
Extends input types of RemoteWhisperTranscriber ( #6218 )
...
* fix tests
* reno
* tests
* retain file name
* paths are strings for openai sdk
* streams->sources
* feedback
* always add name to file
* mypy
* test placeholder with extension
* fallback
* paths
* path test
* path must be a string
* fix test
2023-11-22 09:57:45 +01:00
Ashwin Mathur
e6c8374562
feat: Add ByteStream metadata and other metadata to Documents created by HTMLToDocument ( #6304 )
...
* Refactor HTMLToDocument
* Add release notes
* Add additional tests
* remove progress bar
* Add additional test for metadata
* remove progress bar from release notes
* Update tests
* Use truthiness checks instead of is not None
2023-11-21 21:44:02 +01:00
Daniel Fleischer
0cef17ac13
feat: embedding instructions for dense retrieval ( #6372 )
...
* Embedding instructions in EmbeddingRetriever
Query and documents embeddings are prefixed with instructions, useful
for retrievers finetuned on specific tasks, such as Q&A.
* Tests
Checking vectors 0th component vs. reference, using different stores.
* Normalizing vectors
* Release notes
2023-11-21 12:56:40 +01:00
Silvano Cerza
83c245db74
feat: Implement function to convert legacy filters to new style ( #6314 )
...
* Implement function to convert legacy filters to new style
* Reduce return statements in conversion to fix linting
* Move convert function in different module
* Fix typos in docstrings
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
---------
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2023-11-20 13:00:05 +01:00
ZanSara
9cee2f82c4
feat: extend write_documents to return the number of documents actually written in the document store ( #6006 )
...
* add typing and docstring
* reno
* Update releasenotes/notes/extend-write-documents-855ffc315974f03b.yaml
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-11-20 11:54:02 +01:00
Julian Risch
4ef2a680bb
feat: Add DocumentJoiner component 2.0 ( #6105 )
...
* draft DocumentJoiner
* implement merge and rrf
* draft end-to-end test with DocumentJoiner in hybrid doc search pipeline
* adjust for variadics Canals PR #122
* fix text_embedder input
* adapt to the new Document class
* adapt to new doc id
* specify documents input as Variadic in run method
* compare doc ids instead of full docs
* rename text_file_converter input to sources
* update docstring
* Update haystack/preview/components/routers/document_joiner.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from docstring review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* capitalize Documents and Retrievers in docstrings
* fix log message in test
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2023-11-20 10:56:56 +01:00
ZanSara
e905066458
feat: make InMemoryDocumentStore return the number of docs actually written ( #6274 )
...
* make InMemoryDocumentStore return the number of documents actually written
* add fixme
* reno
* add missing continue
2023-11-20 10:03:22 +01:00
x110
d03bffab8b
Promptnode timeout ( #6282 )
2023-11-19 16:32:09 +01:00
Stefano Fiorucci
68be0d7f2c
refactor: improve Document representation ( #6333 )
...
* new repr
* reno
2023-11-17 17:49:00 +01:00
ZanSara
e888852aec
Standardize TextFileToDocument ( #6232 )
...
* simplify textfiletodocument
* fix error handling and tests
* stray print
* reno
* streams->sources
* reno
* feedback
* test
* fix tests
2023-11-17 15:39:39 +01:00
ZanSara
dfc1d452bb
feat: upgrade canals to 0.10.1 ( #6309 )
...
* upgrade canals
* reno
* trigger preview e2e
* bump canals
* fix decorator
* fix test
* test factory
* tests inmemory
* tests writer
* test audio
* tests builders
* tests caching
* tests embedders
* tests converters
* tests generators
* tests rankers
* tests retrievers
* fix pipeline and telemetry tests
* remove trigger
2023-11-17 14:46:23 +01:00
Stefano Fiorucci
dd6e35d675
build: upgrade to transformers==4.35.2 ( #6322 )
...
* upgrade transformers to 4.35.2
* reno
2023-11-17 10:12:34 +01:00
Silvano Cerza
6dda6e5b2d
Change Document.__eq__ to compare all fields ( #6323 )
2023-11-16 17:17:43 +01:00
Massimiliano Pippi
ff3165b8b8
fix: fix un-flattening of metadata ( #6318 )
...
* fix un-flattening of metadata
* test should pass
* add relnote
* change policy: raise an error if both meta and keys are passed
* Update document.py
* support python 3.8
* adjust wording in the error message
2023-11-16 17:10:53 +01:00
Julian Risch
34ecff1d19
build: Upgrade openai-whisper and re-introduce audio extra ( #6319 )
...
* upgrade openai-whisper and re-introduce audio extra
* add audio extra to
2023-11-16 15:04:50 +01:00
Julian Risch
8b092a90c0
test: Add MetadataRouter to preprocessing pipeline in e2e test ( #6321 )
...
* add MetadataRouter to preprocessing pipeline
* replace mimetype check with language check
2023-11-16 11:22:37 +01:00
x110
c4cfe6cb90
fix: Load additional fields from SQUAD-format file to meta field for labels #5978 ( #6301 )
...
* Load additional fields from SQUAD-format file to meta field for labels
* added a test function
* rewritten test using pytest
* added release notes
* improve release note
* clean up test
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-11-16 10:44:51 +01:00
Vivek Silimkhan
f998bf4a4f
feat: add Amazon Bedrock support ( #6226 )
...
* Add Bedrock
* Update supported models for Bedrock
* Fix supports and add extract response in Bedrock
* fix errors imports
* improve and refactor supports
* fix install
* fix mypy
* fix pylint
* fix existing tests
* Added Anthropic Bedrock
* fix tests
* fix sagemaker tests
* add default prompt handler, constructor and supports tests
* more tests
* invoke refactoring
* refactor model_kwargs
* fix mypy
* lstrip responses
* Add streaming support
* bump boto3 version
* add class docstrings, better exception names
* fix layer name
* add tests for anthropic and cohere model adapters
* update cohere params
* update ai21 args and add tests
* support cohere command light model
* add titan tests
* better class names
* support meta llama 2 model
* fix streaming support
* more future-proof model adapter selection
* fix import
* fix mypy
* fix pylint for preview
* add tests for streaming
* add release notes
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* fix format
* fix tests after msg changes
* fix streaming for cohere
---------
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
Co-authored-by: tstadel <thomas.stadelmann@deepset.ai>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2023-11-15 13:26:29 +01:00
Julian Risch
08ec492039
refactor!: Remove routing from DocumentLanguageClassifier and rename TextLanguageClassifier ( #6307 )
...
* remove routing from DocumentLanguageClassifier
* fix MetadataRouter typo
2023-11-15 13:10:07 +01:00
Julian Risch
5295b40def
docs: Reader returns top_k+1 answers if no_answer is enabled
2023-11-15 10:20:21 +01:00
Ashwin Mathur
4e4d5eb3e2
feat!: Remove unused query parameter from MetaFieldRanker ( #6300 )
...
* Remove unused query parameter from MetaFieldRanker
* Add release notes
2023-11-14 12:33:38 +01:00
Stefano Fiorucci
f708cf6056
refactor!: set scale_score default value to False ( #6276 )
...
* set default scale_score to False
* release note
2023-11-13 11:59:18 +01:00
Silvano Cerza
8e7ce208fc
Fix Document init when passing non existing fields ( #6286 )
...
* Fix Document init when passing non existing fields
* Update releasenotes/notes/fix-document-init-09c1cbb14202be7d.yaml
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* Fix linting
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-11-13 11:42:42 +01:00
Vladimir Blagojevic
b4d8d1c904
feat: Add custom conversion callable to PyPDFToDocument - Haystack 2.x ( #6258 )
...
* Allow user specified converter hook
* Add a release note
* More unit tests
* PR review - Massi, use protocol as converter
2023-11-09 17:35:33 +01:00
Stefano Fiorucci
2b3c77e41d
fix: make JoinDocuments correctly handle duplicate documents w null scores ( #6261 )
...
* fix error with null values
* release note
* simplify
2023-11-09 14:28:56 +01:00
Domenico
676da681d0
feat: MetaField Ranker ( #6189 )
...
* proposal: meta field ranker
* Apply suggestions from code review
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
* update proposal filename
* feat: add metafield ranker
* fix docstrings
* remove proposal file from pr
* add release notes
* update code according to new Document class
* separate loops for each ranking mode in __merge_scores
* change error type in init and new tests for linear score warning
* docstring upd
---------
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-11-09 12:20:41 +01:00
Sebastian Husch Lee
71d0d92ea2
feat: Add model_kwargs to ExtractiveReader to impact model loading ( #6257 )
...
* Add ability to pass model_kwargs to AutoModelForQuestionAnswering
* Add testing for new model_kwargs
* Add spacing
* Add release notes
* Update haystack/preview/components/readers/extractive.py
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* Make changes suggested by Stefano
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-11-09 11:25:22 +01:00
Vladimir Blagojevic
cd429a73cd
feat: Add GPTChatGenerator to Haystack 2.x ( #6212 )
...
* Add GPTChatGenerator
* Apply lessons from previous PR
* PR review - Stefano
2023-11-09 10:45:41 +01:00
jambudipa
2f118e857c
feat: add tokenization details for gpt-4-1106-preview ( #6250 )
...
* feat: add tokenization details for gpt-4-1106-preview
* update max_tokens value
* reno
---------
Co-authored-by: jambudipa <mark.norgate@ext.ons.gov.uk>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-11-08 12:04:08 +01:00
Silvano Cerza
bf884094d1
refactor: Change Document.blob type and remove mime_type field ( #6249 )
...
* Change Document.blob type and remove mime_type field
* Add release notes
* Remove mime_type from Document docstring
2023-11-08 10:35:17 +01:00