Massimiliano Pippi
714b944dc2
chore: rename store to document_store for clarity ( #5547 )
...
* store -> document_store
* fix leftovers
* fix import name
* moar leftovers
* rebase on main, update MemoryDocumentStore to the new protocol
* Update haystack/preview/pipeline.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-08-12 08:44:36 +02:00
Silvano Cerza
a7416bcf89
Add to_dict and from_dict methods for Stores ( #5541 )
...
* Add to_dict and from_dict methods for Stores
* Add release notes
* Add tests with custom init parameters
2023-08-11 14:45:56 +02:00
Silvano Cerza
168b7c806c
Add _store_name field to StoreAwareMixin to ease serialisation ( #5531 )
2023-08-10 15:42:19 +02:00
Vladimir Blagojevic
a75b9dd4bb
feat: LinkContentFetcher - add content-type resolution, user agent switching, PDF handler ( #5374 )
...
* Add content type resolution, pdf handler, user agent switching
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-08-09 18:14:04 +02:00
ZanSara
5ca4874df9
Migrate existing v2 components to Canals 0.4.0 ( #5532 )
...
* pin canals==0.4.0
* update audio components
* allow audio components to receive whisper_params in init too
* migrating memoryretriever
* migrate memoryretriever
* migrate TextFileToDocument
* fix TextFileToDocument tests
* fix pipeline tests
* fix defaults management
* reno
* inverted assignments
* Simplify release notes
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-08-09 15:51:32 +02:00
Silvano Cerza
83fce1bd72
Add Store class factory ( #5530 )
...
* Add Store class factory
* Add release notes
2023-08-09 13:09:36 +02:00
Vladimir Blagojevic
227bf6ca39
feat: Remove template variables from PromptNode invocation kwargs ( #5526 )
...
* Remove template params from kwargs before passing kwargs to invocation layer
* More unit tests
* Add release note
* Enable simple prompt node pipeline integration test use case
2023-08-08 16:40:23 +02:00
Vladimir Blagojevic
84ed954c8c
feat: Improve performance and add default media support in FileTypeClassifier ( #5083 )
...
* feat: add media outgoing edge to FileTypeClassifier
* Add release note
* Update language
---------
Co-authored-by: Daniel Bichuetti <daniel.bichuetti@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-08-08 15:51:07 +02:00
tstadel
d46c84bb61
feat: support dynamic filters in custom_query ( #5427 )
...
* support filters in custom_query
* better tests
* Update docstrings
---------
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-08-08 15:48:15 +02:00
Stefano Fiorucci
3f472995bb
refactor: update Crawler to support selenium>=4.11.0 and simplify it ( #5515 )
...
* refactor crawler
* rm unused imports
* release notes!
* rm outdated mock
2023-08-08 15:13:22 +02:00
Fanli Lin
f6b50cfdf9
fix: StopWordsCriteria doesn't compare the stop word token ids with the input ids in a continuous and sequential order ( #5503 )
...
* bug fix
* add release note
* add unit test
* refactor
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-08-08 08:35:10 +02:00
Massimiliano Pippi
ac4e762422
Fix datadog client init ( #5524 )
2023-08-07 12:18:46 +02:00
Massimiliano Pippi
c079576a87
chore: move base test class into haystack core ( #5509 )
...
* move base test class into haystack core
* fix linter
* do not compute coverage of testing code
2023-08-04 12:42:13 +02:00
Vladimir Blagojevic
d96c963bc4
test: Convert two HFLocalInvocationLayer integration to unit tests ( #5446 )
...
* Convert two HFLocalInvocationLayer integration to unit tests
* Simplify unit test
* Improve HFLocalInvocationLayer unit tests
2023-08-03 17:41:32 +02:00
bogdankostic
56cea8cbbd
test: Add scripts to send benchmark results to datadog ( #5432 )
...
* Add config files
* log benchmarks to stdout
* Add top-k and batch size to configs
* Add batch size to configs
* fix: don't download files if they already exist
* Add batch size to configs
* refine script
* Remove configs using 1m docs
* update run script
* update run script
* update run script
* datadog integration
* remove out folder
* gitignore benchmarks output
* test: send benchmarks to datadog
* remove uncommented lines in script
* feat: take branch/tag argument for benchmark setup script
* fix: run.sh should ignore errors
* Remove changes unrelated to datadog
* Apply black
* Update test/benchmarks/utils.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* PR feedback
* Account for reader benchmarks not doing indexing
* Change key of reader metrics
* Apply PR feedback
* Remove whitespace
---------
Co-authored-by: rjanjua <rohan.janjua@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-08-03 10:09:00 +02:00
Vladimir Blagojevic
1876c41f07
feat: Add LostInTheMiddleRanker ( #5457 )
...
* Add lost in the middle ranker
* Add release note
* Julian's feedback: more precise version of truncate
* Better comments for the litm algorithm
* Sebastian PR feedback
* Add check for invalid values of word_count_threshold
* Remove _truncate as it is not needed any more
---------
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-08-02 17:05:13 +02:00
Vladimir Blagojevic
0efe0ee7b3
feat: Add top_k parameter to DiversityRanker init method ( #5494 )
...
* Add top_k
* Add release note
2023-08-02 17:04:04 +02:00
Fanli Lin
8d04f28e11
fix: hf agent outputs the prompt text while the openai agent not ( #5461 )
...
* add skil prompt
* fix formatting
* add release note
* add release note
* Update releasenotes/notes/add-skip-prompt-for-hf-model-agent-89aef2838edb907c.yaml
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
* Update haystack/nodes/prompt/invocation_layer/handlers.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/nodes/prompt/invocation_layer/handlers.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/nodes/prompt/invocation_layer/hugging_face.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* add a unit test
* add a unit test2
* add skil prompt
* Revert "add skil prompt"
This reverts commit b1ba938c94b67a4fd636d321945990aabd2c5b2a.
* add unit test
---------
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-08-02 16:34:33 +02:00
Fanli Lin
73fa796735
fix: enable passing max_length for text2text-generation task ( #5420 )
...
* bug fix
* add unit test
* reformatting
* add release note
* add release note
* Update releasenotes/notes/enable-set-max-length-during-runtime-097d65e537bf800b.yaml
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update test/prompt/invocation_layer/test_hugging_face.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update test/prompt/invocation_layer/test_hugging_face.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update test/prompt/invocation_layer/test_hugging_face.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update test/prompt/invocation_layer/test_hugging_face.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* bug fix
---------
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-08-02 14:13:30 +02:00
Vladimir Blagojevic
40a2e9b56a
refactor: Update WebRetriever to use LinkContentFetcher ( #5229 )
...
* Refactor WebRetriever to use LinkContentFetcher
* PR feedback
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-08-02 12:45:03 +02:00
Fanli Lin
f7fd5eeb4f
feat: enable loading tokenizer for models that are not supported by the transformers library ( #5314 )
...
* add tokenizer load
* change import order
* move imports
* refactor code
* import lib
* remove pretrainedmodel
* fix linting
* update patch
* fix order
* remove tokenizer class
* use tokenizer class
* no copy
* add case for model is an instance
* fix optional
* add ut
* set default to None
* change models
* Update haystack/nodes/prompt/invocation_layer/hugging_face.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/nodes/prompt/invocation_layer/hugging_face.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* add unit tests
* add unit tests
* remove lib
* formatting
* formatting
* formatting
* add release note
* Update releasenotes/notes/load-tokenizer-if-not-load-by-transformers-5841cdc9ff69bcc2.yaml
Co-authored-by: bogdankostic <bogdankostic@web.de>
---------
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-08-02 11:42:23 +02:00
Vladimir Blagojevic
540d0fad97
feat: Add DiversityRanker ( #5398 )
...
* Introduce DiversityRanker
* improve most_diverse_order speed
* Compute mean for numerical stability
* Add release note
* Add cosine similarity
* Test both dot product and cosine similarity
* Add pydocs hook
---------
Co-authored-by: Michel Bartels <login@michelbartels.com>
2023-08-01 12:48:34 +02:00
bogdankostic
a51ca19fe4
feat: Add TextFileToDocument component (v2) ( #5467 )
...
* Add TextfileToDocument component
* Add docstrings
* Add unit tests
* Add release note file
* Make use of progress bar
* Add TextfileToDocument to __init__.py
* Use lazy % formatting in logging functions
* Remove f from non-f-string
* Add TextfileToDocument to __init__.py
* Use correct dependency extra
* Compare file path against path object
* PR feedback
* PR feedback
* Update haystack/preview/components/file_converters/txt.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update docstrings
* Add error handling
* Add unit test
* Reintroduce falsely removed caplog
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-08-01 11:34:52 +02:00
Stefano Fiorucci
6f534873a5
fix: restrict supports method in the OpenAI invocation layer and a similar method in the EmbeddingRetriever ( #5458 )
...
* restrict OpenAI supports method
* better note
* Update releasenotes/notes/restrict-openai-supports-method-fb126583e4beb057.yaml
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-07-31 13:14:22 +02:00
Vladimir Blagojevic
409e3471cb
feat: Enable Support for Meta LLama-2 Models in Amazon Sagemaker ( #5437 )
...
* Enable Support for Meta LLama-2 Models in Amazon Sagemaker
* Improve unit test for invocation layers positioning
* Small adjustment, add more unit tests
* mypy fixes
* Improve unit tests
* Update test/prompt/invocation_layer/test_sagemaker_meta.py
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* PR feedback
* Add pydocs for newly extracted methods
* simplify is_proper_chat_*
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2023-07-26 15:26:39 +02:00
Silvano Cerza
7940ec0482
Add @store decorator ( #5438 )
2023-07-26 09:32:23 +02:00
Julian Risch
5bb0a1f57a
Revert "fix: num_return_sequences should be less than num_beams, not top_k ( #5280 )" ( #5434 )
...
This reverts commit 514f93a6eb575d376b21d22e32080fac62cf785f.
2023-07-25 13:27:41 +02:00
Sebastian Husch Lee
2bc7fe1a08
test: reactivate unit tests in test_eval.py ( #5255 )
...
* Activate tests that follow unit test and integration test rules
* Adding more integration labels
* Change name to better reflect complexity of test
* Remove mark integration tags, move test to doc store test for add_eval_data
* Removing incorrect integration label
* Deactivated document store test b/c it fails for Weaviate and pinecone
* Remove unit label since test needs to be refactored to be considered a unit test
* Undo changes
* Undo change
* Check every field in the load evaluation result
* Add back label and add skip reason
* Use pytest skip instead of TODO
2023-07-24 17:07:45 +02:00
Vladimir Blagojevic
597df1414c
feat: Update Anthropic Claude support with the latest models, new streaming API, context window sizes ( #5406 )
...
* Update Claude support with the latest models, new streaming API, context window sizes
* Use Github Anthropic SDK link for tokenizer, revert _init_tokenizer
* Change example key name to ANTHROPIC_API_KEY
2023-07-21 13:33:07 +02:00
elundaeva
612c6779fb
feat: RecentnessRanker ( #5301 )
...
* recency reranker code
* removed
* readd
* edited code
* edit
* mypy test fix
* adding warnings for score method
* fix
* fix
* adding paper link
* comments implementation
* change to predict and predict_batch
* change to predict and predict_batch 2
* adding unit test
* fixes
* small fixes
* fix for unit test
* table driven test
* small fixes
* small fixes2
* adding predict_batch tests
* add recentness_ranker to api reference docs
* implementing feedback
* implementing feedback2
* implementing feedback3
* implementing feedback4
* implementing feedback5
* remove document_map, remove final check if score is not None
* add final check if doc score is not None for mypy
---------
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-07-20 16:20:45 +02:00
Sebastian Husch Lee
f7642e83ea
feat: Add embed_meta_fields to Ranker nodes ( #5361 )
...
* Adding embed_meta_fields to ranker nodes
* Fix tests by adding case where embed_meta_fields=None
* Adding unit test for _add_meta_fields_to_docs
* Fix pylint
* Add unit test
* Added another unit test. Caught a bug.
* Adding more unit tests
* Add unit test
* Updating some older tests into unit tests using mocking
* Convert another test to unit test
* Test run method
* One last unit test
2023-07-18 09:11:51 +02:00
ZanSara
8f3fe85878
feat: extend pipeline.add_component to support stores ( #5261 )
...
* add protocol and adapt pipeline
* change API in pipeline.add_component
* adapt pipeline tests
* adapt memoryretriever
* additional checks
* separate protocol and mixin
* review feedback & update tests
* pylint
* Update haystack/preview/document_stores/protocols.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update haystack/preview/document_stores/memory/document_store.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* docstring of Store
* adapt memorydocumentstore
* fix tests
* remove direct inheritance
* pylint
* Update haystack/preview/document_stores/mixins.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update test/preview/components/retrievers/test_memory_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update test/preview/components/retrievers/test_memory_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update test/preview/components/retrievers/test_memory_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update test/preview/components/retrievers/test_memory_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update test/preview/components/retrievers/test_memory_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* test names
* revert suggestion
* private self._stores
* move asserts out
* remove protocols
* review feedback
* review feedback
* fix tests
* mypy
* review feedback
* fix tests & other details
* naming
* mypy
* fix tests
* typing
* partial review feedback
* move .store to input dataclass
* Revert "move .store to input dataclass"
This reverts commit 53f624b99f3414c89d5134711725b31bd94ef77a.
* disable reusing components with stores
* disable sharing components with docstores
* Update mixins.py
* black
* upgrade canals & fix tests
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-07-17 15:06:19 +02:00
Vladimir Blagojevic
adfabdd648
Improve token limit tests for OpenAI PromptNode layer ( #5351 )
2023-07-17 14:03:03 +02:00
Fanli Lin
9891bfeddd
fix: a small bug in StopWordsCriteria ( #5316 )
2023-07-13 15:58:06 +02:00
bogdankostic
237d67dbfd
feat: Check version of Elasticsearch server and add support for Elasticsearch <= 7.5 ( #5320 )
...
* Check ES server version + add support for ES <= 7.5
* Adapt comment
* PR feedback
2023-07-13 14:50:43 +02:00
Vladimir Blagojevic
f21005f8ea
refactor: Extract link retrieval from WebRetriever, introduce LinkContentRetriever ( #5227 )
...
* Extract link retrieval from WebRetriever, introduce LinkContentRetriever
* Add example
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
2023-07-13 12:54:40 +02:00
MichelBartels
fd350bbb8f
fix: Run HFLocalInvocationLayer.supports even if inference packages are not installed ( #5308 )
...
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-07-13 12:52:56 +02:00
ZanSara
7848f00d01
feat: upgrade canals in preview ( #5344 )
...
* upgrade nodes
* linting
2023-07-13 12:30:49 +02:00
Sebastian Husch Lee
b5aef24a7e
feat: Add support for meta fields that are lists when using embed_meta_fields ( #5307 )
...
* Add support for meta fields that are lists when using embed_meta_fields
* Make sure unit test doesn't download model
* Adding more unit tests
2023-07-11 17:32:33 +02:00
Stefano Fiorucci
6632505540
chore: deprecate SklearnQueryClassifier ( #5324 )
...
* pin scikit-learn, deprecate SklearnQueryClassifier
* rm scikit-learn pin
2023-07-11 17:07:23 +02:00
Sebastian Husch Lee
22750d342c
test: Refactor some retriever tests into unit tests ( #5306 )
...
* Modify and reactivate two unit tests
* Refactor openai embedding tests into unit tests
* Update test_retriever.py
* Changing tests
2023-07-11 13:36:23 +02:00
Fanli Lin
514f93a6eb
fix: num_return_sequences should be less than num_beams, not top_k ( #5280 )
...
* formatting
* remove top_k variable
* add pytest
* add numbers
* string formatting
* fix formatting
* revert
* extend tests with assertions for num_return_sequences
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-07-11 12:20:21 +02:00
bogdankostic
b7f683bfa4
ci: Add unit test for Elasticsearch8 ( #5300 )
...
* Add job for ES8 integration tests
* Add unit test for Elasticsearch 8
* Add tests.yml
* Adapt tests.yml
* Remove added white space
* Adapt tests.yml
* Adapt tests.yml
* Add dependencies to unit test name
* Adapt unit test matrix
* Adapt unit test matrix
* Adapt unit test matrix
* Adapt unit test matrix
* Update tests.yml
* Create separate tests where necessary
* Fix skip
* Adapt tests
2023-07-10 16:03:50 +02:00
tstadel
9acb275680
fix: avoid conflicts with opensearch / elasticsearch magic attributes during bulk requests ( #5113 )
...
* use _source on opensearch bulk requests
* fix label bulk requests
* add tests
* fix test
* apply feedback
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-07-07 15:12:50 +02:00
ZanSara
13bed30504
feat: batch mode for MemoryRetriever (v2) ( #5287 )
...
* memoryretriever batch mode
* typing of output
2023-07-07 12:10:35 +02:00
ZanSara
f49bd3a12f
feat: introduce Store protocol (v2) ( #5259 )
...
* add protocol and adapt pipeline
* review feedback & update tests
* pylint
* Update haystack/preview/document_stores/protocols.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update haystack/preview/document_stores/memory/document_store.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* docstring of Store
* adapt memorydocumentstore
* fix tests
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-07-07 12:10:08 +02:00
Stefano Fiorucci
90ff3817e7
feat: support OpenAI-Organization for authentication ( #5292 )
...
* add openai_organization to invocation layer, generator and retriever
* added tests
2023-07-07 12:02:21 +02:00
bogdankostic
0697f5c63e
fix: Support isolated node eval in run_batch in Generators ( #5291 )
...
* Add isolated node eval to BaseGenerator's run_batch
* Add unit tests
2023-07-07 10:32:43 +02:00
MichelBartels
08f1865ddd
fix: Improve robustness of get_task HF pipeline invocations ( #5284 )
...
* replace get_task method and change invocation layer order
* add test for invocation layer order
* add test documentation
* make invocation layer test more robust
* fix type annotation
* change hf timeout
* simplify timeout mock and add get_task exception cause
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-07-06 16:33:44 +02:00
Vladimir Blagojevic
ac412193cc
refactor: Simplify selection of Azure vs OpenAI invocation layers ( #5271 )
2023-07-06 13:23:13 +02:00