Stefano Fiorucci
2b3c77e41d
fix: make JoinDocuments correctly handle duplicate documents w null scores ( #6261 )
...
* fix error with null values
* release note
* simplify
2023-11-09 14:28:56 +01:00
Massimiliano Pippi
789e524de3
remove leftovers from 1.18 ( #6196 )
2023-10-30 11:25:54 +01:00
Nicola Procopio
32e87d37c1
fixed join_docs.py concatenate ( #5970 )
...
* added hybrid search example
Added an example about hybrid search for faq pipeline on covid dataset
* formatted with black formatter
* renamed document
* fixed
* fixed typos
* added test
added test for hybrid search
* fixed whitespaces
* removed test for hybrid search
* fixed pylint
* commented logging
* fixed bug in join_docs.py _concatenate_results
* Update join_docs.py
updated comment
* format with black
* added releasenote on PR
* updated release notes
* updated test_join_documents
* updated test
* updated test
* Update test_join_documents.py
* formatted with black
* fixed test
* fixed
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-10-16 09:31:52 +02:00
Stefano Fiorucci
cc70b4b613
deprecation ( #5954 )
2023-10-03 12:48:06 +02:00
Christian Clauss
bf6d306d68
ci: Simplify Python code with ruff rules SIM ( #5833 )
...
* ci: Simplify Python code with ruff rules SIM
* Revert #5828
* ruff --select=I --fix haystack/modeling/infer.py
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-09-20 08:32:44 +02:00
Christian Clauss
91ab90a256
perf: Python performance improvements with ruff C4 and PERF fixes ( #5803 )
...
* Python performance improvements with ruff C4 and PERF
* pre-commit fixes
* Revert changes to examples/basic_qa_pipeline.py
* Revert changes to haystack/preview/testing/document_store.py
* revert releasenotes
* Upgrade to ruff v0.0.290
2023-09-16 16:26:07 +02:00
Christian Clauss
1bc03ddc73
ci: Fix all ruff pyflakes errors except unused imports ( #5820 )
...
* ci: Fix all ruff pyflakes errors except unused imports
* Delete releasenotes/notes/fix-some-pyflakes-errors-69a1106efa5d0203.yaml
2023-09-15 18:30:33 +02:00
Christian Clauss
9405eb90ee
ci: Fix invalid escape sequences in Python code ( #5802 )
...
* ci: Use ruff in pre-commit to further limit complexity
* Fix invalid escape sequences in Python code
* Delete releasenotes/notes/ruff-4d2504d362035166.yaml
2023-09-14 16:42:48 +02:00
Julian Risch
4ae0924ea0
feat!: Remove SklearnQueryClassifier ( #5779 )
...
* remove SklearnQueryClassifier
* reno
2023-09-13 12:55:33 +02:00
bogdankostic
07c85905f3
fix: Change use_auth_token to token in TransformersQueryClassifier ( #5659 )
2023-08-29 15:21:25 +02:00
Vladimir Blagojevic
f13b37db24
fix: LinkContentFetcher - when no content retrieved (i.e. request blocked), default to snippet text ( #5656 )
...
* When no content retrieved (i.e. request blocked), default to snippet
* Add release note
2023-08-29 10:57:47 +02:00
Vladimir Blagojevic
46c9139caf
refactor: Rework WebRetriever caching, adjust tests ( #5566 )
...
* Rework WebRetriever caching, adjust tests
* Add release note
* Better pydocs
* Minor improvements
* Update haystack/nodes/retriever/web.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-08-16 17:41:11 +02:00
Vladimir Blagojevic
a75b9dd4bb
feat: LinkContentFetcher - add content-type resolution, user agent switching, PDF handler ( #5374 )
...
* Add content type resolution, pdf handler, user agent switching
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-08-09 18:14:04 +02:00
Vladimir Blagojevic
84ed954c8c
feat: Improve performance and add default media support in FileTypeClassifier ( #5083 )
...
* feat: add media outgoing edge to FileTypeClassifier
* Add release note
* Update language
---------
Co-authored-by: Daniel Bichuetti <daniel.bichuetti@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-08-08 15:51:07 +02:00
tstadel
d46c84bb61
feat: support dynamic filters in custom_query ( #5427 )
...
* support filters in custom_query
* better tests
* Update docstrings
---------
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-08-08 15:48:15 +02:00
Stefano Fiorucci
3f472995bb
refactor: update Crawler to support selenium>=4.11.0 and simplify it ( #5515 )
...
* refactor crawler
* rm unused imports
* release notes!
* rm outdated mock
2023-08-08 15:13:22 +02:00
Vladimir Blagojevic
1876c41f07
feat: Add LostInTheMiddleRanker ( #5457 )
...
* Add lost in the middle ranker
* Add release note
* Julian's feedback: more precise version of truncate
* Better comments for the litm algorithm
* Sebastian PR feedback
* Add check for invalid values of word_count_threshold
* Remove _truncate as it is not needed any more
---------
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-08-02 17:05:13 +02:00
Vladimir Blagojevic
0efe0ee7b3
feat: Add top_k parameter to DiversityRanker init method ( #5494 )
...
* Add top_k
* Add release note
2023-08-02 17:04:04 +02:00
Vladimir Blagojevic
40a2e9b56a
refactor: Update WebRetriever to use LinkContentFetcher ( #5229 )
...
* Refactor WebRetriever to use LinkContentFetcher
* PR feedback
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-08-02 12:45:03 +02:00
Vladimir Blagojevic
540d0fad97
feat: Add DiversityRanker ( #5398 )
...
* Introduce DiversityRanker
* improve most_diverse_order speed
* Compute mean for numerical stability
* Add release note
* Add cosine similarity
* Test both dot product and cosine similarity
* Add pydocs hook
---------
Co-authored-by: Michel Bartels <login@michelbartels.com>
2023-08-01 12:48:34 +02:00
elundaeva
612c6779fb
feat: RecentnessRanker ( #5301 )
...
* recency reranker code
* removed
* readd
* edited code
* edit
* mypy test fix
* adding warnings for score method
* fix
* fix
* adding paper link
* comments implementation
* change to predict and predict_batch
* change to predict and predict_batch 2
* adding unit test
* fixes
* small fixes
* fix for unit test
* table driven test
* small fixes
* small fixes2
* adding predict_batch tests
* add recentness_ranker to api reference docs
* implementing feedback
* implementing feedback2
* implementing feedback3
* implementing feedback4
* implementing feedback5
* remove document_map, remove final check if score is not None
* add final check if doc score is not None for mypy
---------
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-07-20 16:20:45 +02:00
Sebastian Husch Lee
f7642e83ea
feat: Add embed_meta_fields to Ranker nodes ( #5361 )
...
* Adding embed_meta_fields to ranker nodes
* Fix tests by adding case where embed_meta_fields=None
* Adding unit test for _add_meta_fields_to_docs
* Fix pylint
* Add unit test
* Added another unit test. Caught a bug.
* Adding more unit tests
* Add unit test
* Updating some older tests into unit tests using mocking
* Convert another test to unit test
* Test run method
* One last unit test
2023-07-18 09:11:51 +02:00
Vladimir Blagojevic
f21005f8ea
refactor: Extract link retrieval from WebRetriever, introduce LinkContentRetriever ( #5227 )
...
* Extract link retrieval from WebRetriever, introduce LinkContentRetriever
* Add example
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
2023-07-13 12:54:40 +02:00
Sebastian Husch Lee
b5aef24a7e
feat: Add support for meta fields that are lists when using embed_meta_fields ( #5307 )
...
* Add support for meta fields that are lists when using embed_meta_fields
* Make sure unit test doesn't download model
* Adding more unit tests
2023-07-11 17:32:33 +02:00
Stefano Fiorucci
6632505540
chore: deprecate SklearnQueryClassifier ( #5324 )
...
* pin scikit-learn, deprecate SklearnQueryClassifier
* rm scikit-learn pin
2023-07-11 17:07:23 +02:00
Sebastian Husch Lee
22750d342c
test: Refactor some retriever tests into unit tests ( #5306 )
...
* Modify and reactivate two unit tests
* Refactor openai embedding tests into unit tests
* Update test_retriever.py
* Changing tests
2023-07-11 13:36:23 +02:00
Stefano Fiorucci
90ff3817e7
feat: support OpenAI-Organization for authentication ( #5292 )
...
* add openai_organization to invocation layer, generator and retriever
* added tests
2023-07-07 12:02:21 +02:00
bogdankostic
0697f5c63e
fix: Support isolated node eval in run_batch in Generators ( #5291 )
...
* Add isolated node eval to BaseGenerator's run_batch
* Add unit tests
2023-07-07 10:32:43 +02:00
Massimiliano Pippi
c068e34954
Remove deprecated param return_table_cell ( #5218 )
...
* remove deprecated param
* Update haystack/nodes/reader/table.py
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* try
* remove unused functions and ignore mypy error
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-06-27 16:14:29 +02:00
bogdankostic
82291b56ad
fix: Send batches of query-doc pairs to inference_from_objects ( #5125 )
...
* Send batches of query-doc pairs to inference_from_objects
* Use absolute import path
* Add separate preprocessing_batch_size parameter
2023-06-26 14:26:26 +02:00
Sebastian
f1932492f1
feat: Add CohereRanker node using Cohere reranking endpoint ( #5152 )
...
* Started to add CohereRanker node
* Small refactoring of SentenceTransformersRanker node
* Started to add predict_batch method
* Simplified predict_batch code
* Added missing imports
* Undoing a change
* Fix mypy
* Adding unit tests using mocking
* Updated truncation warning message.
* Update doc strings
* Update to docs
* Update haystack/nodes/ranker/cohere.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/nodes/ranker/cohere.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/nodes/ranker/cohere.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/nodes/ranker/cohere.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/nodes/ranker/cohere.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/nodes/ranker/cohere.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Updating docs to reflect PR discussion
* Update haystack/nodes/ranker/cohere.py
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
---------
Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
2023-06-23 16:46:46 +02:00
ZanSara
31664627eb
feat: hard document length limit at max_chars_check ( #5191 )
...
* implement hard cut at max_chars_check
* regenerate ids
* black
* docstring
* black
2023-06-23 12:34:19 +02:00
ZanSara
36192eca72
feat: current_datetime shaper function ( #5195 )
...
* current_datetime shaper
* explicitly add current_datetime to the functions allowed in a prompt template
2023-06-23 10:33:34 +02:00
Sebastian
1602f3abdd
test: Adding unit tests to Ranker ( #5167 )
...
* adding unit tests for sentence transformers ranker
* Adding more unit tests
* Remove empty line
* Undo static method
* Revert change
* Updated indentation and added match message
* Remove unneeded parentheses
2023-06-22 15:23:23 +02:00
Stefano Fiorucci
637433841e
chore: remove deprecated Seq2SeqGenerator and RAGenerator ( #5180 )
...
* first draft of removal
* more removals
* don't download unused models
2023-06-21 16:38:45 +02:00
ZanSara
65cdf36d72
chore: block all HTTP requests in CI ( #5088 )
2023-06-13 14:52:24 +02:00
Vladimir Blagojevic
0cc9ce7522
fix: WebRetriever top_k is ignored in a pipeline ( #5106 )
...
* Initial changes
* Add WebSearch, WebRetriever top_k unit tests
* Add exact integration test that failed for Tuana
* PR review
2023-06-09 10:42:37 +02:00
Sebastian
1777b22fcb
fix: Ensure eval mode for farm and transformer models for predictions ( #3791 )
...
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-06-06 13:06:30 +02:00
Michael Feil
6ea8ae01a2
feat: Allow setting custom api_base for OpenAI nodes ( #5033 )
...
* add changes for api_base
* format retriever
* Update haystack/nodes/retriever/dense.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/nodes/audio/whisper_transcriber.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/preview/components/audio/whisper_remote.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/nodes/answer_generator/openai.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update test_retriever.py
* Update test_whisper_remote.py
* Update test_generator.py
* Update test_retriever.py
* reformat with black
* Update haystack/nodes/prompt/invocation_layer/chatgpt.py
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
* Add unit tests
* apply docstring suggestions
---------
Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: michaelfeil <me@michaelfeil.eu>
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
2023-06-05 11:32:06 +02:00
Massimiliano Pippi
929b8d1fb0
ci: run Elasticsearch 8.6 in compatibility mode ( #3853 )
...
* bump ES version in CI
disable ssl
wait for service to start
set env vars
do not use choco to install ES
re-enable jobs deps
skip test on windows CI because of OOM
allocate more memory for ES
uniform ES installation and use default heap size
skip tests causing OOM
increase job timeout
restore memory limit for ES8
* Use latest elasticsearch version
2023-05-24 18:53:54 +02:00
Massimiliano Pippi
68924161df
chore: remove deprecated node PDFToTextOCRConverter ( #4982 )
...
* remove deprecated node
* remove related test
2023-05-23 16:55:54 +02:00
ZanSara
949b1b63b3
PromptHub integration in PromptNode ( #4879 )
...
* initial integration
* upgrade of prompthub
* fix get_prompt_template
* feedback
* add prompthub-py to dependencies
* tests
* mypy
* stray changes
* review feedback
* missing init
* fix test
* move logic in prompttemplate
* linting
* bugfixes
* fix unit tests
* fix cache
* simplify prompttemplate init
* remove unused function
* removing wrong params
* try remove all instances of prompt names
* more tests
* fix agent tests
* more tests
* fix tests
* pylint
* comma
* black
* fix test
* docstring
* review feedback
* review feedback
* fix mocks
* mypy
* fix mocks
* fix reference to missing templates
* feedback
* remove direct references to default template var
* tests
* Update haystack/nodes/prompt/prompt_node.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-05-23 15:22:58 +02:00
Massimiliano Pippi
c6ea542b57
chore: remove BaseKnowledgeGraph ( #4953 )
...
* remove BaseKnowledgeGraph
* fix pylint
2023-05-21 10:42:02 +02:00
Massimiliano Pippi
4974bf7ab3
chore: remove deprecated MilvusDocumentStore ( #4951 )
...
* remove deprecated MilvusDocumentStore
* remove leftovers
* fix pylint
2023-05-19 16:37:38 +02:00
Vladimir Blagojevic
5d7ee2e5e6
feat: Add max_tokens to BaseGenerator params ( #4168 )
...
* Add max_tokens to BaseGenerator params
* Make mypy happy
* Rebase and resolve conflicts
* Fix signature issues
* Update lg
* Add a mocked unit test method
* end-of-file-fixer corrected file
* Convert to unit test
* Mark test as integration
* make the test unit
---------
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-05-18 15:19:29 +02:00
Massimiliano Pippi
3ea784464a
add test case for #4929 ( #4936 )
2023-05-18 09:12:03 +02:00
bogdankostic
df46e7fadd
fix: Use AutoTokenizer instead of DPR specific tokenizer ( #4898 )
...
* Use AutoTokenizer instead of DPR specific tokenizer
* Adapt TableTextRetriever
* Adapt tests
* Adapt tests
2023-05-17 18:54:34 +02:00
Stefano Fiorucci
6e0000732d
feat: add BLIP support in TransformersImageToText ( #4912 )
...
* add blip support
* fix typo
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-05-16 10:57:41 +02:00
bogdankostic
5b2ef2afd6
Revert "refactor!: Deprecate name param in PromptTemplate and introduce template_name instead ( #4810 )" ( #4834 )
...
This reverts commit f660f41c0615e6b3064ef3e321f1e5a295fafc1b.
2023-05-08 11:31:04 +02:00
ZanSara
6e982e9283
fix: preserve root_node in JoinNode's output ( #4820 )
...
* preserve root_node and add tests
* Added if statement to fix failing tests
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com>
2023-05-08 10:17:36 +02:00