Vladimir Blagojevic
540d0fad97
feat: Add DiversityRanker ( #5398 )
...
* Introduce DiversityRanker
* improve most_diverse_order speed
* Compute mean for numerical stability
* Add release note
* Add cosine similarity
* Test both dot product and cosine similarity
* Add pydocs hook
---------
Co-authored-by: Michel Bartels <login@michelbartels.com>
2023-08-01 12:48:34 +02:00
Malte Pietsch
8c017ccc32
Update installation instructions in README.md ( #5480 )
2023-08-01 12:33:40 +02:00
Silvano Cerza
bc152d953c
Skip running tests in CI when editing docs Python files ( #5482 )
2023-08-01 12:31:24 +02:00
Silvano Cerza
9a359101fd
chore: Rework docs generation ( #5481 )
...
* Change docs generation to use id for parent doc instead of slug
* Rename step
2023-08-01 12:18:33 +02:00
bogdankostic
a51ca19fe4
feat: Add TextFileToDocument component (v2) ( #5467 )
...
* Add TextfileToDocument component
* Add docstrings
* Add unit tests
* Add release note file
* Make use of progress bar
* Add TextfileToDocument to __init__.py
* Use lazy % formatting in logging functions
* Remove f from non-f-string
* Add TextfileToDocument to __init__.py
* Use correct dependency extra
* Compare file path against path object
* PR feedback
* PR feedback
* Update haystack/preview/components/file_converters/txt.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update docstrings
* Add error handling
* Add unit test
* Reintroduce falsely removed caplog
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-08-01 11:34:52 +02:00
Muhammad Bilal
8920fd6939
feat: add optional index selection for endpoints ( #5444 )
...
* add index selection
* reformatting
* updated test script
2023-08-01 10:47:46 +02:00
Bilge Yücel
62029ba441
Add AgentStep to api reference ( #5402 )
2023-07-31 19:26:34 +03:00
Stefano Fiorucci
6f534873a5
fix: restrict supports method in the OpenAI invocation layer and a similar method in the EmbeddingRetriever ( #5458 )
...
* restrict OpenAI supports method
* better note
* Update releasenotes/notes/restrict-openai-supports-method-fb126583e4beb057.yaml
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-07-31 13:14:22 +02:00
Massimiliano Pippi
d9fd1ab7bc
feat!: remove original files after indexing ( #5459 )
...
* remove original files after indexing
* fix tests
2023-07-31 13:07:16 +02:00
Massimiliano Pippi
5f01391827
add workflow to check presence of release notes ( #5449 )
2023-07-27 10:40:40 +02:00
Stefano Fiorucci
672813052d
Update invocation-layers.yml ( #5445 )
2023-07-26 15:39:08 +02:00
Vladimir Blagojevic
409e3471cb
feat: Enable Support for Meta LLama-2 Models in Amazon Sagemaker ( #5437 )
...
* Enable Support for Meta LLama-2 Models in Amazon Sagemaker
* Improve unit test for invocation layers positioning
* Small adjustment, add more unit tests
* mypy fixes
* Improve unit tests
* Update test/prompt/invocation_layer/test_sagemaker_meta.py
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* PR feedback
* Add pydocs for newly extracted methods
* simplify is_proper_chat_*
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2023-07-26 15:26:39 +02:00
Silvano Cerza
9ab6298f1d
build: Unpin mlflow, constrain dulwich and botocore ( #5441 )
...
* Unpin mlflow
* Pin dulwich
* Pin botocore
2023-07-26 12:59:16 +02:00
Silvano Cerza
7940ec0482
Add @store decorator ( #5438 )
2023-07-26 09:32:23 +02:00
Vladimir Blagojevic
22897c17a2
fix: Improve log warnings in REST API /health endpoint ( #5381 )
...
* Improve warning in REST APIs get_health_status method
* Convert log message
* A better solution and documentation
* Add another nested try/except block
* Simplify
2023-07-25 17:06:03 +02:00
Julian Risch
5bb0a1f57a
Revert "fix: num_return_sequences should be less than num_beams, not top_k ( #5280 )" ( #5434 )
...
This reverts commit 514f93a6eb575d376b21d22e32080fac62cf785f.
2023-07-25 13:27:41 +02:00
Sebastian Husch Lee
2bc7fe1a08
test: reactivate unit tests in test_eval.py ( #5255 )
...
* Activate tests that follow unit test and integration test rules
* Adding more integration labels
* Change name to better reflect complexity of test
* Remove mark integration tags, move test to doc store test for add_eval_data
* Removing incorrect integration label
* Deactivated document store test b/c it fails for Weaviate and pinecone
* Remove unit label since test needs to be refactored to be considered a unit test
* Undo changes
* Undo change
* Check every field in the load evaluation result
* Add back label and add skip reason
* Use pytest skip instead of TODO
2023-07-24 17:07:45 +02:00
Massimiliano Pippi
363f3edbf7
feat: add reno to manage release notes ( #5397 )
...
* first draft
* add release notes
* remove old settings
* add reno usage instructions
* page the docs team when release notes are added
* add reno to the dev dependencies
* Apply suggestions from code review
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-07-24 17:02:46 +02:00
github-actions[bot]
afabc785c3
Update unstable version ( #5424 )
...
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2023-07-24 16:59:49 +02:00
bogdankostic
345dbeb638
docs: Add Elasticsearch to API config ( #5422 )
2023-07-24 16:23:13 +02:00
Nicola Procopio
8a2ab82651
feat: Added hybrid search example ( #5376 )
...
* added hybrid search example
Added an example about hybrid search for faq pipeline on covid dataset
* formatted with black formatter
* renamed document
* fixed
* fixed typos
* added test
added test for hybrid search
* fixed whitespaces
* removed test for hybrid search
* fixed pylint
* commented logging
2023-07-24 12:54:21 +02:00
Julian Risch
f38f365682
fix: Error message about weight param in RecentnessRanker ( #5409 )
...
* fix: error message about weight param in RecentnessRanker
* trigger GitHub actions
---------
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2023-07-24 10:41:17 +02:00
Vladimir Blagojevic
597df1414c
feat: Update Anthropic Claude support with the latest models, new streaming API, context window sizes ( #5406 )
...
* Update Claude support with the latest models, new streaming API, context window sizes
* Use Github Anthropic SDK link for tokenizer, revert _init_tokenizer
* Change example key name to ANTHROPIC_API_KEY
2023-07-21 13:33:07 +02:00
Stefano Fiorucci
1706b662db
build: upgrade transformers to v4.31.0 ( #5391 )
...
* Update transformers
* fix the forgotten pin
2023-07-21 09:30:03 +02:00
Massimiliano Pippi
a13ffcf9df
bump pydoc-markdown ( #5405 )
2023-07-20 16:48:08 +02:00
elundaeva
612c6779fb
feat: RecentnessRanker ( #5301 )
...
* recency reranker code
* removed
* readd
* edited code
* edit
* mypy test fix
* adding warnings for score method
* fix
* fix
* adding paper link
* comments implementation
* change to predict and predict_batch
* change to predict and predict_batch 2
* adding unit test
* fixes
* small fixes
* fix for unit test
* table driven test
* small fixes
* small fixes2
* adding predict_batch tests
* add recentness_ranker to api reference docs
* implementing feedback
* implementing feedback2
* implementing feedback3
* implementing feedback4
* implementing feedback5
* remove document_map, remove final check if score is not None
* add final check if doc score is not None for mypy
---------
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-07-20 16:20:45 +02:00
bogdankostic
c2506866bd
docs: Pin PyYAML to 5.3.1 ( #5400 )
2023-07-20 15:31:58 +02:00
Julian Risch
eeb29b5686
test: Re-activate end-to-end tests workflow ( #5343 )
...
* Install haystack with required extras
* remove whitespaces
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Add sleep
* Add s for seconds
* Move container initialization in workflow
* Update e2e.yml
add nightly run
* use new folder for initial e2e test
* use file hash for caching and trigger on push to branch
* remove \n from model names read from file
* remove trigger on push to branch
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-07-20 11:48:51 +02:00
Sebastian Husch Lee
f7642e83ea
feat: Add embed_meta_fields to Ranker nodes ( #5361 )
...
* Adding embed_meta_fields to ranker nodes
* Fix tests by adding case where embed_meta_fields=None
* Adding unit test for _add_meta_fields_to_docs
* Fix pylint
* Add unit test
* Added another unit test. Caught a bug.
* Adding more unit tests
* Add unit test
* Updating some older tests into unit tests using mocking
* Convert another test to unit test
* Test run method
* One last unit test
2023-07-18 09:11:51 +02:00
elundaeva
e0cf1421c6
proposal: Add RecentnessRanker component ( #5289 )
...
proposal for adding Recentness Ranker to Haystack
2023-07-17 16:33:47 +02:00
Fanli Lin
09a1d3c0dc
remove duplicate ( #5368 )
2023-07-17 16:24:02 +02:00
ZanSara
8f3fe85878
feat: extend pipeline.add_component to support stores ( #5261 )
...
* add protocol and adapt pipeline
* change API in pipeline.add_component
* adapt pipeline tests
* adapt memoryretriever
* additional checks
* separate protocol and mixin
* review feedback & update tests
* pylint
* Update haystack/preview/document_stores/protocols.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update haystack/preview/document_stores/memory/document_store.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* docstring of Store
* adapt memorydocumentstore
* fix tests
* remove direct inheritance
* pylint
* Update haystack/preview/document_stores/mixins.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update test/preview/components/retrievers/test_memory_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update test/preview/components/retrievers/test_memory_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update test/preview/components/retrievers/test_memory_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update test/preview/components/retrievers/test_memory_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update test/preview/components/retrievers/test_memory_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* test names
* revert suggestion
* private self._stores
* move asserts out
* remove protocols
* review feedback
* review feedback
* fix tests
* mypy
* review feedback
* fix tests & other details
* naming
* mypy
* fix tests
* typing
* partial review feedback
* move .store to input dataclass
* Revert "move .store to input dataclass"
This reverts commit 53f624b99f3414c89d5134711725b31bd94ef77a.
* disable reusing components with stores
* disable sharing components with docstores
* Update mixins.py
* black
* upgrade canals & fix tests
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-07-17 15:06:19 +02:00
Vladimir Blagojevic
adfabdd648
Improve token limit tests for OpenAI PromptNode layer ( #5351 )
2023-07-17 14:03:03 +02:00
Ikko Eltociear Ashimine
35b2c99f43
chore: fix typo in base.py ( #5356 )
...
paramters -> parameters
2023-07-13 18:40:21 +02:00
Fanli Lin
9891bfeddd
fix: a small bug in StopWordsCriteria ( #5316 )
2023-07-13 15:58:06 +02:00
bogdankostic
237d67dbfd
feat: Check version of Elasticsearch server and add support for Elasticsearch <= 7.5 ( #5320 )
...
* Check ES server version + add support for ES <= 7.5
* Adapt comment
* PR feedback
2023-07-13 14:50:43 +02:00
Daria Fokina
63fd63ff23
fix: update WebRetriever docstrings and default mode ( #5352 )
...
* update WebRetriever docstrings
* snippets as default mode
2023-07-13 13:57:52 +02:00
Vladimir Blagojevic
f21005f8ea
refactor: Extract link retrieval from WebRetriever, introduce LinkContentRetriever ( #5227 )
...
* Extract link retrieval from WebRetriever, introduce LinkContentRetriever
* Add example
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
2023-07-13 12:54:40 +02:00
MichelBartels
fd350bbb8f
fix: Run HFLocalInvocationLayer.supports even if inference packages are not installed ( #5308 )
...
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-07-13 12:52:56 +02:00
ZanSara
7848f00d01
feat: upgrade canals in preview ( #5344 )
...
* upgrade nodes
* linting
2023-07-13 12:30:49 +02:00
Stefano Fiorucci
8750d92763
pin scikit-learn>=1.3.0 ( #5322 )
2023-07-13 11:11:28 +02:00
Daria Fokina
a152a812da
create invocation-layers API reference page ( #5262 )
...
* create invocation-layers.yml
* Update invocation-layers.yml
2023-07-12 17:48:27 +02:00
Sebastian Husch Lee
b5aef24a7e
feat: Add support for meta fields that are lists when using embed_meta_fields ( #5307 )
...
* Add support for meta fields that are lists when using embed_meta_fields
* Make sure unit test doesn't download model
* Adding more unit tests
2023-07-11 17:32:33 +02:00
Stefano Fiorucci
6632505540
chore: deprecate SklearnQueryClassifier ( #5324 )
...
* pin scikit-learn, deprecate SklearnQueryClassifier
* rm scikit-learn pin
2023-07-11 17:07:23 +02:00
Sebastian Husch Lee
22750d342c
test: Refactor some retriever tests into unit tests ( #5306 )
...
* Modify and reactivate two unit tests
* Refactor openai embedding tests into unit tests
* Update test_retriever.py
* Changing tests
2023-07-11 13:36:23 +02:00
Fanli Lin
514f93a6eb
fix: num_return_sequences should be less than num_beams, not top_k ( #5280 )
...
* formatting
* remove top_k variable
* add pytest
* add numbers
* string formatting
* fix formatting
* revert
* extend tests with assertions for num_return_sequences
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-07-11 12:20:21 +02:00
bogdankostic
41668f26d6
ci: Update labeler.yml to account for Elasticsearch changes ( #5318 )
2023-07-11 11:51:49 +02:00
Sebastian Husch Lee
2703c2d483
docs: Small documentation updates to dense.py ( #5305 )
...
* Small documentation updates
* Update doc strings
2023-07-10 18:16:49 +02:00
bogdankostic
b7f683bfa4
ci: Add unit test for Elasticsearch8 ( #5300 )
...
* Add job for ES8 integration tests
* Add unit test for Elasticsearch 8
* Add tests.yml
* Adapt tests.yml
* Remove added white space
* Adapt tests.yml
* Adapt tests.yml
* Add dependencies to unit test name
* Adapt unit test matrix
* Adapt unit test matrix
* Adapt unit test matrix
* Adapt unit test matrix
* Update tests.yml
* Create separate tests where necessary
* Fix skip
* Adapt tests
2023-07-10 16:03:50 +02:00
bogdankostic
048fc7f640
ci: Add job for ES8 integration tests ( #5297 )
...
* Add job for ES8 integration tests
* Remove whitespace
* Fix filename
* Add tests.yml
* Revert "Add tests.yml"
This reverts commit ec12654d4e146b5ef6cba04ad82f5973935d8520.
2023-07-10 10:43:05 +02:00