24 Commits

Author SHA1 Message Date
Stefano Fiorucci
92a8704de4
mypy ignore specific errors (#6278) 2023-11-10 18:10:38 +01:00
Julian Risch
59e89b1031
test: Remove anthropic from "getting started" example test (#6024) 2023-10-12 22:36:49 +02:00
Nicola Procopio
c102b152dc
fix: Run update_embeddings in examples (#6008)
* added hybrid search example

Added an example about hybrid search for faq pipeline on covid dataset

* formatted with black formatter

* renamed document

* fixed

* fixed typos

* added test

added test for hybrid search

* fixed whitespaces

* removed test for hybrid search

* fixed pylint

* commented logging

* updated hybrid search example

* release notes

* Update hybrid_search_faq_pipeline.py-815df846dca7e872.yaml

* Update hybrid_search_faq_pipeline.py

* mention hybrid search example in release notes

* reduce installed dependencies in examples test workflow

* do not install cuda dependencies

* skip models if API key not set; delete document indices

* keep roberta-base model and inference extra

* pylint

* disable pylint no-logging-basicconfig rule

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-10-10 16:38:52 +02:00
Timo Moeller
d048bb5352
docs: Add minimal getting started code to showcase haystack + RAG (#5578)
* init

* Change question

* Add TODO comment

* Addressing feedback

* Add local folder option. Move additional functions inside haystack.utils for easier imports

* Apply Daria's review suggestions

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Add integration test

* change string formatting

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Add outputparser to HF

* Exclude anthropic test

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-09-06 12:14:08 +02:00
Vladimir Blagojevic
6787ad2435
fix: Improve imports for new rankers (#5696)
* Proper imports for new rankers

* Small fix
2023-08-31 13:33:29 +02:00
Vladimir Blagojevic
2118f68769
feat: Add domain scoping to WebRetriever (#5587)
* WebSearch: add allowed_domains scoped search

* Add talk to website example

* Add release note

* Add allowed_domains to WebSearch

* Minor fix

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-08-28 20:02:02 +02:00
Vladimir Blagojevic
da67700318
Rename web_lfqa_improved and update questions (#5588) 2023-08-17 17:10:49 +02:00
Vladimir Blagojevic
a75b9dd4bb
feat: LinkContentFetcher - add content-type resolution, user agent switching, PDF handler (#5374)
* Add content type resolution, pdf handler, user agent switching
---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-08-09 18:14:04 +02:00
Vladimir Blagojevic
abc6737e63
feat: Improve LFQA Web Example (#5504)
* Improve web_lfqa example

* Turn off pylint for logging setup

* Another way to turn off logging
2023-08-04 14:20:06 +02:00
Vladimir Blagojevic
1876c41f07
feat: Add LostInTheMiddleRanker (#5457)
* Add lost in the middle ranker

* Add release note

* Julian's feedback: more precise version of truncate

* Better comments for the litm algorithm

* Sebastian PR feedback

* Add check for invalid values of word_count_threshold

* Remove _truncate as it is not needed any more

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-08-02 17:05:13 +02:00
Vladimir Blagojevic
40a2e9b56a
refactor: Update WebRetriever to use LinkContentFetcher (#5229)
* Refactor WebRetriever to use LinkContentFetcher

* PR feedback

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-08-02 12:45:03 +02:00
Vladimir Blagojevic
540d0fad97
feat: Add DiversityRanker (#5398)
* Introduce DiversityRanker

* improve most_diverse_order speed

* Compute mean for numerical stability

* Add release note

* Add cosine similarity 

* Test both dot product and cosine similarity

* Add pydocs hook

---------

Co-authored-by: Michel Bartels <login@michelbartels.com>
2023-08-01 12:48:34 +02:00
Nicola Procopio
8a2ab82651
feat: Added hybrid search example (#5376)
* added hybrid search example

Added an example about hybrid search for faq pipeline on covid dataset

* formatted with black formatter

* renamed document

* fixed

* fixed typos

* added test

added test for hybrid search

* fixed whitespaces

* removed test for hybrid search

* fixed pylint

* commented logging
2023-07-24 12:54:21 +02:00
Vladimir Blagojevic
597df1414c
feat: Update Anthropic Claude support with the latest models, new streaming API, context window sizes (#5406)
* Update Claude support with the latest models, new streaming API, context window sizes

* Use Github Anthropic SDK link for tokenizer, revert _init_tokenizer

* Change example key name to ANTHROPIC_API_KEY
2023-07-21 13:33:07 +02:00
Vladimir Blagojevic
f21005f8ea
refactor: Extract link retrieval from WebRetriever, introduce LinkContentRetriever (#5227)
* Extract link retrieval from WebRetriever, introduce LinkContentRetriever

* Add example
---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
2023-07-13 12:54:40 +02:00
Bilge Yücel
6a1b6b1ae3
feat: Update ConversationalAgent (#5065)
* feat: Update ConversationalAgent

* Add Tools
* Add test
* Change default params

* fix tests

* Fix circular import error
* Update conversational-agent prompt
* Add conversational-agent-without-tools to legacy list

* Add warning to add tools to conversational agent

* Add callable tools

* Add example script

* Fix linter errors

* Update ConversationalAgent depending on the existence of tools

* Initialize the base Agent with different arguments when there are tools
* Inject memory to the prompt in both cases, update prompts accordingly

* Override the add_tools method to prevent adding tools to ConversationalAgent without tools

* Update test

* Fix linter error

* Remove unused import

* Update docstrings and api reference

* Fix imports and doc string code snippet

* docstrings update

* Update conversational.py

* Mock PromptNode

* Prevent circular import error

* Add max_steps to the ConversationalAgent

* Update resolver description

* Add prompt_template as parameter

* Change docstring

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-06-20 13:09:21 +03:00
Vladimir Blagojevic
8d8de65492
Add AgentToolLogger, unit test, and example usage (#5087) 2023-06-15 08:43:20 +02:00
ZanSara
9612aa90bb
fix examples (#5041) 2023-05-29 15:15:38 +02:00
Vladimir Blagojevic
9d52998b25
feat: Add conversational agent (#4931) 2023-05-17 15:19:09 +02:00
Vladimir Blagojevic
8091ced8d5
refactor: Extract ToolsManager, add it to Agent by composition (#4794)
* Extract ToolsManager, add it to Agent by composition
* PR feedback Massi
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-05-03 16:45:40 +02:00
Vladimir Blagojevic
3fefc475b4
fix: Deprecate Seq2SeqGenerator and RAGenerator (#4745)
* Deprecate Seq2SeqGenerator

* changed the warning to include suggestion

* Added example and msg to API reference docs

* Added RAG deprecation

* renamed name to adapt to naming convention

* update docstrings

---------

Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-04-26 13:59:35 +02:00
Vladimir Blagojevic
be25655663
feat: Add agent tools (#4437)
* Initial commit, add search_engine

* Add TopPSampler

* Add more TopPSampler unit tests

* Remove SearchEngineSampler (converted to TopPSampler)

* Add some basic WebSearch unit tests

* Rename unit tests

* Add WebRetriever into agent_tools

* Adjust to WebRetriever

* Add WebRetriever mode [snippet|document]

* Minor changes

* SerperDev: add peopleAlsoAsk search results

* First agent for hotpotqa

* Making WebRetriever work on hotpotqa

* refactor: minor WebRetriever improvements (#4377)

* refactor: remove doc ids rebuild + anticipate cache

* refactor: improve caching, fix Document ids

* Minor WebRetriever improvements

* Overlooked minor fixes

* feat: add Bing API as search engine

* refactor: let kwargs pass-through

* feat: increase search context

* check sampler result, improve batch typing

* refactor: increase mypy compliance

* Fix mypy

* Minor example fixes

* Fix the descriptions

* PR feedback updates

* More fixes

* TopPSampler: handle top p None value, add unit test

* Add top_k to WebSearch

* Use boilerpy3 instead of trafilatura

* Remove date finding

* Add more WebRetriever docs

* Refactor long methods

* making the preprocessor optional

* hide WebSearch and make NeuralWebSearch a pipeline

* remove unused imports

* add WebQAPipeline and split example into two

* change example search engine to SerperDev

* Turn off progress bars in WebRetriever's PreProcessor

* Agent tool examples - final updates

* Add webqa test, search results ranking scores

* Better answer box handling for SerperDev and SerpAPI

* Minor fixes

* pylint

* pylint fixes

* extract TopPSampler from WebRetriever

* use sampler only for WebRetriever modes other than snippet

* add web retriever tests

* exclude rdflib@6.3.2 due to license issues

* add test for preprocessed docs and kwargs examples in docstrings

* Move test_webqa_pipeline to test/pipelines

* change docstring for join_documents_and_scores

* Use WebQAPipeline in examples/web_lfqa.py

* Move test_webqa_pipeline to e2e

* Updated lg

* Sampler added automatically in WebQAPipeline, no need to add it

* Updated lg

* :ignore Update agent tools examples to new templates (#4503)

* Update examples to new templates

* Add print back

* fix linting and black format issues

---------

Co-authored-by: Daniel Bichuetti <daniel.bichuetti@gmail.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-03-27 18:14:58 +02:00
Massimiliano Pippi
5e0de4a9ed
do not run launch_es in the CI (#3981) 2023-01-27 16:43:17 +01:00
Tuana Celik
e1502c8029
Adding Example Scripts to Haystack (#3588)
* add 2 example scripts

* fixing faq script

* updating PR based on comments

* black

* updating s3 buckets

* first attempt at testing

* Add basic tests to two scripts

PR: #3588

* make tests runnable

* reformat files

* only run in PRs touching an example

Co-authored-by: bilgeyucel <bilgeyucel96@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-27 14:54:59 +01:00