50 Commits

Author SHA1 Message Date
Silvano Cerza
1ce12c7a6a
Remove example (#7458) 2024-04-03 14:27:43 +02:00
Silvano Cerza
5aee378baf
chore: Remove all examples and point to cookbooks repo (#7350)
* Remove all examples and point to cookbooks repo

* Remove workflow testing examples
2024-03-12 18:04:39 +01:00
Daniel Barker
e4f37e9460
Fixed pipeline import statement (#7348) 2024-03-12 15:12:35 +01:00
Sebastian Husch Lee
ceda4cd655
feat: Add support for device_map (#6679)
* Getting device_map working to support 8bit loading and multi device inference

* Update to take into account the device specified by the user

* add release notes

* Add device_map support for ExtractiveReader

* Update test

* Update to model that doesn't have issues

* Update test

* Update pytest approx

* Update release notes

* Start supporting device map

* Update ExtractiveReader to use new ComponentDevice

* Update similarity ranker to follow extractive reader implementation

* Fixing pylint

* Make mypy mostly happy

* Add new unit test to test device_map

* Adding unit tests

* Some refactoring

* Add more tests

* Add more tests

* Add another unit test

* Update first_device property to return a ComponentDevice to be able to use the to methods

* Updating tests for test_device

* Update tests and now explicitly modify device_map in model_kwargs

* Update haystack/utils/hf.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Make mypy happy

* mypy

* Remove unneeded optional flag

* Update ExtractiveReader with new logic

* Update ranker to follow new logic

* Removing unneeded code

* Make mypy happy

* Fix pylint

* Fix test

* Adding unit tests for device_map="auto"

* Add unit tests for ranker

* PR comments

* Make util method

* Adding unit tests

* Fix type annotation

* Fix pylint

* Fix test

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-01-30 13:47:57 +01:00
Vladimir Blagojevic
c47b82c54f
Remove pipeline_utils package and dependent code (#6806) 2024-01-23 18:40:43 +01:00
ZanSara
288ed150c9
feat!: Rename model_name or model_name_or_path to model in all Embedder classes (#6733)
* rename model parameter in the openai doc embedder

* fix tests for openai doc embedder

* rename model parameter in the openai text embedder

* fix tests for openai text embedder

* rename model parameter in the st doc embedder

* fix tests for st doc embedder

* rename model parameter in the st backend

* fix tests for st backend

* rename model parameter in the st text embedder

* fix tests for st text embedder

* fix docstring

* fix pipeline utils

* fix e2e

* reno

* fix the indexing pipeline _create_embedder function

* fix e2e eval rag pipeline

* pytest
2024-01-12 15:30:17 +01:00
ZanSara
79d67b0338
expand example to use bytestream (#6718) 2024-01-11 12:04:25 +01:00
Massimiliano Pippi
e1ec4e5e4d
refact!: Remove symbols under the haystack.document_stores namespace (#6714)
* remove symbols under the haystack.document_stores namespace

* Update haystack/document_stores/types/protocol.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* fix

* same for retrievers

* leftovers

* more leftovers

* add relnote

* leftovers

* one more

* fix examples

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-01-10 21:20:42 +01:00
ZanSara
9fe80fd225
feat: Add example script about routing metadata to converters in indexing pipelines (#6702)
* support single metadata dict in markdown2document

* reno

* unwrap list

* direct key access

* typing

* add example of indexing pipeline using Multiplexer

* reno
2024-01-09 14:59:22 +01:00
Massimiliano Pippi
93b2aaee09
chore: move DocumentJoiner to new joiners package (#6692)
* move DocumentJoiner to new joiners package

* relnote

* leftovers

* fix docstrings generation

* fix unrelated pydoc misconfiguration

* more unrelated work, yay!

* fix assertions
2024-01-08 22:06:27 +01:00
Stefano Fiorucci
c773c30c66
refactor!: rename all remaining metadata to meta (#6650)
* change metadata to meta

* release note
2023-12-28 12:18:15 +01:00
Vladimir Blagojevic
506ab81d26
chore: Rename GPT generators, deprecate old names (#6626) 2023-12-22 19:37:29 +01:00
Stefano Fiorucci
7cc6080dfa
chore: replace metadata w meta in tests/examples (#6612)
* replace metadata w meta in tests/examples

* do not touch already broken e2e tests

* Revert "do not touch already broken e2e tests"

This reverts commit 1f911920d98954b57daacfe8d8ed02fd77d136db.
2023-12-21 14:09:31 +01:00
ZanSara
ae5297bfd7
example: self-correcting loop for RAG (#6420)
* add example

* docstrings

* reno

* use condrouter

* move functions

* tests

* reno

* add component

* reno

* add tests

* mypy

* pylint

* logger

* module name

* multiplexer

* draw

* query_multiplexer

* reno

* typo
2023-12-20 11:35:05 +01:00
Vladimir Blagojevic
628e8aa3d4
feat: Improve getting started examples (#6510)
* Improve rag and indexing pipelines

* Update examples

* Simplify user interface and code, improve embedder model

* Improve default vals for embedder

* resolve typing

* resolve typing 2

* Fix unit test

---------

Co-authored-by: Timo Möller <timo.moeller@deepset.ai>
2023-12-09 19:01:13 +01:00
Vladimir Blagojevic
008a322023
feat: Add Indexing Pipeline (#6424)
* Add build_indexing_pipeline utils function

* Pylint fixes

* Move into another package to avoid circular deps

* Revert change

* Revert haystack/utils/__init__.py change

* Add example

* Use DocumentStore type, remove typing checks
2023-12-04 16:08:53 +01:00
ZanSara
a38f871dbd
feat: Add RAG pipeline (#6461)
* add rag pipeline

* Update examples/getting_started/rag.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-12-04 15:25:29 +01:00
Julian Risch
19ff30217c
docs: Add RAG pipeline example (#6446) 2023-11-30 14:38:15 +01:00
Massimiliano Pippi
00e1dd6eb8
chore: rearrange the core package, move tests and clean up (#6427)
* rearrange code

* fix tests

* relnote

* merge test modules

* remove extra

* rearrange draw tests

* forgot

* remove unused import
2023-11-28 09:58:56 +01:00
Julian Risch
c3a5d0d32f
docs: Add indexing example (#6412)
* docs: Add indexing example

* use Path for current directory
2023-11-27 18:44:44 +01:00
Silvano Cerza
db759b0717
Add black step when testing examples (#6425) 2023-11-27 15:01:33 +01:00
Malte Pietsch
09b4f53ce5
docs: Add example for loop in pipeline to autocorrect JSON (#6418)
* add example for pipeline loop

* add pydantic to CI

* Fix comment

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-11-27 13:29:16 +01:00
Massimiliano Pippi
9a8bef63c9
move snippets up one folder 2023-11-24 15:54:23 +01:00
Silvano Cerza
e6637f5ec2
Fix all tests 2023-11-24 14:48:43 +01:00
Massimiliano Pippi
09e7831f60
clean up 1.x code
---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:47:47 +01:00
Timo Moeller
b34c35d982
initial (#6355) 2023-11-23 10:32:54 +01:00
Stefano Fiorucci
92a8704de4
mypy ignore specific errors (#6278) 2023-11-10 18:10:38 +01:00
Julian Risch
59e89b1031
test: Remove anthropic from "getting started" example test (#6024) 2023-10-12 22:36:49 +02:00
Nicola Procopio
c102b152dc
fix: Run update_embeddings in examples (#6008)
* added hybrid search example

Added an example about hybrid search for faq pipeline on covid dataset

* formatted with black formatter

* renamed document

* fixed

* fixed typos

* added test

added test for hybrid search

* fixed whitespaces

* removed test for hybrid search

* fixed pylint

* commented logging

* updated hybrid search example

* release notes

* Update hybrid_search_faq_pipeline.py-815df846dca7e872.yaml

* Update hybrid_search_faq_pipeline.py

* mention hybrid search example in release notes

* reduce installed dependencies in examples test workflow

* do not install cuda dependencies

* skip models if API key not set; delete document indices

* keep roberta-base model and inference extra

* pylint

* disable pylint no-logging-basicconfig rule

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-10-10 16:38:52 +02:00
Timo Moeller
d048bb5352
docs: Add minimal getting started code to showcase haystack + RAG (#5578)
* init

* Change question

* Add TODO comment

* Addressing feedback

* Add local folder option. Move additional functions inside haystack.utils for easier imports

* Apply Daria's review suggestions

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Add integration test

* change string formatting

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Add outputparser to HF

* Exclude anthropic test

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-09-06 12:14:08 +02:00
Vladimir Blagojevic
6787ad2435
fix: Improve imports for new rankers (#5696)
* Proper imports for new rankers

* Small fix
2023-08-31 13:33:29 +02:00
Vladimir Blagojevic
2118f68769
feat: Add domain scoping to WebRetriever (#5587)
* WebSearch: add allowed_domains scoped search

* Add talk to website example

* Add release note

* Add allowed_domains to WebSearch

* Minor fix

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-08-28 20:02:02 +02:00
Vladimir Blagojevic
da67700318
Rename web_lfqa_improved and update questions (#5588) 2023-08-17 17:10:49 +02:00
Vladimir Blagojevic
a75b9dd4bb
feat: LinkContentFetcher - add content-type resolution, user agent switching, PDF handler (#5374)
* Add content type resolution, pdf handler, user agent switching
---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-08-09 18:14:04 +02:00
Vladimir Blagojevic
abc6737e63
feat: Improve LFQA Web Example (#5504)
* Improve web_lfqa example

* Turn off pylint for logging setup

* Another way to turn off logging
2023-08-04 14:20:06 +02:00
Vladimir Blagojevic
1876c41f07
feat: Add LostInTheMiddleRanker (#5457)
* Add lost in the middle ranker

* Add release note

* Julian's feedback: more precise version of truncate

* Better comments for the litm algorithm

* Sebastian PR feedback

* Add check for invalid values of word_count_threshold

* Remove _truncate as it is not needed any more

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-08-02 17:05:13 +02:00
Vladimir Blagojevic
40a2e9b56a
refactor: Update WebRetriever to use LinkContentFetcher (#5229)
* Refactor WebRetriever to use LinkContentFetcher

* PR feedback

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-08-02 12:45:03 +02:00
Vladimir Blagojevic
540d0fad97
feat: Add DiversityRanker (#5398)
* Introduce DiversityRanker

* improve most_diverse_order speed

* Compute mean for numerical stability

* Add release note

* Add cosine similarity 

* Test both dot product and cosine similarity

* Add pydocs hook

---------

Co-authored-by: Michel Bartels <login@michelbartels.com>
2023-08-01 12:48:34 +02:00
Nicola Procopio
8a2ab82651
feat: Added hybrid search example (#5376)
* added hybrid search example

Added an example about hybrid search for faq pipeline on covid dataset

* formatted with black formatter

* renamed document

* fixed

* fixed typos

* added test

added test for hybrid search

* fixed whitespaces

* removed test for hybrid search

* fixed pylint

* commented logging
2023-07-24 12:54:21 +02:00
Vladimir Blagojevic
597df1414c
feat: Update Anthropic Claude support with the latest models, new streaming API, context window sizes (#5406)
* Update Claude support with the latest models, new streaming API, context window sizes

* Use Github Anthropic SDK link for tokenizer, revert _init_tokenizer

* Change example key name to ANTHROPIC_API_KEY
2023-07-21 13:33:07 +02:00
Vladimir Blagojevic
f21005f8ea
refactor: Extract link retrieval from WebRetriever, introduce LinkContentRetriever (#5227)
* Extract link retrieval from WebRetriever, introduce LinkContentRetriever

* Add example
---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
2023-07-13 12:54:40 +02:00
Bilge Yücel
6a1b6b1ae3
feat: Update ConversationalAgent (#5065)
* feat: Update ConversationalAgent

* Add Tools
* Add test
* Change default params

* fix tests

* Fix circular import error
* Update conversational-agent prompt
* Add conversational-agent-without-tools to legacy list

* Add warning to add tools to conversational agent

* Add callable tools

* Add example script

* Fix linter errors

* Update ConversationalAgent depending on the existence of tools

* Initialize the base Agent with different arguments when there's tool
* Inject memory to the prompt in both cases, update prompts accordingly

* Override the add_tools method to prevent adding tools to ConversationalAgent without tools

* Update test

* Fix linter error

* Remove unused import

* Update docstrings and api reference

* Fix imports and doc string code snippet

* docstrings update

* Update conversational.py

* Mock PromptNode

* Prevent circular import error

* Add max_steps to the ConversationalAgent

* Update resolver description

* Add prompt_template as parameter

* Change docstring

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-06-20 13:09:21 +03:00
Vladimir Blagojevic
8d8de65492
Add AgentToolLogger, unit test, and example usage (#5087) 2023-06-15 08:43:20 +02:00
ZanSara
9612aa90bb
fix examples (#5041) 2023-05-29 15:15:38 +02:00
Vladimir Blagojevic
9d52998b25
feat: Add conversational agent (#4931) 2023-05-17 15:19:09 +02:00
Vladimir Blagojevic
8091ced8d5
refactor: Extract ToolsManager, add it to Agent by composition (#4794)
* Extract ToolsManager, add it to Agent by the composition
* PR feedback Massi
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-05-03 16:45:40 +02:00
Vladimir Blagojevic
3fefc475b4
fix: Deprecate Seq2SeqGenerator and RAGenerator (#4745)
* Deprecate Seq2SeqGenerator

* changed the warning to include suggestion

* Added example and msg to API reference docs

* Added RAG deprecation

* renamed name to adapt to naming conven

* update docstrings

---------

Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-04-26 13:59:35 +02:00
Vladimir Blagojevic
be25655663
feat: Add agent tools (#4437)
* Initial commit, add search_engine

* Add TopPSampler

* Add more TopPSampler unit tests

* Remove SearchEngineSampler (converted to TopPSampler)

* Add some basic WebSearch unit tests

* Rename unit tests

* Add WebRetriever into agent_tools

* Adjust to WebRetriever

* Add WebRetriever mode [snippet|document]

* Minor changes

* SerperDev: add peopleAlsoAsk search results

* First agent for hotpotqa

* Making WebRetriever work on hotpotqa

* refactor: minor WebRetriever improvements (#4377)

* refactor: remove doc ids rebuild + anticipate cache

* refactor: improve caching, fix Document ids

* Minor WebRetriever improvements

* Overlooked minor fixes

* feat: add Bing API as search engine

* refactor: let kwargs pass-through

* feat: increase search context

* check sampler result, improve batch typing

* refactor: increase mypy compliance

* Fix mypy

* Minor example fixes

* Fix the descriptions

* PR feedback updates

* More fixes

* TopPSampler: handle top p None value, add unit test

* Add top_k to WebSearch

* Use boilerpy3 instead of trafilatura

* Remove date finding

* Add more WebRetriever docs

* Refactor long methods

* making the preprocessor optional

* hide WebSearch and make NeuralWebSearch a pipeline

* remove unused imports

* add WebQAPipeline and split example into two

* change example search engine to SerperDev

* Turn off progress bars in WebRetriever's PreProcessor

* Agent tool examples - final updates

* Add webqa test, search results ranking scores

* Better answer box handling for SerperDev and SerpAPI

* Minor fixes

* pylint

* pylint fixes

* extract TopPSampler from WebRetriever

* use sampler only for WebRetriever modes other than snippet

* add web retriever tests

* exclude rdflib@6.3.2 due to license issues

* add test for preprocessed docs and kwargs examples in docstrings

* Move test_webqa_pipeline to test/pipelines

* change docstring for join_documents_and_scores

* Use WebQAPipeline in examples/web_lfqa.py

* Move test_webqa_pipeline to e2e

* Updated lg

* Sampler added automatically in WebQAPipeline, no need to add it

* Updated lg

* :ignore Update agent tools examples to new templates (#4503)

* Update examples to new templates

* Add print back

* fix linting and black format issues

---------

Co-authored-by: Daniel Bichuetti <daniel.bichuetti@gmail.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-03-27 18:14:58 +02:00
Massimiliano Pippi
5e0de4a9ed
do not run launch_es in the CI (#3981) 2023-01-27 16:43:17 +01:00
Tuana Celik
e1502c8029
Adding Example Scripts to Haystack (#3588)
* add 2 example scripts

* fixing faq script

* updating PR based on comments

* black

* updating s3 buckets

* first attempt at testing

* Add basic tests to two scripts

PR: #3588

* make tests runnable

* reformat files

* only run in PRs touching an example

Co-authored-by: bilgeyucel <bilgeyucel96@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-27 14:54:59 +01:00