3803 Commits

Author SHA1 Message Date
dependabot[bot]
55a2e7ab7f
build(deps): bump readmeio/rdme from 8.3.1 to 8.6.6 (#5789)
Bumps [readmeio/rdme](https://github.com/readmeio/rdme) from 8.3.1 to 8.6.6.
- [Release notes](https://github.com/readmeio/rdme/releases)
- [Changelog](https://github.com/readmeio/rdme/blob/next/CHANGELOG.md)
- [Commits](https://github.com/readmeio/rdme/compare/8.3.1...8.6.6)

---
updated-dependencies:
- dependency-name: readmeio/rdme
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-13 11:03:12 +02:00
Silvano Cerza
7e544d4f60
Fix license compliance workflow (#5791)
* Formatting

* Try to send event to Datadog only if possible
2023-09-13 10:43:06 +02:00
dependabot[bot]
e688d3dddb
build(deps): bump aws-actions/configure-aws-credentials (#5790)
Bumps [aws-actions/configure-aws-credentials](https://github.com/aws-actions/configure-aws-credentials) from 2.2.0 to 4.0.0.
- [Release notes](https://github.com/aws-actions/configure-aws-credentials/releases)
- [Changelog](https://github.com/aws-actions/configure-aws-credentials/blob/main/CHANGELOG.md)
- [Commits](5fd3084fc3...8c3f20df09)

---
updated-dependencies:
- dependency-name: aws-actions/configure-aws-credentials
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-13 10:25:54 +02:00
Massimiliano Pippi
de6c57e20b
let dependabot update github actions (#5788) 2023-09-13 10:23:30 +02:00
ZanSara
2c4d839b64
feat: GPT4Generator (#5744)
* add gpt4generator

* add e2e

* add tests

* reno

* fix e2e

* Update test/preview/components/generators/openai/test_gpt4_generator.py

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-09-13 10:07:09 +02:00
Christian Clauss
75dc60b0bb
ci: Upgrade GitHub Actions (#5787) 2023-09-13 09:58:47 +02:00
ZanSara
94c5d6d216
feat: make GPT35Generator non batch (#5764)
* make gpt35generator not batch

* fix tests

* review feedback

* mypy
2023-09-12 18:19:28 +02:00
Christian Clauss
6846448bac
pylint: Set limits on code complexity (#5771) 2023-09-12 18:13:23 +02:00
ZanSara
24c42b1e03
fix tests (#5773) 2023-09-12 17:41:08 +02:00
ZanSara
7194343458
remove test (#5753) 2023-09-12 16:04:36 +02:00
ZanSara
869f69d0d1
fix: temporary pin tiktoken (#5774)
* exclude breaking tiktoken version

* exclude breaking tiktoken version
2023-09-12 14:35:52 +02:00
Christian Clauss
23f7308bec
ci: pre-commit autoupdate (#5777) 2023-09-12 14:34:41 +02:00
Christian Clauss
45cc40bf51
linting.yml: Upgrade GitHub Actions (#5752) 2023-09-11 20:49:20 +02:00
ZanSara
6e70d403f8
feat: Improve Document for Haystack 2.0 (#5738)
* initial draft

* tests

* add proposal

* proposal number

* reno

* fix tests and usage of content and content_type

* update branch & fix more tests

* mypy

* add docstring

* fix more tests

* review feedback

* improve __str__

* Apply suggestions from code review

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/preview/dataclasses/document.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* improve __str__

* fix tests

* fix more tests

* Update haystack/preview/document_stores/memory/document_store.py

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-09-11 17:40:00 +02:00
Stefano Fiorucci
2edf85f739
MemoryEmbeddingRetriever (2.0) (#5726)
* MemoryDocumentStore - Embedding retrieval draft

* add release notes

* fix mypy

* better comment

* improve return_embeddings handling

* MemoryEmbeddingRetriever - first draft

* address PR comments

* release note

* update docstrings

* update docstrings

* incorporated feedback

* add return_embedding to __init__

* rm leftover docstring

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-09-08 15:52:48 +02:00
Stefano Fiorucci
d860a5c604
make tests more robust (#5747) 2023-09-08 15:50:56 +02:00
Tuana Çelik
b5987a6d8d
Update web.py (#5742)
Fixing the api docs for webretriever.
2023-09-08 09:06:14 +02:00
Stefano Fiorucci
b7bea3ae9c
MemoryDocumentStore - Embedding retrieval (2.0) (#5715)
* MemoryDocumentStore - Embedding retrieval draft

* add release notes

* fix mypy

* better comment

* improve return_embeddings handling

* address PR comments

* update docstrings

* incorporated feedback

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-09-07 15:44:07 +02:00
bogdankostic
71852c7b06
Fix output of AnswerBuilder (#5737) 2023-09-07 12:54:24 +02:00
ZanSara
7abd73419f
fix remote whisper tests (#5732) 2023-09-07 10:53:29 +02:00
bogdankostic
42b6954aa5
docs: Remove mention of hosted annotation tool (#5735) 2023-09-07 10:40:31 +02:00
ZanSara
63cbde7287
feat: GPT35Generator (#5714)
* chatgpt backend

* fix tests

* reno

* remove print

* helpers tests

* add chatgpt generator

* use openai sdk

* remove backend

* tests are broken

* fix tests

* stray param

* move _check_troncated_answers into the class

* wrong import

* rename function

* typo in test

* add openai deps

* mypy

* improve system prompt docstring

* typos update

* Update haystack/preview/components/generators/openai/chatgpt.py

* pylint

* Update haystack/preview/components/generators/openai/chatgpt.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update haystack/preview/components/generators/openai/chatgpt.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update haystack/preview/components/generators/openai/chatgpt.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* review feedback

* fix tests

* review feedback

* reno

* remove tenacity mock

* gpt35generator

* fix naming

* remove stray references to chatgpt

* fix e2e

* Update releasenotes/notes/chatgpt-llm-generator-d043532654efe684.yaml

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* add another test

* test wrong model name

* review feedback

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-09-07 10:06:57 +02:00
Vladimir Blagojevic
c5edb45c10
feat: Add SerperDevWebSearch Haystack 2.0 component (#5712)
* Add SerperDev

* Add release note

* PR Feedback

* Simplify, remove one-liner

* Update haystack/preview/components/websearch/serper_dev.py

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>

* Update haystack/preview/components/websearch/serper_dev.py

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>

* Fix formatting

* PR feedback

* Fix tests

* Function rename

* Remove scoring, update tests

* PR feedback

* Fix return

* small adjustments

* fix tests

* add e2e test

* fix release notes

* fix tests

* fix e2e

---------

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-09-06 17:31:42 +02:00
ZanSara
0bbc219a59
chore: enable e2e preview tests (#5730)
* enable e2e preview tests

* fix transcriber test

* quotes

* add missing dep

* missing comma

* ffmpeg
2023-09-06 16:48:45 +02:00
Timo Moeller
d048bb5352
docs: Add minimal getting started code to showcase haystack + RAG (#5578)
* init

* Change question

* Add TODO comment

* Addressing feedback

* Add local folder option. Move additional functions inside haystack.utils for easier imports

* Apply Daria's review suggestions

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Add integration test

* change string formatting

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Add outputparser to HF

* Exclude anthropic test

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-09-06 12:14:08 +02:00
ZanSara
10d6886255
chore: move PromptBuilder in builders (#5729) 2023-09-06 11:52:21 +02:00
Timo Moeller
d540883469
Add api keys to CI workflows (#5722) 2023-09-05 16:21:17 +02:00
Agnieszka Marzec
5d2a7534a0
Correct the number of tokens (#5548)
As per https://discord.com/channels/954421988141711382/1136952298740920341/1138936382467866694
2023-09-05 15:07:45 +02:00
bogdankostic
639f7cf888
chore: Rename AnswersBuilder to AnswerBuilder (#5720)
* Add AnswersBuilder

* Add tests for AnswersBuilder

* Add release note

* PR feedback

* Fix mypy

* Remove redundant check for number of groups

* Rename AnswersBuilder to AnswerBuilder

* Update test/preview/components/builders/test_answer_builder.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Rename reno file

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-09-05 14:34:22 +02:00
Silvano Cerza
2acc41ea85
Add PromptBuilder (#5713)
* Add PromptBuilder

* Update release note

* Add test
2023-09-05 12:22:21 +02:00
bogdankostic
a5b815690e
feat: Add AnswersBuilder component (2.0) (#5701)
* Add AnswersBuilder

* Add tests for AnswersBuilder

* Add release note

* PR feedback

* Fix mypy

* Remove redundant check for number of groups

* docstrings upd

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-09-04 21:16:20 +02:00
ZanSara
c5369a39ef
upgrade canals (#5708) 2023-09-04 14:55:05 +02:00
ZanSara
7886284d4e
chore: fix mypy failure (#5707)
* mypy

* add comment on type ignore
2023-09-04 12:08:59 +02:00
Massimiliano Pippi
24b8cfb1c7
Update 3558-embedding_retriever.md (#5705) 2023-09-04 11:28:51 +02:00
bogdankostic
11440395f4
fix: Set model_max_length in the Tokenizer of DefaultPromptHandler (#5596)
* Set model_max_length in tokenizer in prompt handler

* Add release note
2023-09-01 11:48:41 +02:00
bogdankostic
67da275ae0
Rename question to query in Answer dataclass (2.0) (#5699) 2023-09-01 10:37:56 +02:00
ZanSara
5f1256ac7e
feat: generators (2.0) (#5690)
* add generators module

* add tests for module helper

* reno

* add another test

* move into openai

* improve tests
2023-08-31 17:33:12 +02:00
Vladimir Blagojevic
6787ad2435
fix: Improve imports for new rankers (#5696)
* Proper imports for new rankers

* Small fix
2023-08-31 13:33:29 +02:00
Alexander
55b10a3868
Update squad_to_dpr.py (#5689)
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-08-30 20:39:14 +03:00
Tuana Çelik
1a872a7841
update description for pypi (#5687) 2023-08-30 15:29:12 +02:00
github-actions[bot]
88318bfdb5
Bump unstable version (#5686)
* Update unstable version

* Bump to gooooo

---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-08-30 15:27:50 +02:00
ZanSara
ce06268990
test: fix e2e test failures (#5685)
* fix test errors

* fix pipeline yaml

* disable cache

* fix errors

* remove stray fixture
2023-08-30 12:24:03 +02:00
ZanSara
1709be162c
auto trigger e2e workflow on PRs that affect it (#5684) 2023-08-30 10:25:47 +02:00
Fanli Lin
40d9f34e68
feat: enable passing use_fast to the underlying transformers' pipeline (#5655)
* copy instead of deepcopy

* fix pylint

* add use_fast

* add release note

* remove irrelevant changes

* black fix

* fix bug

* black

* bug fix
2023-08-30 10:25:18 +02:00
ZanSara
b1daa7c647
chore: migrate to canals==0.7.0 (#5647)
* add default_to_dict and default_from_dict placeholders to ease migration to canals 0.7.0

* canals==0.7.0

* whisper components

* add to_dict/from_dict stubs

* import serialization methods in init to hide canals imports

* reno

* export deserializationerror too

* Update haystack/preview/__init__.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* serialization methods for LocalWhisperTranscriber (#5648)

* chore: serialization methods for `FileExtensionClassifier` (#5651)

* serialization methods for FileExtensionClassifier

* Update test_file_classifier.py

* chore: serialization methods for `SentenceTransformersDocumentEmbedder` (#5652)

* serialization methods for SentenceTransformersDocumentEmbedder

* fix device management

* serialization methods for SentenceTransformersTextEmbedder (#5653)

* serialization methods for TextFileToDocument (#5654)

* chore: serialization methods for `RemoteWhisperTranscriber` (#5650)

* serialization methods for RemoteWhisperTranscriber

* remove patches

* Add default to_dict and from_dict in document stores built with factory (#5674)

* fix tests (#5671)

* chore: simplify serialization methods for `MemoryDocumentStore` (#5667)

* simplify serialization for MemoryDocumentStore

* remove redundant tests

* pylint

* chore: serialization methods for `MemoryRetriever` (#5663)

* serialization method for MemoryRetriever

* more tests

* remove hash from default_document_store_to_dict

* remove diff in factory.py

* chore: serialization methods for `DocumentWriter` (#5661)

* serialization methods for DocumentWriter

* more tests

* use factory

* black

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-08-29 18:15:07 +02:00
Silvano Cerza
a613b1b7f5
Format crawler.py 2023-08-29 17:54:30 +02:00
Vladimir Blagojevic
a9b8fd9658
Move WebRetriever's new init parameter to last parameter position (#5673) 2023-08-29 17:46:12 +02:00
Daria Fokina
fbc1951e74
Update crawler.py (#5610) 2023-08-29 16:46:19 +02:00
Vladimir Blagojevic
e5e7bb9654
feat: Allow WebRetriever to use custom LinkContentFetcher (#5662)
* Allow use of custom LinkContentFetcher

* Add release note
2023-08-29 15:46:48 +02:00
bogdankostic
07c85905f3
fix: Change use_auth_token to token in TransformersQueryClassifier (#5659) 2023-08-29 15:21:25 +02:00