3174 Commits

Author SHA1 Message Date
Silvano Cerza
a476486d34
chore: Fix mypy errors when running preview linting in CI (#6073)
* Fix mypy errors when running preview linting in CI

* Trigger CI

* Revert "Trigger CI"

This reverts commit 9b47d19279eaa4e020c645ed1c18c8263acd7695.

* Revert "Fix mypy errors when running preview linting in CI"

This reverts commit 78b5d92ad8085c9b61848ecf6de242bea67f3281.

* Ignore mypy errors

* Trigger CI

* Revert "Trigger CI"

This reverts commit 62050ec0fd057b2efb2f7f0a13da42b0eeabb6b8.
2023-10-16 15:00:58 +02:00
Silvano Cerza
c78e1a7eb3
Add a workflow to verify haystack.preview doesn't import non preview modules (#6053) 2023-10-16 09:36:45 +02:00
Nicola Procopio
32e87d37c1
fixed join_docs.py concatenate (#5970)
* added hybrid search example

Added an example about hybrid search for faq pipeline on covid dataset

* formatted with black formatter

* renamed document

* fixed

* fixed typos

* added test

added test for hybrid search

* fixed whitespaces

* removed test for hybrid search

* fixed pylint

* commented logging

* fixed bug in join_docs.py _concatenate_results

* Update join_docs.py

updated comment

* format with black

* added releasenote on PR

* updated release notes

* updated test_join_documents

* updated test

* updated test

* Update test_join_documents.py

* formatted with black

* fixed test

* fixed

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-10-16 09:31:52 +02:00
Silvano Cerza
92ae169bdf
Proposal: Document Stores filter specification for Haystack 2.x (#6001)
* Filters rework proposal

* Update proposal with received feedback
2023-10-16 09:26:23 +02:00
Julian Risch
aaee03aee8
feat: Add DocumentCleaner 2.0 (#5976)
* remove whitespaces, substrings, regex, empty lines

* remove repeated substrings

* reno

* return empty string as shortest common ngram

* address first half of review feedback

* address second half of review feedback

* mention \f page separator for header/footer removal

* mention \f page separator for header/footer removal

* mark example usage as python code
2023-10-13 12:39:55 +02:00
Bilge Yücel
ad25041618
Remove old Cohere models and add aliases for existing ones (#6007)
* Remove old cohere models

* Add aliases for the existing models according to Cohere documentation

* Add release note

* put cohere embedding models in a constant

* update doc strings

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-10-13 12:08:26 +02:00
Stefano Fiorucci
fbd22bc1e9
feat: HuggingFaceLocalGenerator - first implementation (#6022)
* draft

* still a raw draft

* still a raw draft

* improvements

* minimal impl ok

* tests

* reno

* better language

* examples of generation_kwargs

* incorporate feedback

* lg and format updates

* don't save valid str tokens

* fix style

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-10-13 11:23:56 +02:00
Daria Fokina
41fd0c5458
docs: adding missing docstrings for run and run_batch methods (#5609)
* docstrings for run methods

* updates from pr review

* wrong article

* fix style

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-10-13 11:23:26 +02:00
Julian Risch
b507f1a124
feat: Add TextLanguageClassifier 2.0 (#6026)
* draft TextLanguageClassifier

* implement language detection with langdetect

* add unit test for logging message

* reno

* pylint

* change input from List[str] to str

* remove empty output connections

* add from_dict/to_dict tests

* mark example usage as python code
2023-10-13 10:30:49 +02:00
ZanSara
110aacdc35
feat: add basic telemetry to pipelines 2.0 (#5929)
* add telemetry to pipelines 2.0

* only collect data if telemetry is on

* reno

* add downsampling

* typing

* manual tests

* pylint

* simplify code

* Update haystack/preview/telemetry/__init__.py

* rather index by component type

* black

* mypy

* review feedback & small improvements

* defaultdict

* stray changes

* lint

* invert condition

* always send the first event of the day

* collect specs

* track 2nd and 3rd events too

* send first event and then max 1 event a minute

* rename constant

* invert condition

* linting
2023-10-13 09:31:51 +02:00
Akash Goyal
988fa61f84
Addition to the text in ValueError when creating a prompt node to inf… (#6000)
* Addition to the text in ValueError when creating a prompt node to inform users to double check they have authorisation for the loaded model and have logged into the huggingface cli

* Update haystack/nodes/prompt/prompt_model.py

Accepted the suggested changes to the value error text

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-10-13 09:05:21 +02:00
Julian Risch
59e89b1031
test: Remove anthropic from "getting started" example test (#6024) 2023-10-12 22:36:49 +02:00
ZanSara
adf7e49af3
chore: review all extra (#6029) 2023-10-12 21:50:53 +02:00
Stefano Fiorucci
2c2549f13d
move embedding backends (#6033) 2023-10-12 17:52:28 +02:00
Vladimir Blagojevic
d51be9edac
Add top_k to SimilarityRanker (#6036) 2023-10-12 13:52:01 +02:00
Vladimir Blagojevic
4b8b6e9191
Use forward reference for AnalyzeResult (#6030) 2023-10-11 16:33:02 +02:00
Vladimir Blagojevic
3803d23ff6
feat: Update PyPDFToDocument to process ByteStream inputs (#6021)
* Update PyPDF converter

* Add mixed source unit test

* Update haystack/preview/components/file_converters/pypdf.py

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>

---------

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-10-11 10:52:08 +02:00
Vladimir Blagojevic
1a6a8863e8
feat: Update HTMLToDocument to handle ByteStream inputs (#6020)
* Update HTML converter

* Add mixed source unit test

* Update haystack/preview/components/file_converters/html.py

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>

---------

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-10-11 10:15:58 +02:00
Julian Risch
12fe0364dc
test: Utility to compare two lists of documents for equality (#6005)
* check that sorted lists contain same docs

* fix broken tests
2023-10-11 08:16:41 +02:00
Vladimir Blagojevic
6a50123b9f
feat: Adjust LinkContentFetcher run method, use ByteStream (#5972) 2023-10-10 17:48:31 +02:00
Nicola Procopio
c102b152dc
fix: Run update_embeddings in examples (#6008)
* added hybrid search example

Added an example about hybrid search for faq pipeline on covid dataset

* formatted with black formatter

* renamed document

* fixed

* fixed typos

* added test

added test for hybrid search

* fixed whitespaces

* removed test for hybrid search

* fixed pylint

* commented logging

* updated hybrid search example

* release notes

* Update hybrid_search_faq_pipeline.py-815df846dca7e872.yaml

* Update hybrid_search_faq_pipeline.py

* mention hybrid search example in release notes

* reduce installed dependencies in examples test workflow

* do not install cuda dependencies

* skip models if API key not set; delete document indices

* skip models if API key not set; delete document indices

* skip models if API key not set; delete document indices

* keep roberta-base model and inference extra

* pylint

* disable pylint no-logging-basicconfig rule

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-10-10 16:38:52 +02:00
Vladimir Blagojevic
c05f564359
feat: Split linting preview into a separate file (#6017)
* Split linting preview into separate file

* Add paths that should not trigger the old workflow
2023-10-10 14:54:27 +02:00
Vladimir Blagojevic
98215aec0d
feat: Rename FileExtensionRouter to FileTypeRouter, handle ByteStream(s) (#5998)
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-10-10 09:14:04 +02:00
DanShatford
07048791aa
feat: allow list of file paths in convert_files_to_docs (#5961)
* feat: allow list of file paths in `convert_files_to_docs`

* Fix validation

* Fix check errors
2023-10-09 20:19:03 +02:00
David Berenstein
13fb7c5b5f
feat: added on_agent_final_answer-support to Agent callback_manager (#5736)
* chore: added on_agent_final_answer-support to Agent callback_manager

* chore: format black

* run pre-commit to format file

* updated release notes

* reverted sorted imports

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-10-09 18:03:47 +02:00
Daria Fokina
d0ff3fa7c2
docs: readme-get-started (#5993)
* readme-get-started

* lg update

* lg update

Co-authored-by: Bilge Yücel <bilgeyucel96@gmail.com>

---------

Co-authored-by: Bilge Yücel <bilgeyucel96@gmail.com>
2023-10-09 15:24:47 +02:00
Timo Moeller
aea6333637
Add end2end tests as getting started to HS2.0 readme (#5981)
* Add end2end tests as getting started to HS2.0 readme

* capital heading

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-10-09 15:03:24 +02:00
ZanSara
71f2430fd1
test: enhance e2e tests to also draw and serialize/deserialize the test pipelines (#5910)
* add draw and serialization/deserialization to e2e pipeline examples

* add comment about json serialization

* fix a small gptgenerator bug and move indexing in tests

* to json

* review feedback
2023-10-09 13:54:17 +02:00
Vladimir Blagojevic
40b83d8a47
feat: Add TopPSampler Haystack 2.0 component (#5924) 2023-10-09 13:44:01 +02:00
Silvano Cerza
0cb9abb1c2
Rename proposal to respect specifications (#6002) 2023-10-09 11:24:19 +02:00
Greg
0e9a51cfb1
fix: annotation-tool is missing DOMAIN_WHITELIST envvar (#5997)
The docker-compose.yml file for the annotation tool
('annotation_tool/domain-compose.yml') fails with the latest
image.

The annotation tool requires the `DOMAIN_WHITELIST` envvar to be defined,
but that variable is not part of the template referenced by the
documentation.

Signed-off-by: Greg Nagy <greg.nagy@deepset.ai>
2023-10-08 19:48:30 +02:00
Stefano Fiorucci
4e921c650e
rm useless pin (#5995) 2023-10-06 18:26:08 +02:00
Vladimir Blagojevic
1cdff6427e
feat: Add SimilarityRanker to Haystack 2.0 (#5923)
* Initial SimilarityRanker
2023-10-06 16:01:34 +02:00
Stefano Fiorucci
ccc9f010bb
fix: fix ChatGPT invocation layer (and add async support) (#5979)
* ChatGPT async

* release note

* fix tests
2023-10-05 18:43:26 +02:00
Vladimir Blagojevic
282419d82b
feat: Unfreeze Document in Haystack 2.0 (#5974)
* Unfreeze document

* Remove immutability test
2023-10-05 17:55:07 +02:00
Vladimir Blagojevic
f983e605c7
Revert "ci: added isort to pyproject.toml and pre-commit (#5933)" (#5980)
This reverts commit 64243540fb1f2cb6d4dfbb5b12db3aaf59a21b4a.
2023-10-05 17:45:28 +02:00
Tobias Wochinger
d5d3a9eef4
chore: adapt deepset cloud sdk endpoint format for saving pipelines (#5969)
* chore: adapt to new endpoints formats

* docs: add release notes
2023-10-05 08:56:28 +02:00
Massimiliano Pippi
c2ec3f5fde
feat: add File type to preview package (#5873)
* add Blob type

* review feedback

* fix tests and naming

* Update add-blob-type-2a9476a39841f54d.yaml

* removed unused import

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-10-04 17:23:12 +02:00
dependabot[bot]
a4beec3013
build(deps): bump aws-actions/configure-aws-credentials (#5968)
Bumps [aws-actions/configure-aws-credentials](https://github.com/aws-actions/configure-aws-credentials) from 4.0.0 to 4.0.1.
- [Release notes](https://github.com/aws-actions/configure-aws-credentials/releases)
- [Changelog](https://github.com/aws-actions/configure-aws-credentials/blob/main/CHANGELOG.md)
- [Commits](8c3f20df09...010d0da01d)

---
updated-dependencies:
- dependency-name: aws-actions/configure-aws-credentials
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-04 17:20:17 +02:00
Mike Lay
de4592f6f1
Fix contributing link (#5960)
* Fix broken contributing link. Since this ultimately points to the actual contributing page, link directly to *CONTRIBUTING.md* instead of `#-contributing`.

The 💙 emoji in the anchor does not resolve. The anchor should be `#contributing`.
2023-10-04 12:01:51 +02:00
Matt Speck
64243540fb
ci: added isort to pyproject.toml and pre-commit (#5933) 2023-10-04 01:01:26 +02:00
dependabot[bot]
58192d35f1
build(deps): bump iterative/setup-cml from 1 to 2 (#5911)
Bumps [iterative/setup-cml](https://github.com/iterative/setup-cml) from 1 to 2.
- [Release notes](https://github.com/iterative/setup-cml/releases)
- [Commits](https://github.com/iterative/setup-cml/compare/v1...v2)

---
updated-dependencies:
- dependency-name: iterative/setup-cml
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-03 17:39:22 +02:00
ZanSara
b844ab8e22
chore: remove matrix from Linux CI (#5955)
* remove matrix

* workflow names
2023-10-03 17:39:04 +02:00
Stefano Fiorucci
cc70b4b613
deprecation (#5954) 2023-10-03 12:48:06 +02:00
Massimiliano Pippi
ac408134f4
feat: add support for async openai calls (#5946)
* add support for async openai calls

* add actual async call

* split the async api

* ask permission

* Update haystack/utils/openai_utils.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Fix OpenAI content moderation tests

* Fix ChatGPT invocation layer tests

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-10-03 10:42:21 +02:00
Lavesh Akhadkar
1ccf674d73
feat: DocumentWriter returns number of documents written (#5939)
* Make DocumentWriter return the number of documents it wrote

* Fixed return type
2023-10-03 10:02:33 +02:00
Timo Moeller
dfd9870bcd
Remove language validation (#5948) 2023-10-03 09:37:07 +02:00
Silvano Cerza
a933a42749
Fix release_notes.yml syntax 2023-10-02 13:24:08 -07:00
Zubeen
b8c3b68141
Update release_notes.yml (#5949)
Ignore the release notes check for PRs of type doc/ci/test
2023-10-02 22:17:55 +02:00
dependabot[bot]
69232612d0
build(deps): bump actions/checkout from 3 to 4 (#5928)
Bumps [actions/checkout](https://github.com/actions/checkout) from 3 to 4.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v3...v4)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-01 12:38:57 +02:00