3803 Commits

Author SHA1 Message Date
Vladimir Blagojevic
c05f564359
feat: Split linting preview into a separate file (#6017)
* Split linting preview into separate file

* Add not trigger paths in old workflow
2023-10-10 14:54:27 +02:00
Vladimir Blagojevic
98215aec0d
feat: Rename FileExtensionRouter to FileTypeRouter, handle ByteStream(s) (#5998)
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-10-10 09:14:04 +02:00
DanShatford
07048791aa
feat: allow list of file paths in convert_files_to_docs (#5961)
* feat: allow list of file paths in `convert_files_to_docs`

* Fix validation

* Fix check errors
2023-10-09 20:19:03 +02:00
David Berenstein
13fb7c5b5f
feat: added on_agent_final_answer-support to Agent callback_manager (#5736)
* chore: added on_agent_final_answer-support to Agent callback_manager

* chore: format black

* run pre-commit to format file

* updated release notes

* reverted sorted imports

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-10-09 18:03:47 +02:00
Daria Fokina
d0ff3fa7c2
docs: readme-get-started (#5993)
* readme-get-started

* lg update

* lg update

Co-authored-by: Bilge Yücel <bilgeyucel96@gmail.com>

---------

Co-authored-by: Bilge Yücel <bilgeyucel96@gmail.com>
2023-10-09 15:24:47 +02:00
Timo Moeller
aea6333637
Add end2end tests as getting started to HS2.0 readme (#5981)
* Add end2end tests as getting started to HS2.0 readme

* capital heading

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-10-09 15:03:24 +02:00
ZanSara
71f2430fd1
test: enhance e2e tests to also draw and serialize/deserialize the test pipelines (#5910)
* add draw and serialization/deserialization to e2e pipeline examples

* add comment about json serialization

* fix a small gptgenerator bug and move indexing in tests

* to json

* review feedback
2023-10-09 13:54:17 +02:00
Vladimir Blagojevic
40b83d8a47
feat: Add TopPSampler Haystack 2.0 component (#5924) 2023-10-09 13:44:01 +02:00
Silvano Cerza
0cb9abb1c2
Rename proposal to respect specifications (#6002) 2023-10-09 11:24:19 +02:00
Greg
0e9a51cfb1
fix: annotation-tool is missing DOMAIN_WHITELIST envvar (#5997)
The docker-compose.yml file for the annotation tool
('annotation_tool/domain-compose.yml') gives an error with the latest
image.

The annotation-tool requires the `DOMAIN_WHITELIST` envvar to be defined,
but it is not part of the template referenced by the
documentation.

Signed-off-by: Greg Nagy <greg.nagy@deepset.ai>
2023-10-08 19:48:30 +02:00
Stefano Fiorucci
4e921c650e
rm useless pin (#5995) 2023-10-06 18:26:08 +02:00
Vladimir Blagojevic
1cdff6427e
feat: Add SimilarityRanker to Haystack 2.0 (#5923)
* Initial SimilarityRanker
2023-10-06 16:01:34 +02:00
Stefano Fiorucci
ccc9f010bb
fix: fix ChatGPT invocation layer (and add async support) (#5979)
* ChatGPT async

* release note

* fix tests
2023-10-05 18:43:26 +02:00
Vladimir Blagojevic
282419d82b
feat: Unfreeze Document in Haystack 2.0 (#5974)
* Unfreeze document

* Remove immutability test
2023-10-05 17:55:07 +02:00
Vladimir Blagojevic
f983e605c7
Revert "ci: added isort to pyproject.toml and pre-commit (#5933)" (#5980)
This reverts commit 64243540fb1f2cb6d4dfbb5b12db3aaf59a21b4a.
2023-10-05 17:45:28 +02:00
Tobias Wochinger
d5d3a9eef4
chore: adapt deepset cloud sdk endpoint format for saving pipelines (#5969)
* chore: adapt to new endpoints formats

* docs: add release notes
2023-10-05 08:56:28 +02:00
Massimiliano Pippi
c2ec3f5fde
feat: add File type to preview package (#5873)
* add Blob type

* review feedback

* fix tests and naming

* Update add-blob-type-2a9476a39841f54d.yaml

* removed unused import

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-10-04 17:23:12 +02:00
dependabot[bot]
a4beec3013
build(deps): bump aws-actions/configure-aws-credentials (#5968)
Bumps [aws-actions/configure-aws-credentials](https://github.com/aws-actions/configure-aws-credentials) from 4.0.0 to 4.0.1.
- [Release notes](https://github.com/aws-actions/configure-aws-credentials/releases)
- [Changelog](https://github.com/aws-actions/configure-aws-credentials/blob/main/CHANGELOG.md)
- [Commits](8c3f20df09...010d0da01d)

---
updated-dependencies:
- dependency-name: aws-actions/configure-aws-credentials
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-04 17:20:17 +02:00
Mike Lay
de4592f6f1
Fix contributing link (#5960)
* Fix broken contributor link. As this ultimately links to the actual contributing page, simply redirect to *CONTRIBUTING.md* instead of `#-contributing`

The 💙 emoji in the anchor does not actually resolve. Should be `#contributing`.
2023-10-04 12:01:51 +02:00
Matt Speck
64243540fb
ci: added isort to pyproject.toml and pre-commit (#5933) 2023-10-04 01:01:26 +02:00
dependabot[bot]
58192d35f1
build(deps): bump iterative/setup-cml from 1 to 2 (#5911)
Bumps [iterative/setup-cml](https://github.com/iterative/setup-cml) from 1 to 2.
- [Release notes](https://github.com/iterative/setup-cml/releases)
- [Commits](https://github.com/iterative/setup-cml/compare/v1...v2)

---
updated-dependencies:
- dependency-name: iterative/setup-cml
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-03 17:39:22 +02:00
ZanSara
b844ab8e22
chore: remove matrix from Linux CI (#5955)
* remove matrix

* workflow names
2023-10-03 17:39:04 +02:00
Stefano Fiorucci
cc70b4b613
deprecation (#5954) 2023-10-03 12:48:06 +02:00
Massimiliano Pippi
ac408134f4
feat: add support for async openai calls (#5946)
* add support for async openai calls

* add actual async call

* split the async api

* ask permission

* Update haystack/utils/openai_utils.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Fix OpenAI content moderation tests

* Fix ChatGPT invocation layer tests

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-10-03 10:42:21 +02:00
Lavesh Akhadkar
1ccf674d73
feat: DocumentWriter returns number of documents written (#5939)
* Make DocumentWriter return the number of documents it wrote

* Fixed return type
2023-10-03 10:02:33 +02:00
Timo Moeller
dfd9870bcd
Remove language validation (#5948) 2023-10-03 09:37:07 +02:00
Silvano Cerza
a933a42749
Fix release_notes.yml syntax 2023-10-02 13:24:08 -07:00
Zubeen
b8c3b68141
Update release_notes.yml (#5949)
Ignoring release notes check for PRs of type doc/ci/test
2023-10-02 22:17:55 +02:00
dependabot[bot]
69232612d0
build(deps): bump actions/checkout from 3 to 4 (#5928)
Bumps [actions/checkout](https://github.com/actions/checkout) from 3 to 4.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v3...v4)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-01 12:38:57 +02:00
ZanSara
81b2e83d04
feat: separate out preview tests (#5639)
* add preview workflows

* feedback

* feedback

* use preview extra

* remove coverage and add separate e2e

* rename workflow file for consistency

* trigger ci

* undo trigger

* torch import in testing

* add deps to unit tests

* feedback

* run container instead of service

* comment

* add if statement

* fix tika version

* separate out win integration tests

* separate out all CIs

* try installing docker on macos

* exclude tika

* remove tika docker
2023-09-29 13:16:08 +02:00
bogdankostic
d61df24b27
chore: Remove classifiers directory from preview package (#5918) 2023-09-29 10:38:33 +02:00
Massimiliano Pippi
0947f59545
feat: add async PromptNode run (#5890)
* add async promptnode

* Remove unnecessary calls to dict.keys()

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-09-29 08:40:01 +02:00
ZanSara
578f2b4bbf
feat: update canals to 0.8.1 (#5900)
* Update canals to 0.8.1

* scale up runner
2023-09-28 17:50:46 +02:00
Vladimir Blagojevic
e882a7d5c8
feat: Add HTMLToDocument component (v2) (#5907) 2023-09-28 17:22:28 +02:00
Massimiliano Pippi
dfa48eece9
clean up the Slack integrations (#5908) 2023-09-28 15:49:19 +02:00
Stefano Fiorucci
d4aacad5f9
feat: OpenAIDocumentEmbedder (#5822)
* first draft

* release note

* mypy fix

* fix test

* corrections

* pr feedback

* better secrets handling and new tests

* missing imports in embedders/__init__.py

* better format condition

* address feedback
2023-09-28 15:42:51 +02:00
ZanSara
83724b74e3
feat: Make metadata optional in AnswerBuilder (#5909)
* optional metadata

* improve docstring
2023-09-28 14:42:19 +02:00
Stefano Fiorucci
9340c572f9
alternative skipif conditions in azure ocr converter test (#5906) 2023-09-28 12:09:19 +02:00
Silvano Cerza
35ec8cc8fb
Rework evaluation and metrics calculation for Haystack 2.x (#5794)
* draft requirements from discussion

* Add some more information

* Update proposal given new feedback

* More drawbacks

* Decision drivers

* Nitpick

* Summary

* PR number

* Mark code snippets

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>

* Link correct issue

* Add missing word

* More context on blind evaluation

* Rephrase confusing sentence

* Add a more detailed code example

* Ignore mypy and pylint in example file

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-09-28 00:51:51 +02:00
Julian Risch
4413675e64
feat: Add TextDocumentSplitter that splits by word, sentence, passage (2.0) (#5870)
* draft split by word, sentence, passage

* naive way to split sentences without nltk

* reno

* add tests

* make input list of docs, review feedback

* add source_id and more validation

* update docstrings

* add split delimiters back to strings

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-09-27 12:26:20 +02:00
ZanSara
6665e8ec7f
Add preview extra to e2e tests (#5898) 2023-09-27 10:36:00 +02:00
Stefano Fiorucci
a4787e7b52
pin setuptools_scm only for windows (#5894) 2023-09-26 18:39:50 +02:00
Stefano Fiorucci
61877056ef
pin setuptools_scm in the metrics extra (#5891) 2023-09-26 17:12:59 +02:00
bogdankostic
80192589b1
feat: Add AzureOCRDocumentConverter (2.0) (#5855)
* Add AzureOCRDocumentConverter

* Add tests

* Add release note

* Formatting

* update docstrings

* Apply suggestions from code review

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>

* PR feedback

* PR feedback

* PR feedback

* Add secrets as environment variables

* Adapt test

* Add azure dependency to CI

* Add azure dependency to CI

---------

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-09-26 15:57:55 +02:00
Stefano Fiorucci
c8398eeb6d
test: e2e test for Extractive QA Pipeline (#5879)
* e2e test for e. qa pipeline
2023-09-26 15:44:34 +02:00
Silvano Cerza
cf7f0ebc22
Add Pipelines async run (#5864)
* Add Pipeline.arun()

* Sleeper node

* Fix async running

* Add e2e tests

To run a Pipeline that doesn't have any async node in async mode:

    pytest e2e/pipelines/test_standard_pipelines.py::test_query_and_indexing_pipeline

To run a Pipeline that has a single async node in concurrent mode:

    pytest e2e/pipelines/test_standard_pipelines.py::test_async_concurrent_complex_pipeline

To run a Pipeline that has a single async node in sequential mode:

    pytest e2e/pipelines/test_standard_pipelines.py::test_async_sequential_complex_pipeline

* Remove unused _adispatch_run method

* Make Pipeline.run work with async nodes

* Revert "Make Pipeline.run work with async nodes"

This reverts commit 22d7a94e4d41aca1b59dad18c0b366fbb6e8f431.

* Rename Pipeline.arun to Pipeline._arun

* Enhance docstring

* Add Sleeper docstring

* Add release notes

* ignore typing across the node

* make pylint happy

* skip pylint on needed unused import

* fix

* if a node has an arun method, use it

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-09-26 15:37:27 +02:00
github-actions[bot]
8d26057566
Update unstable version (#5887)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
v1.22.0-rc0
2023-09-26 15:23:14 +02:00
ZanSara
6cb7d16e22
feat: preview extra (#5869)
* copy the deps list over from haystack-ai

* fix lazyimport usage

* keep jinja and openai

* fix ci

* reno

* separate out preview unit tests

* fix import error message for tika

* tika

* add preview to all

* wrap torch

* remove comment

* unwrap openai and jinja
v1.21.0-rc0
2023-09-26 12:48:15 +02:00
Stefano Fiorucci
e9d34fc0e3
test: e2e tests for RAG Pipelines (#5876)
* relax extractive reader integration tests

* force reader to CPU

* ensure integration tests reproducibility

* e2e rag tests

* move set_all_seeds to testing package

* refine rag tests

* Update e2e/preview/pipelines/test_rag_pipelines.py

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>

---------

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-09-26 11:49:50 +02:00
Stefano Fiorucci
6aa471ac5e
chore: make preview integration tests reproducible (#5871)
* relax extractive reader integration tests

* force reader to CPU

* ensure integration tests reproducibility

* move set_all_seeds to testing package
2023-09-25 18:39:10 +02:00