137 Commits

Author SHA1 Message Date
Silvano Cerza
5546c8144e
ci: Speed up tests.yml by caching dependencies (#6417)
* Speed up tests.yml by caching dependencies

* Trigger for testing

* Use restore only action to speedup restoring

* Use bash shell to get pip cache dir

* Set shell for caching step

* Cache correct path

* Remove trigger
2023-12-20 16:21:48 +01:00
Stefano Fiorucci
cf47abdff5
chore: simplify the management of test dependencies (#6559)
* remove audio dep group

* extract dependencies

* beautify

* rm one step
2023-12-15 16:40:41 +01:00
Massimiliano Pippi
bc45170f4e
chore: add boilerpy3 to the core dependencies (#6544)
* add boilerpy3 to the core dependencies

* remove boilerpy3 installation from test workflow

* fix pylint: import order and unused import

* fix import order

* add release note

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-12-14 11:53:38 +01:00
dependabot[bot]
51b49b838c
chore(deps): bump actions/setup-python from 4 to 5 (#6498)
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 4 to 5.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-12-06 22:33:05 +01:00
Massimiliano Pippi
bf542ebfb0
downgrade unnecessary runner instance (#6477) 2023-12-04 11:39:36 +01:00
Massimiliano Pippi
a86807b834
move Cohere generator into dedicated integration (#6475) 2023-12-04 11:16:12 +01:00
Massimiliano Pippi
011e32ebdf
chore: merge canals into Haystack codebase (#6422)
* Ignore some mypy errors

* Fix I/O comparator

* Avoid calling asdict multiple times when comparing dataclasses

* Enhance component tests

* Fix I/O dataclasses comparison

* Use Any instead of type when expecting I/O dataclasses

* Fix mypy

* Change InputSocket taken_by field to sender

* Remove variadics implementation

* Adapt tests

* Enhance docs and simplify run

* Remove useless check on drawing

* Add __canals_optional_inputs__ field in components

* Rework a bit Pipeline._ready_to_run()

* Simplify some logic

* Add __canals_mandatory_inputs__ field in components

* Handle pipeline loops

* Fix tests

* Document component state run logic

* Add double loop pipeline test

* Make component decorator a class

* PR feedback

* Add error logging when registering Component with identical names

* Add 'remove' action that removes current component from Pipeline run input queue

* Simplify run checks and logging

* Better logging

* Apply suggestions from code review

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>

* Trim whitespace

* Add support for Union in Component's I/O

* Remove dependencies section in marshaled pipelines

* Create Component Protocol

* simpler optional deps

* Simplify component init wrapping and fix issue with save_init_params

* Update canals/pipeline/save_load.py

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>

* Simplify functions to find I/O sockets

* Fix import

* change import

* testing ci

* testing ci

* Simplify _save_init_params

* testing ci

* testing ci

* use direct pytest call

* trying to force old version for macos

* list macos versions

* list macos versions

* disable on macos

* remove extra

* refactor imports

* re-enable some logs

* some more tests

* small correction

* Remove unused leftover methods

* docs

* update docstring

* mention optionals

* example for dataclass initialization

* missed part

* fix api docs

* improve error reporting and testing

* add tests for Any

* parametrized tests

* fix test for py<3.10

* test type printing

* remove typing. prefix from Any (compat with Py3.11)

* test helpers

* test names

* add type_is_compatible()

* tests pass

* more tests

* add small comment

* handle Unions as anything else

* use sender/receiver for socket pairs

* more sender/receiver renames

* even more renames

* split if statement

* Update __about__.py

* fix logic operator and add tests

* Update __about__.py

* Simplify imports

* Move draw in pipeline module and clearly define public interface

* Format pyproject.toml

* Include only required files in built wheel

* Move sample components out of tests

* stub component class decorator

* update static sample components to new API

* stub

* dynamic output examples

* sum

* add components fixed

* re-add inputsocket and outputsocket creation

* fix component tests

* fixing tests

* Add methods to set I/O dynamically

* fix drawing

* fix some integration tests

* tests green

* pylint

* remove stray files

* Remove default in InputSocket and add is_optional field

* Fix drawing

* Rework sockets string representation

* Add back Component Protocol

* Simplify method to get string representation of types

* Remove sockets __str__

* Remove Component's I/O type checks at run time

* Remove IO check in init wrapper

* Update canals/utils.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Split __canals_io__ field in __canals_input__ and __canals_output__

* Order input and output fields

* Add test to verify __canals_component__ is set

* Remove empty line

* Add component class factory

* Fix API docs workflow failure

* fix api docs

* Update __about__.py

* Add component from_dict and to_dict methods

* Add Pipeline to_dict and from_dict

* Fix components tests

* Add some more tests

* Change error messages

* Simplify test_to_dict

* Add max_loops_allowed in test_to_dict

* Test non default max_loops_allowed in test_to_dict

* Rework marshal_pipelines

* Rework unmarshal_pipelines

* Rename some stuff

* allow falsy outputs

* apply falsy fix to validation

* add test for falsy inputs

* Split _cleanup_marshalled_data into two functions

* Use from_dict to deserialise component

* Remove commented out code and update variable name

* Add test to verify difference when unmarshaling Pipeline with duplicate names

* Update marshal_pipelines docstring

* update workflow

* exclude tests from mypy in pre-commit hooks

* add additional falsy tests

* remove unnecessary import

* split test into two

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>

* remove init_parameters decorator and fix assumptions

* fix accumulate

* stray if

* Bump version to 0.5.0

* Implement generic default_to_dict and default_from_dict

* Update default_to_dict docstring

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Remove all mentions of Component.defaults

* Add Remainder to_dict and from_dict (#91)

* Add Repeat to_dict and from_dict (#92)

* Add Sum to_dict and from_dict (#93)

* Add Greet to_dict and from_dict (#89)

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Rework Accumulate to_dict and from_dict (#86)

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Add to_dict and from_dict for Parity, Subtract, Double, Concatenate (#87)

* Add Concatenate to_dict and from_dict

* Add Double to_dict and from_dict

* Add Subtract to_dict and from_dict

* Add Parity to_dict and from_dict

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Change _to_mermaid_text to use component serialization data (#94)

* Add MergeLoop to_dict and from_dict (#90)

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Add Threshold to_dict and from_dict (#97)

* Add AddFixedValue to_dict and from_dict (#88)

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Remove BaseTestComponent (#99)

* Change @component decorator so it doesn't add default to_dict and from_dict (#98)

* Rename some classes in tests to suppress Pytest warnings (#101)

* Check Component I/O socket names are valid (#100)

* Remove handling of shared component instances on Pipeline serialization (#102)

* Fix docs

* Bump version to 0.6.0

* Revert "Check Component I/O socket names are valid (#100)" (#103)

This reverts commit 4529874b562d12331ee2f4fde926ef5b5e3d24d7.

* Bump canals to 0.7.0

* Downgrade log from ERROR to DEBUG (#104)

* Make to/from_dict optional (#107)

* remove from/to dict from Protocol

* use a default marshaller

* example component with no serializers

* fix linting

* make it smarter

* fix linting

* thank you mypy protector of the dumb programmers

* feat: check returned dictionary (#106)

* better error message if components don't return dictionaries

* add test

* use factory

* needless import

* Update __about__.py

* fix default serialization and adjust sample components accordingly (#109)

* fix default serialization and adjust sample components accordingly

* typo

* fix pylint errors

* fix: `draw` function vs init parameters (#115)

* fix draw

* stray print

* Update version (#118)

* remove extras

* Revert "remove extras"

This reverts commit a096ff8f07bdcb6e54ec8457bcfad5db44d8bf03.

* fix package name, change _parse_connection_name function name, add tests (#126)

* move sockets into components package (#127)

* chore: remove extras (#125)

* remove extras

* workflow

* typo

* fix: Sockets named "text/plain" or containing a "/" fail during pipeline.to_dict (#131)

* don't split sockets by /

* revert hashing edge keys

* docs: remove missing module from docs (#132)

* remove stray print (#123)

* add sockets docs (#133)

* tidy up utils about types (#129)

* Update canals.md (#134)

* rename module in API docs

* make `__canals_output__` and `__canals_input__` management consistent  (#128)

* make __canals_output__ and __canals_input__ management consistent and assign them to the component instance

* make pylint happy

* return the original type instead of the metaclass

* use type checking instead of instance field

* declare the actual returned type

* fix after conflict resolution

* remove check

* Do not use a dict as intermediate format and use `Socket`s directly (#135)

* do not use a dict as intermediate format and use sockets directly to simplify code and remove side effects

* fix leftover from cherry-pick

* move is_optional evaluation for InputSocket to post_init (#136)

* re-introduce variadics to support Joiner node (#122)

* move sockets into components package

make __canals_output__ and __canals_input__ management consistent and assign them to the component instance

do not use a dict as intermediate format and use sockets directly to simplify code and remove side effects

move is_optional evaluation for InputSocket to post_init

re-introduce variadics to support Joiner node

restore connection-time check

use custom type annotation, fix tests

* fix leftovers from rebase

* rename fan-in to joiner

* clean up and fix typing

* let inputs arrive later

* address review comments

* address review comments

* fix docstrings

* try

* try

* fix run input

* linting

* remove comments

* fix pylint

* bump version to 0.9.0 (#140)

* properly annotate classmethods (#139)

* feat: add `Pipeline.inputs()` (#120)

* add Pipeline.describe_input()

* add tests

* split dict and str outputs and add to error messages

* tests

* accepts/expects

* move methods

* fix tests

* fix module name

* tests

* review feedback

* Add missing typing_extensions dependency (#152)

* feat: use full connection data to route I/O (#148)

* fix sample components

* make sum variadic

* separate queue and buffer

* all works but loops & variadics together

* fix some tests

* fix some tests

* all tests green

* clean up code a bit

* refactor code

* fix tests

* fix self loops

* fix reused sockets bug

* add distinct loops

* add distinct loops test

* break out some code from run()

* docstring

* improve variadics drawing

* black

* document the deepcopy

* re-arrange connection dataclass and add tests

* consumer -> receiver

* fix typing

* move Connection-related code under component package

* clean up connect()

* cosmetics and typing

* fix linter, make Connection a dataclass again

* fix typing

* add test case for #105

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* feat: Add Component inputs/outputs functions (#158)

* Add component inputs/outputs methods

* Different impl approach

* Black fixes

* Rename functions to match naming in pipeline inputs/outputs

* Fix find_component_inputs, update unit tests (#162)

* Fix API docs (#164)

* make Variadic wrap an iterable (#163)

* Add pipeline outputs method (#150)

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>

* Update __about__.py (#165)

Update version to 0.10.0

* add CODEOWNERS

* feat: read defaults from `run()` signature (#166)

* Read defaults from run signature

* simplify setting of sockets

* fix test

* Update sample_components/fstring.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Update canals/component/component.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* docstring

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Use full import path as 'type' in serialization.  (#167)

* Use full import path as 'type' in serialization. Try to import the path when deserializing

* fix test data

* add from_dict test

* remove leftover

* Update canals/pipeline/pipeline.py

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>

* add error message to PipelineError

---------

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>

* bump version

* fix: copy input values before passing them down pipeline.run (#168)

* copy input values before passing them down pipeline.run

* Update test_mutable_inputs.py

* fix mypy and pyright (#169)

* bump version

* remove data we won't keep

* reformat

* try

* skip tests on transient code

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
Co-authored-by: Michel Bartels <login@michelbartels.com>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: Julian Risch <julianrisch@gmx.de>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-11-27 15:16:35 +01:00
Silvano Cerza
db759b0717
Add black step when testing examples (#6425) 2023-11-27 15:01:33 +01:00
Silvano Cerza
892625a6c7
ci: Add back workflows that run in place of linting.yml and tests.yml (#6421)
* Add back workflow that runs in place of linting.yml

* Add back workflow that runs in place of tests.yml
2023-11-27 13:18:47 +01:00
Silvano Cerza
8bfaf0a56a
ci: Add catch-all job in tests.yml (#6419)
* Add catch-all job in tests.yml

* Trigger for testing

* Remove trigger for testing
2023-11-27 12:57:33 +01:00
Silvano Cerza
9338de1790
Add missing tests workflow dependency 2023-11-24 16:00:59 +01:00
Massimiliano Pippi
4a1fe163b6
fix names in workflows 2023-11-24 14:59:31 +01:00
Massimiliano Pippi
09e7831f60
clean up 1.x code
---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:47:47 +01:00
Massimiliano Pippi
209e349be3
do not run preview tests twice (#6204) 2023-10-31 13:13:32 +01:00
Grant Williams
1cf70d3dce
build: Upgrade transformers to the latest version 4.34.1 (#5994)
* Upgrade transformers to the latest version 4.34.0 so that Haystack can support the new Mistral, Nougat, and other models.

* update release notes

* updated missing lazy import

* Update .github workflows imports

* bump more versions in .github workflows

* revert import sorting

* Update  to catch runtime errors to match haystack_hub changes

* add language parameter value to whisper test

* bump transformers version in linting preview workflow

* bump transformers version in linting preview workflow

* bump version to v4.34.1

* resolve mypy issue with reused variables

* install openai-whisper without dependencies

* remove audio extra, update whisper install instructions

* remove audio extra, update whisper install instructions

* keep audio extra but add version

* keep audio extra with no constraints

* remove audio extra

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-10-24 19:13:12 +02:00
Stefano Fiorucci
c4187eeebe
CI: make only test_preview run when preview e2e tests are changed (#6078)
* make only test_preview workflow run when e2e tests are modified

* revert wrong changes to test_preview

* revert wrong order
2023-10-17 10:06:39 +02:00
ZanSara
adf7e49af3
chore: review all extra (#6029) 2023-10-12 21:50:53 +02:00
ZanSara
81b2e83d04
feat: separate out preview tests (#5639)
* add preview workflows

* feedback

* feedback

* use preview extra

* remove coverage and add separate e2e

* rename workflow file for consistency

* trigger ci

* undo trigger

* torch import in testing

* add deps to unit tests

* feedback

* run container instead of service

* comment

* add if statement

* fix tika version

* separate out win integration tests

* separate out all CIs

* try installing docker on macos

* exclude tika

* remove tika docker
2023-09-29 13:16:08 +02:00
Massimiliano Pippi
dfa48eece9
clean up the Slack integrations (#5908) 2023-09-28 15:49:19 +02:00
bogdankostic
80192589b1
feat: Add AzureOCRDocumentConverter (2.0) (#5855)
* Add AzureOCRDocumentConverter

* Add tests

* Add release note

* Formatting

* update docstrings

* Apply suggestions from code review

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>

* PR feedback

* PR feedback

* PR feedback

* Add secrets as environment variables

* Adapt test

* Add azure dependency to CI

* Add azure dependency to CI

---------

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-09-26 15:57:55 +02:00
ZanSara
6cb7d16e22
feat: preview extra (#5869)
* copy the deps list over from haystack-ai

* fix lazyimport usage

* keep jinja and openai

* fix ci

* reno

* separate out preview unit tests

* fix import error message for tika

* tika

* add preview to all

* wrap torch

* remove comment

* unwrap openai and jinja
2023-09-26 12:48:15 +02:00
bogdankostic
9a4373bf8e
feat: Add TikaDocumentConverter (2.0) (#5847)
* Add TikaFileToDocument component

* Add tests

* Add tika service to CI

* Add release note

* Change name

* PR feedback

* Fix naming in tests

* Fix tika version in CI

* Update tests

---------

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-09-25 11:47:21 +02:00
ZanSara
28f5c4c780
fix: Whisper integration tests (#5851)
* fix tests

* add ffmpeg

* apt update for ffmpeg

* not run on windows
2023-09-21 00:14:07 +02:00
ZanSara
ea2a5595ca
add missing dependency (#5849) 2023-09-20 12:57:53 +02:00
bogdankostic
57d33ee6da
ci: Run preview integration tests in CI (#5843)
* Run preview integration tests in CI

* Only install inference extra
2023-09-20 11:54:41 +02:00
Christian Clauss
75dc60b0bb
ci: Upgrade GitHub Actions (#5787) 2023-09-13 09:58:47 +02:00
Silvano Cerza
b53fad4c4f
Add missing integration tests to catch-all required step in tests.yml (#5598) 2023-08-18 17:58:26 +02:00
Silvano Cerza
bc152d953c
Skip running tests in CI when editing docs Python files (#5482) 2023-08-01 12:31:24 +02:00
bogdankostic
b7f683bfa4
ci: Add unit test for Elasticsearch8 (#5300)
* Add job for ES8 integration tests

* Add unit test for Elasticsearch 8

* Add tests.yml

* Adapt tests.yml

* Remove added white space

* Adapt tests.yml

* Adapt tests.yml

* Add dependencies to unit test name

* Adapt unit test matrix

* Adapt unit test matrix

* Adapt unit test matrix

* Adapt unit test matrix

* Update tests.yml

* Create separate tests where necessary

* Fix skip

* Adapt tests
2023-07-10 16:03:50 +02:00
bogdankostic
048fc7f640
ci: Add job for ES8 integration tests (#5297)
* Add job for ES8 integration tests

* Remove whitespace

* Fix filename

* Add tests.yml

* Revert "Add tests.yml"

This reverts commit ec12654d4e146b5ef6cba04ad82f5973935d8520.
2023-07-10 10:43:05 +02:00
Julian Risch
30fdf2b5df
feat!: Add extra for inference dependencies such as torch (#5147)
* feat!: add extra for inference dependencies such as torch

* add inference extra to 'all' and 'all-gpu' extra

* install inference extra in selected integration tests

* import LazyImport

* review feedback

* add import error messages and update readme

* remove extra dot
2023-06-20 09:54:10 +02:00
ZanSara
8487cddc69
add cli to the jobs list (#5060) 2023-06-01 13:22:17 +02:00
Massimiliano Pippi
929b8d1fb0
ci: run Elasticsearch 8.6 in compatibility mode (#3853)
* bump ES version in CI

disable ssl

wait for service to start

set env vars

do not use choco to install ES

re-enable jobs deps

skip test on windows CI because of OOM

allocate more memory for ES

uniform ES installation and use default heap size

skip tests causing OOM

increase job timeout

restore memory limit for ES8

* Use latest elasticsearch version
2023-05-24 18:53:54 +02:00
Silvano Cerza
f235d30af8
Add workflow name to Datadog event (#4968) 2023-05-19 17:42:33 +02:00
Silvano Cerza
ce4cf3bc55
Add workflow id to Datadog event tags (#4965) 2023-05-19 16:52:39 +02:00
Silvano Cerza
d5cc6ff9a9
ci: Remove legacy tests (#4961)
* Remove legacy tests

* Remove unnecessary env vars
2023-05-19 15:49:07 +02:00
Silvano Cerza
69bae2a3d6
Set calculator shell explicitly to handle Windows runs (#4960) 2023-05-19 15:15:18 +02:00
Silvano Cerza
2d76237508
Fix step failing to calculate Datadog event type (#4958) 2023-05-19 15:03:09 +02:00
Silvano Cerza
21ca24f70b
Send tests outcomes to Datadog instead of sending message to Slack (#4957) 2023-05-19 14:45:36 +02:00
Massimiliano Pippi
d322beed6c
build: do not install 'dev' extras with 'all' (#4888)
* do not install 'dev' with 'all'

* some fixes around
2023-05-11 19:24:47 +02:00
Silvano Cerza
6c84a05d98
Upload coverage only if all unit tests pass (#4874) 2023-05-11 14:29:44 +02:00
Silvano Cerza
06193e08b1
Add missing unit tests topics to coverage upload step (#4873) 2023-05-10 12:51:52 +02:00
Sebastian
707f1c3546
Add modeling to unit tests so we can get coverage for it (#4809)
* Add modeling to unit tests so we can get coverage for it

* fix unit tests

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-05-08 19:05:21 +02:00
Silvano Cerza
9b67611169
Add others folder to unit test job (#4800) 2023-05-03 10:47:21 +02:00
Silvano Cerza
645a5fe5ba
ci: Add coverage tracking with Coveralls (#4772)
* Format tests.yml properly

* Add pytest-cov dependency

* Add coverage in unit tests

* Ignore cov.info

* Change report format

* Unignore cov.info
2023-04-28 11:59:09 +02:00
ZanSara
1b57b96210
refactor!: extract elasticsearch (#4668)
* extract elasticsearch

* update pyproject.toml

* make more import optional

* move MockBaseRetriever in conftest

* install es in the es integration tests
2023-04-26 10:14:20 +02:00
bogdankostic
91b775bf43
Execute pipelines and utils unit tests in CI (#4749) 2023-04-26 10:00:52 +02:00
Massimiliano Pippi
0c081f19e2
fix: remove warnings from the more recent Elasticsearch client (#4602)
* clean up the ES instance in a more robust way

* do not sleep, refresh the index instead

* remove client warnings

* fix unit tests

* fix opensearch compatibility

* fix unit tests

* update ES version

* bump elasticsearch-py

* adjust docs

* use recreate_index param

* use same fixture strategy for Opensearch

* Update lg

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-04-18 15:40:17 +02:00
ZanSara
d8ac30fa47
refactor!: extract preprocessing and file conversion deps (#4605)
* isolate file-conversion deps

* pylint

* add to all extra

* chain was missing

* move langdetect into preprocessing and fix tika

* add file-conversion extra
2023-04-14 11:34:16 +02:00
ZanSara
174d80ab41
skip tests (#4654) 2023-04-13 17:56:51 +02:00