3803 Commits

Author SHA1 Message Date
Stefano Fiorucci
d90b0de124
Update README.md (#6850) 2024-01-30 10:03:01 +01:00
Silvano Cerza
f5e61338ba
chore: Remove all mentions of Canals (#6844)
* Remove unnecessary Connection class

* Remove all mentions of canals

* Add release notes
2024-01-29 17:26:11 +01:00
Silvano Cerza
9211f535b6
Remove unnecessary Connection class (#6842) 2024-01-29 17:25:52 +01:00
Silvano Cerza
b1ec32dae0
Simplify Pipeline.__eq__ logic (#6840) 2024-01-29 14:54:46 +01:00
Massimiliano Pippi
acf4cd502f
refact: Rename helper function (#6831)
* change function name

* add api docs

* release notes
2024-01-26 16:00:02 +01:00
Madeesh Kannan
fdf844f762
fix: Fix missing format string prefixes in pipeline.py (#6834) 2024-01-26 15:31:56 +01:00
Ashwin Mathur
7217f9d9f0
feat: Add F1 metric (#6822)
* Add F1 metric

* Add release notes
2024-01-26 11:04:43 +01:00
Stefano Fiorucci
b176750532
improve reno config (#6827) 2024-01-26 09:47:52 +01:00
Sebastian Husch Lee
3bea3b1714
feat: Add query and document prefix options for the TransformerSimilarityRanker (#6826)
* Add query and doc prefix

* Fix some tests

* add release notes
2024-01-25 15:29:19 +01:00
Rob Pasternak
7358b910d7
feat: Weights and score normalization for DocumentJoiner with reciprocal rank fusion (#6735)
* Add weighting and score normalization for DocumentJoiner w/ reciprocal rank fusion (fix trailing whitespace)

* Add release notes

* Add unit test

* Update release note

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-01-24 15:45:53 +01:00
Vladimir Blagojevic
6e86f4e26a
Update embedding integration tests (#6823) 2024-01-24 15:22:47 +01:00
Vladimir Blagojevic
c47b82c54f
Remove pipeline_utils package and dependent code (#6806) 2024-01-23 18:40:43 +01:00
Massimiliano Pippi
4efe40664c
use haystack-pydoc-tools package instead of local code (#6818) 2024-01-23 18:28:52 +01:00
Tuana Çelik
1825140654
Readme updates (#6817)
* add info on dD

* fix

* Update README.md

* make tip box

* move location
2024-01-23 15:29:36 +01:00
Daria Fokina
6d8f369e9d
chore: mention cookbook repo in README (#6814)
* readme update

* formatting fix

* format2
2024-01-23 14:06:34 +01:00
Daria Fokina
5d300a7356
add missing components to docs (#6813) 2024-01-23 14:03:15 +01:00
Massimiliano Pippi
df2a23dfa5
chore: cleanup unused code (#6804)
* remove validation module

* remove unused code

* adjust imports

* sort imports
2024-01-23 13:20:53 +01:00
Massimiliano Pippi
f44f123b3f
chore: mention integrations in the README (#6805)
* mention integrations in the README

* Apply suggestions from code review

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update README.md

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Tuana Çelik <tuana.celik@deepset.ai>
2024-01-22 15:38:00 +01:00
Madeesh Kannan
5c8feeac6a
proposal: Integration of 3rd party evaluation frameworks (#6784)
* proposal: Integration of 3rd party evaluation frameworks

* Add note about previous eval proposal
2024-01-22 12:35:27 +01:00
Ashwin Mathur
a238c6dd51
feat: Add Exact Match metric (#6696)
* Add exact match metric

* Add release notes

* Clean up comments in test_eval_exact_match.py

* Create separate preprocessing function; Add output_key parameter

* Update release note

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-01-22 09:57:04 +01:00
Daria Fokina
8a08ab52e1
add telemetry overview (#6785) 2024-01-19 16:41:28 +01:00
Augustin Chan
cad30b039a
add .haystack_debug to .gitignore (#6782) 2024-01-19 10:44:35 +01:00
Vladimir Blagojevic
f47439c2a2
Use forward references for type hints, avoid NameError (#6780) 2024-01-18 18:46:59 +01:00
Silvano Cerza
d4f6531c52
feat: Refactor Pipeline.run() (#6729)
* First rough implementation of refactored run

* Further improve run logic

* Properly handle variadic input in run

* Further work

* Enhance names and add more documentation

* Fix issue with output distribution

* This works

* Enhance run comments

* Mark Multiplexer as greedy

* Remove MergeLoop in favour of Multiplexer in tests

* Remove FirstIntSelector in favour of Multiplexer

* Handle corner case when waiting for input is stuck

* Remove unused import

* Handle mutable input data in run and misbehaving components

* Handle run input validation

* Test validation

* Fix pylint

* Fix mypy

* Call warm_up in run to fix tests
2024-01-18 17:53:47 +01:00
Vladimir Blagojevic
40a8b2b4a9
Move import to lazy import section (#6778) 2024-01-18 17:34:55 +01:00
Vladimir Blagojevic
0b177b3bc6
feat: Improve OpenAPIServiceConnector service response serialization (#6772)
* Better service response json -> str serialization

* Add unit test
2024-01-18 16:49:48 +01:00
Vladimir Blagojevic
fea1428e84
feat: Add HuggingFaceLocalChatGenerator (#6751) 2024-01-18 15:53:12 +01:00
dependabot[bot]
8d65a8630b
chore(deps): bump tj-actions/changed-files from 41 to 42 (#6774)
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 41 to 42.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](https://github.com/tj-actions/changed-files/compare/v41...v42)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-18 15:24:26 +01:00
dependabot[bot]
ac353c4652
chore(deps): bump actions/cache from 3 to 4 (#6775)
Bumps [actions/cache](https://github.com/actions/cache) from 3 to 4.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](https://github.com/actions/cache/compare/v3...v4)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-18 15:21:31 +01:00
Silvano Cerza
8079501925
Speed up Document dataclass import (#6767) 2024-01-18 15:18:02 +01:00
Silvano Cerza
1c76aa07bb
Fix __version__ handling (#6765) 2024-01-18 11:11:08 +01:00
Madeesh Kannan
5d66d040cc
feat: Add serde methods to HTMLToDocument (#6758) 2024-01-18 10:02:01 +01:00
Sebastian Husch Lee
c0b67432e4
feat: Add page breaks to default PDF to Document converter (#6755)
* Speedup tests for PyPDFToDocument

* Added unit test and removed skipping of empty pages

* add release note

* Add back some integration marks
2024-01-18 08:54:59 +01:00
Madeesh Kannan
eaec5bfe4a
refactor: Move HF-specific model serde code to a new submodule. (#6754)
* refactor: Move HF-specific model serde code to a new submodule.

* Remove unused import
2024-01-17 18:00:16 +01:00
Julian Risch
d1bdb8c63d
chore: bump Haystack version to beta5 (#6757) v2.0.0-beta.5 2024-01-17 17:28:36 +01:00
sahusiddharth
a7ac4edd07
feat: added split by page to DocumentSplitter (#6753)
* feat-added-split-by-page-to-DocumentSplitter

* added test case and the suggested changes

* Update document_splitter.py

* Update haystack/components/preprocessors/document_splitter.py

* Update test_document_splitter.py

---------

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
2024-01-17 15:36:29 +01:00
Madeesh Kannan
6a1514550e
test: Update E2E tests to use Pipeline.dump/load (#6756) 2024-01-17 15:09:27 +01:00
Vladimir Blagojevic
88191e74bf
chore: Fix lazy import in HuggingFaceLocalGenerator (#6752)
* Fix lazy import in HuggingFaceLocalGenerator

* Fix pylint

* Import fix after merge
2024-01-17 14:32:03 +01:00
Madeesh Kannan
7376838922
feat!: Framework-agnostic device management (#6748)
* feat: Framework-agnostic device management

* Add release note

* Linting

* Fix test

* Add `first_device` property, expand release notes, validate `ComponentDevice` state
2024-01-17 10:41:34 +01:00
ZanSara
b8b8b5d5c6
feat!: rename model_name_or_path to model in NamedEntityExtractor (#6744)
* rename model_name_or_path to simply model

* fix tests

* reno
2024-01-16 15:32:48 +01:00
ZanSara
909c1eb023
fix a few docstrings (#6743) 2024-01-16 13:56:16 +01:00
Madeesh Kannan
d6cafeaff3
test: Rename RAG E2E test file (#6750)
Prior to this change, identical test names in this file and in the integration/unit test file broke `pytest` workflows in VSCode.
2024-01-16 13:40:22 +01:00
Sebastian Husch Lee
20f04f6054
feat: MetaFieldRanker update (#6742)
* Add weight and ranking_mode as params to run for easier experimentation

* renaming of metadata to meta

* Use logger.warning instead of warnings

* Add another unit test

* Add support for sort_order and fix formatting of error messages

* Make MetaFieldRanker more robust. Doesn't crash pipeline if some Documents are missing keys.

* Don't print same warning message twice

* Add another test

* Making MetaFieldRanker more robust

* Move up if return statement to earlier in the function

* Setting up infer_type

* Remove infer_type for now

* Release notes

* Add init file

* Update releasenotes/notes/metafieldranker_sort-order_refactor-2000d89dc40dc15a.yaml

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-01-16 08:52:58 +01:00
Vladimir Blagojevic
8cafff0645
refactor: Extract HF stop words handling in hf_utils.py (#6745)
* Move StopWordsCriteria to hf_utils.py

* Raise ValueError for invalid StopWordsCriteria tokenizer

* StopWordsCriteria, make sure padding token exists

* Use proper torch types

* Update unit tests
2024-01-15 17:42:29 +01:00
ZanSara
96c0b59aaa
feat!: Rename model_name_or_path to model in ExtractiveReader (#6736)
* rename model parameter and internal model attribute in ExtractiveReader

* fix tests for ExtractiveReader

* fix e2e

* reno

* another fix

* review feedback

* Update releasenotes/notes/rename-model-param-reader-b8cbb0d638e3b8c2.yaml
2024-01-15 14:48:33 +01:00
ZanSara
b236ea49e3
fix: hybrid pipeline e2e test (#6740)
* fix hybrid pipeline e2e test

* warmup

* write to the right docstore
2024-01-15 14:20:02 +01:00
Stefano Fiorucci
8eba053dbc
fix pipeline test (#6741) 2024-01-15 13:59:11 +01:00
ZanSara
24afc2a7fc
feat: Highlight optional connections in Pipeline.draw() (#6724)
* highlight optional connections in Pipeline.draw()

* reno
2024-01-15 12:18:51 +01:00
Madeesh Kannan
a5189dd035
fix!: InMemoryBM25Retriever no longer returns documents that have a score of 0.0 (#6717)
* fix!: `InMemoryBM25Retriever` no longer returns documents that have a score of 0.0

Also update tests to accommodate the new behavior.

* Remove superfluous code
2024-01-12 17:50:55 +01:00
Madeesh Kannan
4647f2a506
fix: ComponentMeta.__call__ handles keyword- and positional-only parameters correctly (#6701)
* fix: `ComponentMeta.__call__` handles keyword- and positional-only parameters correctly

* Update release note
2024-01-12 17:16:03 +01:00