9 Commits

Author SHA1 Message Date
Vladimir Blagojevic
c3b96392fd
feat: Use all HTMLToDocument extractors until content is extracted (#7452)
* Use all HTMLToDocument extractors until content is extracted

* Add release note

* Minor doc update

* Improvements, unit test fixes

* Add try_others init param, update tests

* Update haystack/components/converters/html.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* PR feedback - Stefano

* Improve reno release note, add  reference

* little fixes

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-04-05 16:02:34 +02:00
Madeesh Kannan
5d66d040cc
feat: Add serde methods to HTMLToDocument (#6758) 2024-01-18 10:02:01 +01:00
ZanSara
ff55985e2d
feat: support single metadata dictionary in HTMLToDocument (#6613)
* support single metadata in HTMLToDocument

* reno

* docstring
2023-12-21 16:45:31 +01:00
sahusiddharth
3d17e6ff76
changed metadata to meta (#6605) 2023-12-21 12:39:58 +01:00
Stefano Fiorucci
94cfe5d9ae
feat!: HTMLToDocument - allow choosing the boilerpy3 extractor (#6582)
* allow extractor customizability

* release note

* typo
2023-12-19 10:52:12 +01:00
Stefano Fiorucci
2f034d3c97
refactor!: Converters - standardize inputs (#6540)
* standardize converters inputs: first draft

* fix precommit

* fix precommit 2

* fix precommit 3

* add default for optional param

* rm leftover

* install boilerpy in linting workflow

* add boilerpy3 to the core dependencies

* add reno

* remove boilerpy3 installation from test workflow

* fix pylint: import order and unused import

* fix import order

* add release note

* better Tika docstring

* rm boilerpy from linting

* leftover

* md link brackets

* feat: Converters - allow passing `meta` in the `run` method (#6554)

* first impl for html

* progressing on other components

* fix test

* add tests - run with meta

* release note

* reintroduce patches wrongly deleted

* add patch in test

* fix tika test

* Update haystack/components/converters/azure.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Update releasenotes/notes/converters-standardize-inputs-ed2ba9c97b762974.yaml

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* simplify test

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-12-15 16:41:35 +01:00
Massimiliano Pippi
7c05f37a53
remove unit marker (#6450) 2023-11-29 19:24:25 +01:00
Silvano Cerza
e6637f5ec2 Fix all tests 2023-11-24 14:48:43 +01:00
Massimiliano Pippi
8adb8bbab8
Remove preview folder in test/
---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:52:55 +01:00