Vladimir Blagojevic
|
c3b96392fd
|
feat: Use all HTMLToDocument extractors until content is extracted (#7452)
* Use all HTMLToDocument extractors until content is extracted
* Add release note
* Minor doc update
* Improvements, unit test fixes
* Add try_others init param, update tests
* Update haystack/components/converters/html.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* PR feedback - Stefano
* Improve reno release note, add reference
* little fixes
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
|
2024-04-05 16:02:34 +02:00 |
|
Madeesh Kannan
|
5d66d040cc
|
feat: Add serde methods to HTMLToDocument (#6758)
|
2024-01-18 10:02:01 +01:00 |
|
ZanSara
|
ff55985e2d
|
feat: support single metadata dictionary in HTMLToDocument (#6613)
* support single metadata in HTMLToDocument
* reno
* docstring
|
2023-12-21 16:45:31 +01:00 |
|
sahusiddharth
|
3d17e6ff76
|
changed metadata to meta (#6605)
|
2023-12-21 12:39:58 +01:00 |
|
Stefano Fiorucci
|
94cfe5d9ae
|
feat!: HTMLToDocument - allow choosing the boilerpy3 extractor (#6582)
* allow extractor customizability
* release note
* typo
|
2023-12-19 10:52:12 +01:00 |
|
Stefano Fiorucci
|
2f034d3c97
|
refactor!: Converters - standardize inputs (#6540)
* standardize converters inputs: first draft
* fix precommit
* fix precommit 2
* fix precommit 3
* add default for optional param
* rm leftover
* install boilerpy in linting workflow
* add boilerpy3 to the core dependencies
* add reno
* remove boilerpy3 installation from test workflow
* fix pylint: import order and unused import
* fix import order
* add release note
* better Tika docstring
* rm boilerpy from linting
* leftover
* md link brackets
* feat: Converters - allow passing `meta` in the `run` method (#6554)
* first impl for html
* progressing on other components
* fix test
* add tests - run with meta
* release note
* reintroduce patches wrongly deleted
* add patch in test
* fix tika test
* Update haystack/components/converters/azure.py
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* Update releasenotes/notes/converters-standardize-inputs-ed2ba9c97b762974.yaml
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* simplify test
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
|
2023-12-15 16:41:35 +01:00 |
|
Massimiliano Pippi
|
7c05f37a53
|
remove unit marker (#6450)
|
2023-11-29 19:24:25 +01:00 |
|
Silvano Cerza
|
e6637f5ec2
|
Fix all tests
|
2023-11-24 14:48:43 +01:00 |
|
Massimiliano Pippi
|
8adb8bbab8
|
Remove preview folder in test/
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
|
2023-11-24 11:52:55 +01:00 |
|