Amna Mubashar
9302d3d9f0
feat: Add store_full_path to converters (2/3) ( #8573 )
2024-11-25 15:22:19 +05:00
Corentin Meyer
1c53aae8f0
fix: Tika converter not yielding page break tags (\f) ( #8082 )
...
* Fix TikaConverter not having \f page tag by using HTML mode of parsing and then parsing the HTML to text using the old Haystack 1.X integration as template.
* Add Reno
* Fix test by making Mock Tika return XML (before parsing)
* refinements and test
---------
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2024-07-26 20:13:47 +02:00
Massimiliano Pippi
10c675d534
chore: add license header to all modules ( #7675 )
...
* add license header to modules
* check license header at linting time
2024-05-09 13:40:36 +00:00
ZanSara
974d65f30a
feat: support single metadata dictionary in TikaDocumentConverter ( #6698 )
...
* reno
* converter
* test
* comment
2024-01-09 09:49:47 +01:00
sahusiddharth
3d17e6ff76
changed metadata to meta ( #6605 )
2023-12-21 12:39:58 +01:00
Stefano Fiorucci
2f034d3c97
refactor!: Converters - standardize inputs ( #6540 )
...
* standardize converters inputs: first draft
* fix precommit
* fix precommit 2
* fix precommit 3
* add default for optional param
* rm leftover
* install boilerpy in linting workflow
* add boilerpy3 to the core dependencies
* add reno
* remove boilerpy3 installation from test workflow
* fix pylint: import order and unused import
* fix import order
* add release note
* better Tika docstring
* rm boilerpy from linting
* leftover
* md link brackets
* feat: Converters - allow passing `meta` in the `run` method ( #6554 )
* first impl for html
* progressing on other components
* fix test
* add tests - run with meta
* release note
* reintroduce patches wrongly deleted
* add patch in test
* fix tika test
* Update haystack/components/converters/azure.py
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* Update releasenotes/notes/converters-standardize-inputs-ed2ba9c97b762974.yaml
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* simplify test
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-12-15 16:41:35 +01:00
Massimiliano Pippi
7c05f37a53
remove unit marker ( #6450 )
2023-11-29 19:24:25 +01:00
Silvano Cerza
e6637f5ec2
Fix all tests
2023-11-24 14:48:43 +01:00
Massimiliano Pippi
8adb8bbab8
Remove preview folder in test/
...
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:52:55 +01:00