12 Commits

Author SHA1 Message Date
Madeesh Kannan
8faa3fa465
Revert "fix: make PyPDF backward compatible (#7996)" (#8014)
This reverts commit 58b48e36eb56a896365133ab4a9d8e327989948c.
2024-07-11 13:06:08 +00:00
Tobias Wochinger
58b48e36eb
fix: make PyPDF backward compatible (#7996)
* fix: make PyPDF backward compatible

* Add release note

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-07-09 10:08:37 +02:00
Stefano Fiorucci
c51f8ffb86
PyPDFToDocument: remove deprecated converter_name and CONVERTERS_REGISTRY (#7910) 2024-06-21 16:52:03 +02:00
Massimiliano Pippi
10c675d534
chore: add license header to all modules (#7675)
* add license header to modules
* check license header at linting time
2024-05-09 13:40:36 +00:00
Stefano Fiorucci
6925e3a2e1
refactor!: Improve PyPDFToDocument (#7362)
* first draft

* rm kwargs from protocol

* Simplify

* no breaking changes

* reno

* one more test of the deprecated registry
2024-03-26 10:09:29 +01:00
Sebastian Husch Lee
c0b67432e4
feat: Add page breaks to default PDF to Document converter (#6755)
* Speedup tests for PyPDFToDocument

* Added unit test and removed skipping of empty pages

* add release note

* Add back some integration marks
2024-01-18 08:54:59 +01:00
ZanSara
c0f1dab454
feat: support single metadata dictionary in PyPDFToDocument (#6615)
* support single metadata dict in pypdf2document

* improve tests

* tests

* remove line
2023-12-22 14:13:11 +01:00
sahusiddharth
3d17e6ff76
changed metadata to meta (#6605) 2023-12-21 12:39:58 +01:00
Stefano Fiorucci
2f034d3c97
refactor!: Converters - standardize inputs (#6540)
* standardize converters inputs: first draft

* fix precommit

* fix precommit 2

* fix precommit 3

* add default for optional param

* rm leftover

* install boilerpy in linting workflow

* add boilerpy3 to the core dependencies

* add reno

* remove boilerpy3 installation from test workflow

* fix pylint: import order and unused import

* fix import order

* add release note

* better Tika docstring

* rm boilerpy from linting

* leftover

* md link brackets

* feat: Converters - allow passing `meta` in the `run` method (#6554)

* first impl for html

* progressing on other components

* fix test

* add tests - run with meta

* release note

* reintroduce patches wrongly deleted

* add patch in test

* fix tika test

* Update haystack/components/converters/azure.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Update releasenotes/notes/converters-standardize-inputs-ed2ba9c97b762974.yaml

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* simplify test

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-12-15 16:41:35 +01:00
Massimiliano Pippi
7c05f37a53
remove unit marker (#6450) 2023-11-29 19:24:25 +01:00
Silvano Cerza
e6637f5ec2 Fix all tests 2023-11-24 14:48:43 +01:00
Massimiliano Pippi
8adb8bbab8
Remove preview folder in test/
---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:52:55 +01:00