Sebastian Husch Lee
85258f0654
fix: Fix types and formatting pipeline test_run.py (#9575)
* Fix types in test_run.py
* Get test_run.py to pass fmt-check
* Add test_run to mypy checks
* Update test folder to pass ruff linting
* Fix merge
* Fix HF tests
* Fix hf test
* Try to fix tests
* Another attempt
* minor fix
* fix SentenceTransformersDiversityRanker
* skip integration tests because the model is unavailable on HF inference
---------
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-07-03 09:49:09 +02:00
David S. Batista
da60156174
chore: removing unused imports from tests (#9446)
2025-05-26 16:22:51 +00:00
Stefano Fiorucci
f3c44be904
refactor!: remove dataframe field from Document and ExtractedTableAnswer; make pandas optional (#8906)
* remove dataframe
* release note
* small fix
* group imports
* Update pyproject.toml
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
* Update pyproject.toml
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
* address feedback
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2025-03-04 11:06:07 +00:00
Michele Pangrazzi
21d53d0ec6
update default value of 'store_full_path' to False in converters (#8619)
2024-12-10 16:03:38 +01:00
Amna Mubashar
21906d0558
feat: Add store_full_path to converters (1/3) (#8566)
* Add store_full_path param to 3 converters
2024-11-22 13:55:08 +01:00
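#8566 added a store_full_path parameter to converters, and #8619 changed its default to False. The sketch below shows what such a flag plausibly controls. It assumes the flag decides whether the file_path metadata keeps the full path or only the file name. build_file_meta is a hypothetical helper, not the Haystack implementation.

```python
import os


def build_file_meta(file_path: str, store_full_path: bool = False) -> dict:
    """Hypothetical helper: metadata a converter might attach for a source file.

    Assumption: with store_full_path=False (the default after #8619),
    only the base file name is kept; otherwise the full path is stored.
    """
    return {"file_path": file_path if store_full_path else os.path.basename(file_path)}


print(build_file_meta("/data/docs/page.html"))        # {'file_path': 'page.html'}
print(build_file_meta("/data/docs/page.html", True))  # {'file_path': '/data/docs/page.html'}
```

Keeping only the base name by default avoids leaking local directory layouts into stored documents. Callers that need the full path can opt in.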
Stefano Fiorucci
3d1ad10385
fix html test (#8127)
2024-07-31 10:59:53 +02:00
Stefano Fiorucci
7181f6b7e9
feat: change HTML conversion backend from boilerpy3 to Trafilatura (#7705)
* change HTML conversion backend to Trafilatura
* rm unused var
2024-05-17 10:38:47 +02:00
Massimiliano Pippi
10c675d534
chore: add license header to all modules (#7675)
* add license header to modules
* check license header at linting time
2024-05-09 13:40:36 +00:00
Vladimir Blagojevic
c3b96392fd
feat: Use all HTMLToDocument extractors until content is extracted (#7452)
* Use all HTMLToDocument extractors until content is extracted
* Add release note
* Minor doc update
* Improvements, unit test fixes
* Add try_others init param, update tests
* Update haystack/components/converters/html.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* PR feedback - Stefano
* Improve reno release note, add reference
* little fixes
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-04-05 16:02:34 +02:00
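#7452 describes HTMLToDocument trying its extractors one after another until one returns content. A new try_others init parameter controls this. The sketch below illustrates that fallback idea. It assumes each extractor is a plain function from HTML to text. extract_with_fallback and the toy extractors are hypothetical and are not the Haystack code. This behavior applied to the boilerpy3 backend, which #7705 later replaced with Trafilatura.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in type: an extractor maps raw HTML to plain text.
Extractor = Callable[[str], str]


def extract_with_fallback(
    html: str,
    extractors: Dict[str, Extractor],
    preferred: str,
    try_others: bool = True,
) -> Optional[str]:
    """Run the preferred extractor first; if it yields no text and try_others
    is True, try the remaining extractors in insertion order.
    Returns None if every attempted extractor comes back empty."""
    order = [preferred]
    if try_others:
        order += [name for name in extractors if name != preferred]
    for name in order:
        text = extractors[name](html)
        if text.strip():  # whitespace-only output counts as "no content"
            return text
    return None


# Toy extractors for illustration only (not the real boilerpy3 classes).
toy = {
    "ArticleExtractor": lambda html: "",
    "DefaultExtractor": lambda html: "Hello world",
}
print(extract_with_fallback("<p>Hello world</p>", toy, "ArticleExtractor"))
print(extract_with_fallback("<p>Hello world</p>", toy, "ArticleExtractor", try_others=False))
```

With try_others=True, the first call falls through to the second extractor and prints "Hello world". The second call disables the fallback, so it prints None.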
Madeesh Kannan
5d66d040cc
feat: Add serde methods to HTMLToDocument (#6758)
2024-01-18 10:02:01 +01:00
ZanSara
ff55985e2d
feat: support single metadata dictionary in HTMLToDocument (#6613)
* support single metadata in HTMLToDocument
* reno
* docstring
2023-12-21 16:45:31 +01:00
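#6613 lets HTMLToDocument accept a single metadata dictionary as well as a list. The sketch below shows the normalization this implies. It assumes a lone dict is applied to every source and a list must supply exactly one dict per source. normalize_meta and its error messages are hypothetical.

```python
from typing import Any, Dict, List, Optional, Union

Meta = Dict[str, Any]


def normalize_meta(meta: Optional[Union[Meta, List[Meta]]], sources_count: int) -> List[Meta]:
    """Hypothetical helper: turn the user-supplied meta into one dict per source."""
    if meta is None:
        # No metadata: give each source its own empty dict.
        return [{} for _ in range(sources_count)]
    if isinstance(meta, dict):
        # A single dict applies to every source; copy it so per-document edits don't leak.
        return [dict(meta) for _ in range(sources_count)]
    if isinstance(meta, list):
        if len(meta) != sources_count:
            raise ValueError("The length of the metadata list must match the number of sources.")
        return meta
    raise TypeError("meta must be a dict, a list of dicts, or None")


print(normalize_meta({"lang": "en"}, 2))  # [{'lang': 'en'}, {'lang': 'en'}]
```

Copying the shared dict is a design choice in this sketch. It means that enriching one document's metadata later does not silently change the others.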
sahusiddharth
3d17e6ff76
changed metadata to meta (#6605)
2023-12-21 12:39:58 +01:00
Stefano Fiorucci
94cfe5d9ae
feat!: HTMLToDocument - allow choosing the boilerpy3 extractor (#6582)
* allow extractor customizability
* release note
* typo
2023-12-19 10:52:12 +01:00
Stefano Fiorucci
2f034d3c97
refactor!: Converters - standardize inputs (#6540)
* standardize converters inputs: first draft
* fix precommit
* fix precommit 2
* fix precommit 3
* add default for optional param
* rm leftover
* install boilerpy in linting workflow
* add boilerpy3 to the core dependencies
* add reno
* remove boilerpy3 installation from test workflow
* fix pylint: import order and unused import
* fix import order
* add release note
* better Tika docstring
* rm boilerpy from linting
* leftover
* md link brackets
* feat: Converters - allow passing `meta` in the `run` method (#6554)
* first impl for html
* progressing on other components
* fix test
* add tests - run with meta
* release note
* reintroduce patches wrongly deleted
* add patch in test
* fix tika test
* Update haystack/components/converters/azure.py
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* Update releasenotes/notes/converters-standardize-inputs-ed2ba9c97b762974.yaml
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* simplify test
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-12-15 16:41:35 +01:00
Massimiliano Pippi
7c05f37a53
remove unit marker (#6450)
2023-11-29 19:24:25 +01:00
Silvano Cerza
e6637f5ec2
Fix all tests
2023-11-24 14:48:43 +01:00
Massimiliano Pippi
8adb8bbab8
Remove preview folder in test/
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:52:55 +01:00