10 Commits

Author SHA1 Message Date
Carlos Fernández
c1c339923f
feat: add DocxToDocument converter (#7838)
* first functioning DocxFileToDocument

* fix lazy import message

* add reno

* Add license header

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* change DocxFileToDocument to DocxToDocument

* Update library install to the maintained version

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* clean try-except to only take non haystack errors into account

* Add warning on docstring of component ignoring page breaks, mark test as skip

* make warnings lazy evaluations

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* make warnings lazy evaluations

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* Make warnings lazy evaluated

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* Solve f bug

* Get more metadata from docx files

* add 'python-docx' dependency and docs

* Change logging import

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* Fix typo

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* remake metadata extraction for docx

* solve bug regarding _get_docx_metadata method

* Update haystack/components/converters/docx.py

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* Update haystack/components/converters/docx.py

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* Delete unused test

---------

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
2024-06-12 11:58:36 +02:00
Sebastian Husch Lee
2c2c7c9f56
feat: Add PPTXToDocument converter (#7808)
* Add first pass at PPTXToDocument converter

* Add test and update code

* Add doc string

* Update docstrings

* Add release notes

* remove unused imports, add to api docs, update pyproject.toml

* Add a new test

* Add dep so tests can run
2024-06-07 09:43:29 +00:00
Vladimir Blagojevic
988c360b6d
feat: Azure converter updates (#7409)
* Initial commit

* Remove old mock tests

* Fix current_last_page_number calculation

* Carry over unit tests from the other side

* Update pydocs, skip failing tests

* Fix pylint and mypy

* Minor adjustments

* Add release note

* Minor touch ups

* Resolve Document unique id issue by using custom id calculation

* Better hashing, add unit tests

* Small fixes
2024-04-09 09:45:06 +02:00
Vladimir Blagojevic
c3b96392fd
feat: Use all HTMLToDocument extractors until content is extracted (#7452)
* Use all HTMLToDocument extractors until content is extracted

* Add release note

* Minor doc update

* Improvements, unit test fixes

* Add try_others init param, update tests

* Update haystack/components/converters/html.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* PR feedback - Stefano

* Improve reno release note, add  reference

* little fixes

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-04-05 16:02:34 +02:00
Vladimir Blagojevic
d871bbbfbd
feat: Add complex types in OpenAPI support (#7065)
* Add complex types OpenAPI support

* Add release note
---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-02-27 18:11:06 +01:00
Vladimir Blagojevic
cb6389d7a2
feat: Improve OpenAPI integration (#7034)
* Simplify and improve OpenAPIServiceConnector and OpenAPIServiceToFunctions, add unit tests

* Add reno note

* Add flask test dependency

* Initial PR feedback - Julian

* Remove indirection - Silvano

* Remove flask end-to-end tests

* Remove unused import

* Add mixed body unit test

* Update unit test, mock properly
2024-02-22 14:03:50 +01:00
Silvano Cerza
f96eb3847f
refactor: Merge Pipelines definition in core package (#6973)
* Move marshalling functions in core Pipeline

* Move telemetry gathering in core Pipeline

* Move run logic in core Pipeline

* Update root Pipeline import

* Add release notes

* Update Pipeline docs path

* Update releasenotes/notes/merge-pipeline-definitions-1da80e9803e2a8bb.yaml

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2024-02-12 18:25:28 +01:00
Vladimir Blagojevic
37d9de3c4e
feat: Add service_credentials to OpenAPIServiceConnector run (#6962)
* Add service_credentials to OpenAPIServiceConnector run
* PR feedback Silvano
2024-02-09 16:03:27 +01:00
Silvano Cerza
e6637f5ec2
Fix all tests
2023-11-24 14:48:43 +01:00
Massimiliano Pippi
8adb8bbab8
Remove preview folder in test/
---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:52:55 +01:00