216 Commits

Author SHA1 Message Date
Vladimir Blagojevic
b3b3f89302
feat: Add haystack-experimental dependency (#7921)
* Add haystack-experimental dependency

* Add reno note
2024-07-08 14:07:15 +02:00
Stefano Fiorucci
d80e01492b
update sentence transformers import error message (#7906) 2024-06-20 18:15:01 +02:00
Massimiliano Pippi
3a03fce71c
ci: Add code formatting checks (#7882)
* ruff settings

enable ruff format and re-format outdated files

feat: `EvaluationRunResult` add parameter to specify columns to keep in the comparative `Dataframe`  (#7879)

* adding param to explicitly state which cols to keep

* adding param to explicitly state which cols to keep

* adding param to explicitly state which cols to keep

* updating tests

* adding release notes

* Update haystack/evaluation/eval_run_result.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update releasenotes/notes/add-keep-columns-to-EvalRunResult-comparative-be3e15ce45de3e0b.yaml

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* updating docstring

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

add format-check

fail on format and linting failures

fix string formatting

reformat long lines

fix tests

fix typing

linter

pull from main

* reformat

* lint -> check

* lint -> check
2024-06-18 15:52:46 +00:00
Stefano Fiorucci
2413bb3f42
chore: pin numpy<2; tenacity!=8.4.0 (#7876)
* pin numpy<2

* reno

* pin tenacity too
2024-06-17 10:54:02 +02:00
Massimiliano Pippi
324bbc3868
chore: clean up default env and add a script to generate release notes. (#7858)
* clean up default env and add reno script

* update contributions guidelines

* use test script

* format

* re-add missing dep

* remove black in favour of ruff
2024-06-14 14:57:24 +02:00
Carlos Fernández
c1c339923f
feat: add DocxToDocument converter (#7838)
* first functioning DocxFileToDocument

* fix lazy import message

* add reno

* Add license header

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* change DocxFileToDocument to DocxToDocument

* Update library install to the maintained version

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* clean try-except to only take non-haystack errors into account

* Add warning on docstring of component ignoring page breaks, mark test as skip

* make warnings lazy evaluations

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* make warnings lazy evaluations

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* Make warnings lazy evaluated

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* Solve f bug

* Get more metadata from docx files

* add 'python-docx' dependency and docs

* Change logging import

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* Fix typo

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* remake metadata extraction for docx

* solve bug regarding _get_docx_metadata method

* Update haystack/components/converters/docx.py

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* Update haystack/components/converters/docx.py

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* Delete unused test

---------

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
2024-06-12 11:58:36 +02:00
Sebastian Husch Lee
2c2c7c9f56
feat: Add PPTXToDocument converter (#7808)
* Add first pass at PPTXToDocument converter

* Add test and update code

* Add doc string

* Update docstrings

* Add release notes

* remove unused imports, add to api docs, update pyproject.toml

* Add a new test

* Add dep so tests can run
2024-06-07 09:43:29 +00:00
Stefano Fiorucci
bde92fda67
upgrade transformers and reorganize extras (#7815) 2024-06-06 15:57:18 +02:00
Silvano Cerza
23011c215e
chore: Change trafilatura dependency to use lazy import (#7809)
* Change trafilatura dependency to use lazy import

* Add release notes
2024-06-05 18:04:24 +02:00
Silvano Cerza
fd838fc573
Update indexing and rag default templates to use InMemoryDocumentStore (#7782) 2024-06-04 12:57:33 +02:00
Silvano Cerza
3dcc21fd73
test: Pipeline run tests rework (#7748)
* Rework Pipeline.run() tests

* Remove test_linear_pipeline.py

* Add test for components execution order

* Add new pytest-bdd tests dependency

* Update README.md

* Add function to dynamically add integration marker

* Fix marking tests as integration
2024-05-28 15:42:47 +02:00
Stefano Fiorucci
7181f6b7e9
feat: change HTML conversion backend from boilerpy3 to Trafilatura (#7705)
* change HTML conversion backend to Trafilatura

* rm unused var
2024-05-17 10:38:47 +02:00
Guest400123064
cd66a80ba2
perf: enhanced InMemoryDocumentStore BM25 query efficiency with incremental indexing (#7549)
* incorporating better bm25 impl without breaking interface

* all three bm25 algos

* 1. setting algo post-init not allowed; 2. remove extra underscore for naming consistency; 3. remove unused import

* 1. rename attribute name for IDF computation 2. organize document statistics as a dataclass instead of tuple to improve readability

* fix score type initialization (int -> float) to pass mypy check

* release note included

* fixing linting issues and mypy

* fixing tests

* removing heapq import and cleaning up logging

* changing indexing order

* adding more tests

* increasing tests

* removing rank_bm25 from pyproject.toml

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-05-03 12:10:15 +00:00
Vladimir Blagojevic
5f813373eb
chore: Update huggingface_hub classes used after library upgrade (#7631)
* Update huggingface_hub classes used after library upgrade

* Fix chat tests

* Update lazy import guard and other references to huggingface_hub>=0.23.0

* In huggingface_hub 0.23.0 TextGenerationOutput property details is now optional

* More fixes

* Add reno note
2024-05-03 10:14:54 +02:00
Mo
2e35f13085
feat: add converter based on pdfminer (#7607)
* Initial commit pdfminer converter

* Revert back naming of argument all_text per pdfminer documentation

* Add the component decorator

* Add release notes

* Reformat code with black

* Remove LTPage and comments

* Update dependencies in pyproject.toml

* Added some tests and incorporated reference doc in docstring

* Added some tests and incorporated reference doc in docstring
2024-05-02 10:36:54 +02:00
David S. Batista
8d04e530da
test: end2end evaluation tests (#7601)
* initial import

* wip

* cleaning up tests

* fixing tests

* adding context relevance

* reverting some wrong changes due to PyCharm error in refactoring

* building eval pipeline only once

* handling mypy issues
2024-04-26 14:07:05 +00:00
David S. Batista
958f1eb3a3
doc: adding docstring linting based on ruff (#7463)
* wip: docstrings linting

* set ruff rules
2024-04-23 18:43:09 +02:00
Massimiliano Pippi
5d0ccfe7d4
fix hatch scripts (#7546) 2024-04-12 18:04:18 +02:00
Massimiliano Pippi
e90ffafb47
chore: forward hatch command args to pytest (#7537) 2024-04-11 21:30:34 +02:00
Massimiliano Pippi
2dca53f69b
chore: set linting parameters to the minimum (#7501)
* set line-length to the minimum

* add more defaults

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-04-09 08:56:16 +02:00
Stefano Fiorucci
e26ee0f1db
refactor!: make TGI generators compatible with huggingface_hub>=0.22.0 (#7425)
* progress

* progress

* better lazy imports

* fixes

* reno
2024-03-26 16:10:06 +01:00
Stefano Fiorucci
19d3f39e75
ci: pin huggingface_hub in tests dependencies (#7417)
* pin huggingface_hub in tests dependencies

* Update pyproject.toml
2024-03-25 18:52:02 +01:00
Stefano Fiorucci
e793c718b6
chore: Upgrade transformers to 4.38.2 in test environment (#7363)
* upgrade transformers to 4.38.2 in test environment

* add pyproject to files to check in test workflow
2024-03-15 10:06:28 +01:00
Stefano Fiorucci
abda78c122
unpin OpenAI and fix problem with mock (#7364) 2024-03-15 08:32:28 +01:00
Vladimir Blagojevic
5b4f9f1cda
Pin openai to latest working version (#7359) 2024-03-14 10:47:28 +01:00
Tobias Wochinger
655d4a1a8d
test: test for missing dependencies (#7278)
* tests: import test for missing libraries

* build: add missing dependencies

* refactor: use glob instead of tree walk

* test: extract constants + more documentation
2024-03-05 12:14:10 +01:00
Stefano Fiorucci
721691c036
replace flaky with pytest-rerunfailures (#7298) 2024-03-04 12:26:40 +01:00
Stefano Fiorucci
727794cb70
pin pytest (#7295) 2024-03-04 10:14:39 +01:00
Massimiliano Pippi
221bfb012c
fix: Update pyproject.toml (#7281)
* Update pyproject.toml

* make tests run on templates changes

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2024-03-01 12:40:24 +01:00
Tobias Wochinger
9fe2aae758
ci: add ruff to CI + bring config up to date (#7266)
* ci: add ruff to CI

* chore: fix ruff issues

* ci: fix ruff deprecation warnings

* ci: add ruff as dependency
2024-03-01 11:08:51 +01:00
Tobias Wochinger
fe0ac5c4a2
chore: enforce kwarg logging (#7207)
* chore: add logger which eases logging of extras

* chore: start migrating to key value

* fix: import fixes

* tests: temporarily comment out breaking test

* refactor: move to kwarg based logging

* style: fix import order

* chore: implement self-review comments

* test: drop failing test

* chore: fix more import orders

* docs: add changelog

* tests: fix broken tests

* chore: fix getting the frames

* chore: add comment

* chore: cleanup

* chore: adapt remaining `%s` usages
2024-02-29 14:31:20 +01:00
Tobias Wochinger
f22d49944d
docs: review and normalize haystack.components.websearch (#7236)
* docs: review and normalize `haystack.components.websearch`

* fix: use correct type annotations

* refactor: use type from protocol

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Revert "refactor: use type from protocol"

This reverts commit 23d6f45cd763c39b98be1bff03639a90f2a01fac.

* docs: refactor according to comments

* build: correctly pin to 4.7

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-02-28 16:43:08 +01:00
Tobias Wochinger
c2a9528595
build: pin typing-extensions (#7245)
https://community.openai.com/t/error-while-importing-openai-from-open-import-openai/578166/26
2024-02-28 14:34:41 +01:00
Silvano Cerza
61eb143905
Fix delete outdated docs job in readme_sync.yml (#7241) 2024-02-28 12:41:10 +01:00
Tobias Wochinger
2a591280ab
feat: implement support for structured logging (#7126)
* feat: implement support for structured logging

* docs: add release notes

* style: add explanatory comment

* chore: test + import fixes

* tests: fix windows tests
2024-02-27 09:15:01 +01:00
Tobias Wochinger
ba49905eff
ci: unify dependency management + hatch scripts (#7079)
* ci: unify dependency management + hatch scripts

* ci: migrate readme sync

* build: migrate snippets

* ci: pin hatch

* ci: make Python version more explicit + quote

* ci: add scripts with parameters to hatch

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-02-26 15:40:10 +01:00
Silvano Cerza
289aa44aec
Add flaky as dev dependency (#6924) 2024-02-06 10:48:39 +01:00
Stefano Fiorucci
474cf440ee
pin openai>=1.1.0 (#6657) 2023-12-28 17:10:51 +01:00
Vladimir Blagojevic
4d08be0c2a
feat: Update OpenAI Python Client in Haystack 2.x (#6584)
* Update openai python client

* Add release note

* Consolidate multiple mock_chat_completion into one

* Ensure all components have api_base_url, organization params

* Update tests

* Enable function calling

* Oversight

* Minor fixes, add streaming test mocks

* Apply suggestions from code review

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* metadata -> meta

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-12-21 16:21:24 +01:00
Massimiliano Pippi
00fed32024
build: depend on haystack_bm25 instead of rank_bm25 (#6578)
* use the forked package

* switch package dependency

* relnote

* fix package name
2023-12-18 10:47:15 +01:00
Stefano Fiorucci
cf47abdff5
chore: simplify the management of test dependencies (#6559)
* remove audio dep group

* extract dependencies

* beautify

* rm one step
2023-12-15 16:40:41 +01:00
Massimiliano Pippi
bc45170f4e
chore: add boilerpy3 to the core dependencies (#6544)
* add boilerpy3 to the core dependencies

* remove boilerpy3 installation from test workflow

* fix pylint: import order and unused import

* fix import order

* add release note

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-12-14 11:53:38 +01:00
Silvano Cerza
82fe80ce68
Remove old Pylint plugin (#6527) 2023-12-12 09:59:01 +01:00
Massimiliano Pippi
00e1dd6eb8
chore: rearrange the core package, move tests and clean up (#6427)
* rearrange code

* fix tests

* relnote

* merge test modules

* remove extra

* rearrange draw tests

* forgot

* remove unused import
2023-11-28 09:58:56 +01:00
Silvano Cerza
373e1d6172
ci: Fix mypy failures in CI (#6429)
* Fix mypy failures in CI

* Trigger for testing

* Revert "Trigger for testing"

This reverts commit e5d9246df805b3bf2aa845b7f737610cf779a7ad.
2023-11-27 19:09:28 +01:00
Massimiliano Pippi
bbb6025e89
update package name 2023-11-24 12:14:43 +01:00
Massimiliano Pippi
ea1e3f588b
Update dependencies list
---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 12:09:47 +01:00
jlonge4
c44e2cf49b
feat: add microsoft pptx file converter (#6399)
* Create pptx.py

* feat: pptx converter import __init__.py

* feat: add pptx import __init__.py

* feat: add python-pptx dependency

* feat: add sample pptx for testing

* feat: add pptx file-converter test

* feat: release note pptx-file-converter-3e494d2747637eb2.yaml

* feat: Update releasenotes/notes/pptx-file-converter-3e494d2747637eb2.yaml

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>

* feat: refactor haystack/nodes/file_converter/pptx.py

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>

* fix imports

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-11-23 16:46:41 +01:00
Silvano Cerza
604b177788
chore: Remove pydoc-markdown from dev dependencies (#6398)
* Remove pydoc-markdown from dev dependencies

* Remove fastapi pin in rest_api
2023-11-23 15:59:41 +01:00
Vladimir Blagojevic
e04a1f16bb
feat: Add DynamicPromptBuilder to Haystack 2.x (#6328)
* Add DynamicPromptBuilder

* Improve pydocs, add unit tests

* Add release note

* Make expected_runtime_variables optional

* Add pydocs usage example

* Add more pydocs

* Remove test markers

* Update type in unit test

* Update after canals upgrade

* add to api ref

* docstrings updates

* Update test/preview/components/builders/test_dynamic_prompt_builder.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update haystack/preview/components/builders/dynamic_prompt_builder.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Deparametrize init test

* Rename expected_runtime_variables to runtime_variables

* Rephrase docstring so meaning is clearer

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-23 11:41:57 +01:00