haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-11-09 22:33:47 +00:00

Author	SHA1	Message	Date
Sebastian Husch Lee	28ad78c73d	feat: Add XLSXToDocument converter (#8522 ) * Add draft of the Excel To Document converter * Add license header * Add release note * Use Union instead of pipe * Add openpyxl as additional dep * Fix zip issue * few updates from Bijay * Update deps * Add markdown test * Adding more example excels and expanding tests * Added more tests * Fix windows test by setting lineterminator * Addressing PR comments * PR comments * Fix linting	2025-01-09 09:03:19 +01:00
Stefano Fiorucci	2bc58d2987	feat: support for tools in `HuggingFaceAPIChatGenerator` (#8661 ) * message conversion function * hfapi w tools * right test file + hf_hub version * release note * feedback	2024-12-19 15:04:37 +01:00
Stefano Fiorucci	96b4a1d2fd	feat: `Tool` dataclass - unified abstraction to represent tools (#8652 ) * draft * del HF token in tests * adaptations * progress * fix type * import sorting * more control on deserialization * release note * improvements * support name field * fix chatpromptbuilder test * port Tool from experimental * release note * docs upd * Update tool.py --------- Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>	2024-12-18 11:36:44 +00:00
Stefano Fiorucci	2a9a6401d2	chore: pin `openai>=1.56.1` (#8632 ) * pin openai>=1.56.1 * release note	2024-12-12 16:26:38 +01:00
David S. Batista	248dccbdd3	chore: fixing `pylint` issues (#8610 ) * initial import * fixing internal methods * fixing some internal methods * modify _preprocess * fixed internal methods --------- Co-authored-by: anakin87 <stefanofiorucci@gmail.com>	2024-12-09 16:53:37 +00:00
Stefano Fiorucci	de7099e560	ci: add job to check imports (#8594 ) * try checking imports * clarify error message * better fmt * do not show complete list of successfully imported packages * refinements * relnote * add missing forward references * better function name * linting * fix linting * Update .github/utils/check_imports.py Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2024-11-29 14:00:59 +00:00
Stefano Fiorucci	f085959067	chore: declare `requires-python<3.13` in pyproject (#8547 ) * restrict to python<3.13 * try unpinning dulwich * reintroduce dulwich pin	2024-11-15 09:28:39 +00:00
Silvano Cerza	ebb45d3d1e	Remove ddtrace version pin (#8529 )	2024-11-11 11:21:10 +01:00
Stefano Fiorucci	c7b898994e	build: unpin `numpy` + use Python 3.9 in CI (#8492 ) * try unpinning numpy * try python 3.9 * release note	2024-10-28 12:15:17 +01:00
Silvano Cerza	0157459a7b	Pin ddtrace test dependency to fix tests (#8478 )	2024-10-22 10:19:25 +00:00
Stefano Fiorucci	f6935d1456	ci: add `pip` to `test` dependencies (#8475 ) * add pip to test dependencies * trigger * release note * rm trigger	2024-10-22 08:35:30 +00:00
Stefano Fiorucci	7788bfe558	ci: upgrade Hatch to 1.13.0 and adopt uv as installer (#8313 ) * try uv * upgrade hatch * rm unnecessary specification * release note	2024-10-17 10:32:14 +02:00
Silvano Cerza	29672d4b42	feat: Add `JSONConverter` Component (#8397 ) * Add JSONConverter Component * Handle some corner cases * Add JSONConverter to pydoc config * Add a way to extract all non content fields as metadata * Small fix in docstring * Fix tests * docstrings upd * Update json.py --------- Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>	2024-09-25 12:34:51 +02:00
Silvano Cerza	4b77ec1b6f	Fix codespell config (#8392 )	2024-09-24 12:00:45 +02:00
Vladimir Blagojevic	badd0594cc	feat: Port NLTKDocumentSplitter from dC to Haystack (#8350 ) * Port NLTKDocumentSplitter from dC to Haystack * Improve pydocs * Use haystack logging * Add NLTKDocumentSplitter to __init__.py * Use haystack logging, rename test classes * Fixing _needs_join return * Linting * PR feedback * More static methods * Increase test coverage * Compile pattern --------- Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>	2024-09-17 13:59:19 +02:00
Silvano Cerza	da49e782e2	chore: Make `arrow` an optional dependency (#8345 ) * Make arrow an optional dependency * Fix imports	2024-09-09 16:09:51 +02:00
Mo Sriha	75955922b9	feat: Add current date in UTC to PromptBuilder (#8233 ) * initial commit * add unit tests * add release notes * update function name	2024-09-09 09:47:03 +02:00
Stefano Fiorucci	25d333bed3	update transformers (#8296 )	2024-08-27 16:04:11 +00:00
Stefano Fiorucci	6b0ee4c193	chore: update test dependency and `LazyImport` block to make compatibility with `sentence-transformers>=3.0.0` explicit (#8295 ) * sentence-transformers-3 update test dep and lazyimport block * clearer release note	2024-08-27 15:51:03 +00:00
Tobias Wochinger	5a3ea75196	docs: document Python 3.11 and 3.12 support (#8159 ) * docs: add Python 3.11 and 3.12 to supported versions * docs: add release notes	2024-08-02 14:46:20 +02:00
Tobias Wochinger	4dde6fbaec	build: unpin structlog (#8071 )	2024-07-24 20:58:34 +02:00
Vladimir Blagojevic	a59de1d7b3	chore: Combined main unblock (#8045 ) * Pin structlog to 24.2.0 due to unit test failures * Remove object init parameter in huggingface_hub unit tests * Use less restrictive structlog pin * Add release note	2024-07-19 10:39:10 +02:00
Vladimir Blagojevic	b3b3f89302	feat: Add haystack-experimental dependency (#7921 ) * Add haystack-experimental dependency * Add reno note	2024-07-08 14:07:15 +02:00
Stefano Fiorucci	d80e01492b	update sentence transformers import error message (#7906 )	2024-06-20 18:15:01 +02:00
Massimiliano Pippi	3a03fce71c	ci: Add code formatting checks (#7882 ) * ruff settings enable ruff format and re-format outdated files feat: `EvaluationRunResult` add parameter to specify columns to keep in the comparative `Dataframe` (#7879) * adding param to explicitly state which cols to keep * adding param to explicitly state which cols to keep * adding param to explicitly state which cols to keep * updating tests * adding release notes * Update haystack/evaluation/eval_run_result.py Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * Update releasenotes/notes/add-keep-columns-to-EvalRunResult-comparative-be3e15ce45de3e0b.yaml Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * updating docstring --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> add format-check fail on format and linting failures fix string formatting reformat long lines fix tests fix typing linter pull from main * reformat * lint -> check * lint -> check	2024-06-18 15:52:46 +00:00
Stefano Fiorucci	2413bb3f42	chore: pin numpy<2; tenacity!=8.4.0 (#7876 ) * pin numpy<2 * reno * pin tenacity too	2024-06-17 10:54:02 +02:00
Massimiliano Pippi	324bbc3868	chore: clean up `default` env and add a script to generate release notes. (#7858 ) * clean up default env and add reno script * update contributions guidelines * use test script * format * re-add missing dep * remove black in favour of ruff	2024-06-14 14:57:24 +02:00
Carlos Fernández	c1c339923f	feat: add DocxToDocument converter (#7838 ) * first functioning DocxFileToDocument * fix lazy import message * add reno * Add license header Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com> * change DocxFileToDocument to DocxToDocument * Update library install to the maintained version Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com> * clean try-except to only take non-Haystack errors into account * Add warning on docstring of component ignoring page breaks, mark test as skip * make warnings lazy evaluations Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com> * make warnings lazy evaluations Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com> * Make warnings lazy evaluated Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com> * Solve f bug * Get more metadata from docx files * add 'python-docx' dependency and docs * Change logging import Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com> * Fix typo Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com> * remake metadata extraction for docx * solve bug regarding _get_docx_metadata method * Update haystack/components/converters/docx.py Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com> * Update haystack/components/converters/docx.py Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com> * Delete unused test --------- Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>	2024-06-12 11:58:36 +02:00
Sebastian Husch Lee	2c2c7c9f56	feat: Add PPTXToDocument converter (#7808 ) * Add first pass at PPTXToDocument converter * Add test and update code * Add doc string * Update docstrings * Add release notes * remove unused imports, add to api docs, update pyproject.toml * Add a new test * Add dep so tests can run	2024-06-07 09:43:29 +00:00
Stefano Fiorucci	bde92fda67	upgrade transformers and reorganize extras (#7815 )	2024-06-06 15:57:18 +02:00
Silvano Cerza	23011c215e	chore: Change trafilatura dependency to use lazy import (#7809 ) * Change trafilatura dependency to use lazy import * Add release notes	2024-06-05 18:04:24 +02:00
Silvano Cerza	fd838fc573	Update indexing and rag default templates to use InMemoryDocumentStore (#7782 )	2024-06-04 12:57:33 +02:00
Silvano Cerza	3dcc21fd73	test: Pipeline run tests rework (#7748 ) * Rework Pipeline.run() tests * Remove test_linear_pipeline.py * Add test for components execution order * Add new pytest-bdd tests dependency * Update README.md * Add function to dynamically add integration marker * Fix marking tests as integration	2024-05-28 15:42:47 +02:00
Stefano Fiorucci	7181f6b7e9	feat: change HTML conversion backend from boilerpy3 to Trafilatura (#7705 ) * change HTML conversion backend to Trafilatura * rm unused var	2024-05-17 10:38:47 +02:00
Guest400123064	cd66a80ba2	perf: enhanced `InMemoryDocumentStore` BM25 query efficiency with incremental indexing (#7549 ) * incorporating better bm25 impl without breaking interface * all three bm25 algos * 1. setting algo post-init not allowed; 2. remove extra underscore for naming consistency; 3. remove unused import * 1. rename attribute name for IDF computation 2. organize document statistics as a dataclass instead of tuple to improve readability * fix score type initialization (int -> float) to pass mypy check * release note included * fixing linting issues and mypy * fixing tests * removing heapq import and cleaning up logging * changing indexing order * adding more tests * increasing tests * removing rank_bm25 from pyproject.toml --------- Co-authored-by: David S. Batista <dsbatista@gmail.com>	2024-05-03 12:10:15 +00:00
Vladimir Blagojevic	5f813373eb	chore: Update huggingface_hub classes used after library upgrade (#7631 ) * Update huggingface_hub classes used after library upgrade * Fix chat tests * Update lazy import guard and other references to huggingface_hub>=0.23.0 * In huggingface_hub 0.23.0 TextGenerationOutput property details is now optional * More fixes * Add reno note	2024-05-03 10:14:54 +02:00
Mo	2e35f13085	feat: add converter based on pdfminer (#7607 ) * Initial commit pdfminer converter * Revert back naming of argument all_text per pdfminer documentation * Add the component decorator * Add release notes * Reformat code with black * Remove LTPage and comments * Update dependencies in pyproject.toml * Added some tests and incorporated reference doc in docstring * Added some tests and incorporated reference doc in docstring	2024-05-02 10:36:54 +02:00
David S. Batista	8d04e530da	test: end2end evaluation tests (#7601 ) * initial import * wip * cleaning up tests * fixing tests * adding context relevance * reverting some wrong changes due to a PyCharm error in refactoring * building eval pipeline only once * handling mypy issues	2024-04-26 14:07:05 +00:00
David S. Batista	958f1eb3a3	doc: adding `docstring` linting based on `ruff` (#7463 ) * wip: docstrings linting * set ruff rules	2024-04-23 18:43:09 +02:00
Massimiliano Pippi	5d0ccfe7d4	fix hatch scripts (#7546 )	2024-04-12 18:04:18 +02:00
Massimiliano Pippi	e90ffafb47	chore: forward hatch command args to pytest (#7537 )	2024-04-11 21:30:34 +02:00
Massimiliano Pippi	2dca53f69b	chore: set linting parameters to the minimum (#7501 ) * set line-length to the minimum * add more defaults --------- Co-authored-by: David S. Batista <dsbatista@gmail.com>	2024-04-09 08:56:16 +02:00
Stefano Fiorucci	e26ee0f1db	refactor!: make TGI generators compatible with `huggingface_hub>=0.22.0` (#7425 ) * progress * progress * better lazy imports * fixes * reno	2024-03-26 16:10:06 +01:00
Stefano Fiorucci	19d3f39e75	ci: pin huggingface_hub in tests dependencies (#7417 ) * pin huggingface_hub in tests dependencies * Update pyproject.toml	2024-03-25 18:52:02 +01:00
Stefano Fiorucci	e793c718b6	chore: Upgrade transformers to 4.38.2 in test environment (#7363 ) * upgrade transformers to 4.38.2 in test environment * add pyproject to files to check in test workflow	2024-03-15 10:06:28 +01:00
Stefano Fiorucci	abda78c122	unpin OpenAI and fix problem with mock (#7364 )	2024-03-15 08:32:28 +01:00
Vladimir Blagojevic	5b4f9f1cda	Pin openai to latest working version (#7359 )	2024-03-14 10:47:28 +01:00
Tobias Wochinger	655d4a1a8d	test: test for missing dependencies (#7278 ) * tests: import test for missing libraries * build: add missing dependencies * refactor: use glob instead of tree walk * test: extract constants + more documentation	2024-03-05 12:14:10 +01:00
Stefano Fiorucci	721691c036	replace flaky with pytest-rerunfailures (#7298 )	2024-03-04 12:26:40 +01:00
Stefano Fiorucci	727794cb70	pin pytest (#7295 )	2024-03-04 10:14:39 +01:00

238 Commits