haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-07-18 14:31:49 +00:00

Author	SHA1	Message	Date
David S. Batista	5af2888e23	fix: `PDFMinerToDocument` convert function - adding double new lines between each `container_text` so that passages can be detected. (#8729) * initial import * adding double new lines between container_texts so that passages can be detected * reducing type specification to avoid import error * adding release notes * renaming variable	2025-01-17 13:01:16 +00:00
Julian Risch	642fa60cdf	fix: PDFMinerToDocument initializes documents with content and meta (#8708) * fix: PDFMinerToDocument initializes documents with content and meta * add release note * Apply suggestions from code review Co-authored-by: David S. Batista <dsbatista@gmail.com> --------- Co-authored-by: David S. Batista <dsbatista@gmail.com>	2025-01-13 10:12:06 +00:00
Amna Mubashar	9302d3d9f0	feat: Add store_full_path to converters (2/3) (#8573)	2024-11-25 15:22:19 +05:00
Sebastian Husch Lee	911f3523ab	feat: Increase logging transparency for empty Documents during conversion (#8509) * Add log lines for PDF conversion and make skipping more explicit in DocumentSplitter * Add logging statement for PDFMinerToDocument as well * Add tests * Remove unused line * Remove unused line * add reno * Add in PDF file * Update checks in PDF converters and add tests for document splitter * Revert * Remove line * Fix comment * Make mypy happy * Make mypy happy	2024-11-04 09:26:57 +01:00
Massimiliano Pippi	10c675d534	chore: add license header to all modules (#7675) * add license header to modules * check license header at linting time	2024-05-09 13:40:36 +00:00
Mo	2e35f13085	feat: add converter based on pdfminer (#7607) * Initial commit pdfminer converter * Revert back naming of argument all_text per pdfminer documentation * Add the component decorator * Add release notes * Reformat code with black * Remove LTPage and comments * Update dependencies in pyproject.toml * Added some tests and incorporated reference doc in docstring * Added some tests and incorporated reference doc in docstring	2024-05-02 10:36:54 +02:00

6 Commits