haystack

Mirror of https://github.com/deepset-ai/haystack.git, synced 2025-11-03 19:29:32 +00:00

Author	SHA1	Message	Date
bogdankostic	60224412bc	feat: Add headline extraction to `ParsrConverter` (#3488) * Add headline extraction to ParsrConverter * Add sample PDF file * Add test * Use extract_headlines if set in convert method * Integrate PR feedback	2022-10-31 19:00:02 +01:00
Daniel Bichuetti	df1f4205b6	feat: add public layout-base extraction support on PDFToTextConverter (#3137) * feat(PDFToTextConverter): add option to get text in physical layout order * test: add physical layout extraction test to PDFToTextConverter * refactor: change layout parameter attribution places * docs: manually trigger pre-commits * docs: generate new docs to comply with pydoc-markdown style	2022-09-13 16:55:21 +02:00
Daniel Augustus Bichuetti Silva	1706729e26	Prevent `PDFToTextConverter` from failing on PDFs with spaces in their names (#2786) * Change split logic to list * Fix wrong parameter for run * Fix mypy error * Fix layout/raw parameter * Add test for filename with whitespaces on PDFToText * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-07-11 13:30:33 +02:00
Tanay Soni	ef9e4f4467	Add PDF text extraction (#109)	2020-06-08 11:07:19 +02:00