haystack/docs/_src/api/pydoc/file-converters.yml
bogdankostic 5c3bfad078
feat: Add page number to Documents coming from PDFConverters and PreProcessor (#2932)
* Add page number to Documents coming from PDFConverters and PreProcessor

* Fix mypy

* Update API Docs

* Update API Docs

* Remove unused imports

* Generate JSON schema

* Generate JSON schema

* Make test variable shorter

* Make regex a separate function

* Move counting of page breaks to a function

* Generate JSON schema

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update API Documentation

* Don't create instance for testing staticmethod

* Update haystack/nodes/preprocessor/preprocessor.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-09 15:55:27 +02:00

21 lines
575 B
YAML

loaders:
- type: python
search_path: [../../../../haystack/nodes/file_converter]
modules: ['base', 'docx', 'image', 'markdown', 'pdf', 'parsr', 'azure', 'tika', 'txt']
ignore_when_discovered: ['__init__']
processors:
- type: filter
expression:
documented_only: true
do_not_filter_modules: false
skip_empty_modules: true
- type: smart
- type: crossref
renderer:
type: markdown
descriptive_class_title: false
descriptive_module_title: true
add_method_class_prefix: true
add_member_class_prefix: false
filename: file_converter.md