Mirror of https://github.com/deepset-ai/haystack.git, synced 2025-07-21 16:04:09 +00:00

* Add page number to Documents coming from PDFConverters and PreProcessor
* Fix mypy
* Update API Docs
* Update API Docs
* Remove unused imports
* Generate JSON schema
* Generate JSON schema
* Make test variable shorter
* Make regex a separate function
* Move counting of page breaks to a function
* Generate JSON schema
* Apply suggestions from code review
* Update API Documentation
* Don't create instance for testing staticmethod
* Update haystack/nodes/preprocessor/preprocessor.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
21 lines
575 B
YAML
loaders:
  - type: python
    search_path: [../../../../haystack/nodes/file_converter]
    modules: ['base', 'docx', 'image', 'markdown', 'pdf', 'parsr', 'azure', 'tika', 'txt']
    ignore_when_discovered: ['__init__']
processors:
  - type: filter
    expression:
    documented_only: true
    do_not_filter_modules: false
    skip_empty_modules: true
  - type: smart
  - type: crossref
renderer:
  type: markdown
  descriptive_class_title: false
  descriptive_module_title: true
  add_method_class_prefix: true
  add_member_class_prefix: false
  filename: file_converter.md
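As a rough illustration of what the `loaders` entry above asks for, the sketch below joins each name in `modules` onto the `search_path` entry to get the source file a Python loader would plausibly read. The helper name `candidate_module_files` is hypothetical. Treating every module as a single `.py` file (rather than a package directory) is an assumption made for this example, and the documentation tool's actual lookup may differ.

```python
from pathlib import PurePosixPath

# Values copied from the `loaders` section of the config above.
SEARCH_PATH = ["../../../../haystack/nodes/file_converter"]
MODULES = ["base", "docx", "image", "markdown", "pdf", "parsr", "azure", "tika", "txt"]


def candidate_module_files(search_path, modules):
    """Map each module name to the .py path implied by each search_path entry.

    Hypothetical helper for illustration only: it builds path strings and
    never touches the filesystem.
    """
    return {
        name: [str(PurePosixPath(root) / f"{name}.py") for root in search_path]
        for name in modules
    }


if __name__ == "__main__":
    # Print one candidate path per configured module.
    for name, paths in candidate_module_files(SEARCH_PATH, MODULES).items():
        print(f"{name}: {paths[0]}")
```

The paths are kept relative, as written in the config. Which directory they are resolved against is left open here.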