mirror of https://github.com/docling-project/docling.git (synced 2025-06-27 05:20:05 +00:00)
chore: update README (#13)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
parent f09ffcc8f4
commit 28d1c746a6
13
README.md
@@ -1,5 +1,7 @@
<p align="center">
<a href="https://github.com/ds4sd/docling"> <img loading="lazy" alt="Docling" src="https://github.com/DS4SD/docling/raw/main/logo.png" width="150" />
<a href="https://github.com/ds4sd/docling">
<img loading="lazy" alt="Docling" src="https://github.com/DS4SD/docling/raw/main/logo.png" width="150" />
</a>
</p>

# Docling
@@ -11,7 +13,7 @@
[](https://pycqa.github.io/isort/)
[](https://pydantic.dev)
[](https://github.com/pre-commit/pre-commit)
[](https://opensource.org/licenses/MIT)
[](https://opensource.org/licenses/MIT)

Docling bundles PDF document conversion to JSON and Markdown in an easy, self-contained package.

@@ -49,7 +51,7 @@ The output of the above command will be written to `./scratch`.

### Adjust pipeline features

**Control pipeline options**
#### Control pipeline options

You can control if table structure recognition or OCR should be performed by arguments passed to `DocumentConverter`:
```python
@@ -62,16 +64,15 @@ doc_converter = DocumentConverter(
)
```

**Control table extraction options**
#### Control table extraction options

You can control if table structure recognition should map the recognized structure back to PDF cells (default) or use text cells from the structure prediction itself.
This can improve output quality if you find that multiple columns in extracted tables are erroneously merged into one.
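
As a rough illustration of the two modes described above, the sketch below uses small stand-in dataclasses that only mirror the option names appearing in this README (`PipelineOptions`, `table_structure_options`, `do_cell_matching`). They are not the docling classes, and the `cell_text_source` helper and its return labels are hypothetical, added only to make the toggle visible.

```python
from dataclasses import dataclass, field


@dataclass
class TableStructureOptions:
    # Stand-in, not docling's class. Per the README text, matching back to
    # PDF cells is the default behaviour.
    do_cell_matching: bool = True


@dataclass
class PipelineOptions:
    # Stand-in, not docling's class; only the fields this sketch needs.
    do_table_structure: bool
    table_structure_options: TableStructureOptions = field(
        default_factory=TableStructureOptions
    )


def cell_text_source(options: PipelineOptions) -> str:
    """Hypothetical helper: report where table cell text would come from."""
    if not options.do_table_structure:
        return "none"  # table structure recognition is switched off
    if options.table_structure_options.do_cell_matching:
        return "pdf_cells"  # recognized structure mapped back to PDF cells
    return "predicted_cells"  # text cells taken from the structure prediction


pipeline_options = PipelineOptions(do_table_structure=True)
default_source = cell_text_source(pipeline_options)

# Same assignment as in the README snippet: switch off cell matching.
pipeline_options.table_structure_options.do_cell_matching = False
override_source = cell_text_source(pipeline_options)

print(default_source, override_source)
```

Under these assumptions, switching `do_cell_matching` to `False` is the only change needed to move from PDF-cell text to predicted text cells; everything else in the options object stays as it was.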


```python

pipeline_options = PipelineOptions(do_table_structure=True)
pipeline_options.table_structure_options.do_cell_matching = False  # Uses text cells predicted from table structure model
pipeline_options.table_structure_options.do_cell_matching = False  # uses text cells predicted from table structure model

doc_converter = DocumentConverter(
    artifacts_path=artifacts_path,