mirror of https://github.com/Unstructured-IO/unstructured.git (synced 2025-08-18 13:45:45 +00:00)
doc: add pdf extra note (#1165)
parent 4114022d9d
commit ab7fafcb41
@@ -1,4 +1,4 @@
-## 0.10.5-dev3
+## 0.10.5-dev4
 
 ### Enhancements
 * Create new CI Pipelines
@@ -7,6 +7,7 @@
 * `partition` raises and error and tells the user to install the appropriate extra if a filetype
 is detected that is missing dependencies.
 * Add custom errors to ingest
+* Add notes on extra installs to docs
 
 
 ## 0.10.3
@@ -58,9 +58,7 @@ The example documents in this section come from the
 directory in the ``unstructured`` repo.
 
 Before running the code in this make sure you've installed the ``unstructured`` library
-and all dependencies using the instructions in the **Quick Start** section.
+and all dependencies using the instructions in the `Quick Start <https://unstructured-io.github.io/unstructured/installing.html#quick-start>`_ section.
 
-
-
 Partitioning a document
 ~~~~~~~~~~~~~~~~~~~~~~~
@@ -164,7 +162,7 @@ of the table will be available in the element metadata under ``element.metadata.
 table extraction is available, the ``partition`` function will extract tables automatically if they are present.
 For PDFs and images, table extraction requires a relatively expensive call to a table recognition model, and so for those
 document types table extraction is an option you need to enable. If you would like to extract tables for PDFs or images,
-pass in ``infer_table_structured=True``. Here is an example:
+pass in ``infer_table_structured=True``. Here is an example (Note: this example requires the ``pdf`` extra. This can be installed with ``pip install "unstructured[pdf]"``):
 
 .. code:: python
 
@@ -257,7 +255,7 @@ looks like the following:
 from unstructured.partition.auto import partition
 from unstructured.staging.base import elements_to_json
 
-input_filename = "example-10k.html"
+input_filename = "example-docs/example-10k.html"
 output_filename = "outputs.json"
 
 elements = partition(filename=input_filename)
@@ -1 +1 @@
-__version__ = "0.10.5-dev3" # pragma: no cover
+__version__ = "0.10.5-dev4" # pragma: no cover
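Note on the changelog entry about missing extras: the idea of checking for an optional dependency and pointing the user at the matching extra can be sketched roughly as below. This is an illustration only, not the code in ``unstructured``; the helper name ``require_extra`` and the module names passed to it are hypothetical.

```python
import importlib.util


def require_extra(module_name: str, extra: str) -> None:
    """Raise ImportError with an install hint if a top-level module is missing.

    Hypothetical helper for illustration; it is not part of unstructured.
    Pass a top-level module name only (no dots).
    """
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f'Missing dependency "{module_name}". '
            f'Install it with: pip install "unstructured[{extra}]"'
        )


if __name__ == "__main__":
    # "json" ships with Python, so this check passes silently.
    require_extra("json", "pdf")
    try:
        # Placeholder name standing in for a dependency that is not installed.
        require_extra("hypothetical_missing_backend", "pdf")
    except ImportError as err:
        print(err)
```

A caller would run a check like this before dispatching to a filetype-specific code path, so the failure message names the extra to install rather than surfacing a bare import error.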