doc: add pdf extra note (#1165)

ryannikolaidis 2023-08-22 11:20:26 -07:00 committed by GitHub
parent 4114022d9d
commit ab7fafcb41
3 changed files with 6 additions and 7 deletions

@@ -1,4 +1,4 @@
## 0.10.5-dev3
## 0.10.5-dev4
### Enhancements
* Create new CI Pipelines
@@ -7,6 +7,7 @@
* `partition` raises an error and tells the user to install the appropriate extra if a filetype
is detected that is missing dependencies.
* Add custom errors to ingest
* Add notes on extra installs to docs
## 0.10.3

@@ -58,9 +58,7 @@ The example documents in this section come from the
directory in the ``unstructured`` repo.
Before running the code in this section, make sure you've installed the ``unstructured`` library
and all dependencies using the instructions in the **Quick Start** section.
and all dependencies using the instructions in the `Quick Start <https://unstructured-io.github.io/unstructured/installing.html#quick-start>`_ section.
Partitioning a document
~~~~~~~~~~~~~~~~~~~~~~~
@@ -164,7 +162,7 @@ of the table will be available in the element metadata under ``element.metadata.
table extraction is available, the ``partition`` function will extract tables automatically if they are present.
For PDFs and images, table extraction requires a relatively expensive call to a table recognition model, and so for those
document types table extraction is an option you need to enable. If you would like to extract tables for PDFs or images,
pass in ``infer_table_structure=True``. Here is an example:
pass in ``infer_table_structure=True``. Here is an example (Note: this example requires the ``pdf`` extra. This can be installed with ``pip install "unstructured[pdf]"``):
.. code:: python
@@ -257,7 +255,7 @@ looks like the following:
from unstructured.partition.auto import partition
from unstructured.staging.base import elements_to_json
input_filename = "example-10k.html"
input_filename = "example-docs/example-10k.html"
output_filename = "outputs.json"
elements = partition(filename=input_filename)

@@ -1 +1 @@
__version__ = "0.10.5-dev3" # pragma: no cover
__version__ = "0.10.5-dev4" # pragma: no cover
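
The changelog entry and the docs note in this commit both concern optional extras: code that needs a dependency from an extra should fail with a message naming the extra to install. The sketch below illustrates that general pattern using only the standard library. The helper ``require_extra`` and the module names passed to it are hypothetical and are not taken from the ``unstructured`` codebase; the library's actual check may work differently.

```python
import importlib.util


def require_extra(module_name: str, extra: str) -> None:
    """Raise ImportError with an install hint if a top-level module is missing.

    Hypothetical helper for illustration only; not part of unstructured.
    Pass a top-level module name (no dots), since find_spec imports parent
    packages when resolving dotted names.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f'Missing dependency "{module_name}". '
            f'Install it with: pip install "unstructured[{extra}]"'
        )


# A standard-library module is always found, so this call returns quietly.
require_extra("json", "pdf")

# A module that is not installed triggers the install hint.
try:
    require_extra("hypothetical_missing_pdf_backend", "pdf")
except ImportError as err:
    print(err)
```

Checking with ``importlib.util.find_spec`` avoids importing a potentially heavy dependency just to learn whether it is installed, which is one reason to prefer it over a bare ``try: import ...`` in a guard like this.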