update partition_pdf section; added partition_image

Matt Robinson 2023-02-09 15:13:26 -05:00
parent 24c90a03dc
commit 7f9aefc549


@@ -128,9 +128,11 @@ Examples:
``partition_pdf``
---------------------
The ``partition_pdf`` function segments a PDF document by calling the document image analysis API.
The intent of the parameters ``url`` and ``token`` is to allow users to self host an inference API,
if desired.
The ``partition_pdf`` function segments a PDF document by using a document image analysis model.
If you set ``url=None``, the document image analysis model will execute locally. If you set
the URL, ``partition_pdf`` will make a call to a remote inference server.
``partition_pdf`` also includes a ``token`` parameter that allows you to pass in an authentication
token for a remote API call.
Examples:
@@ -139,7 +141,25 @@ Examples:
from unstructured.partition.pdf import partition_pdf
# Returns a List[Element] present in the pages of the parsed pdf document
elements = partition_pdf("example-docs/layout-parser-paper-fast.pdf")
elements = partition_pdf("example-docs/layout-parser-paper-fast.pdf", url=None)
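The choice between local and remote execution can be sketched as a small helper that assembles the ``url`` and ``token`` keyword arguments. This is only an illustration: ``build_partition_kwargs`` is a hypothetical name that is not part of ``unstructured``, and the server address and token below are placeholders.

```python
from typing import Any, Dict, Optional


def build_partition_kwargs(
    url: Optional[str] = None, token: Optional[str] = None
) -> Dict[str, Any]:
    """Hypothetical helper: assemble the ``url``/``token`` keyword arguments."""
    # url=None selects local execution, so there is no remote call to authenticate.
    if url is None:
        return {"url": None}
    kwargs: Dict[str, Any] = {"url": url}
    # Only forward a token when one was actually supplied.
    if token is not None:
        kwargs["token"] = token
    return kwargs


local_kwargs = build_partition_kwargs()
remote_kwargs = build_partition_kwargs(
    url="https://inference.example.com/layout",  # placeholder address, not a real endpoint
    token="my-secret-token",  # placeholder credential
)

# With unstructured installed, the result would be passed along like this:
# from unstructured.partition.pdf import partition_pdf
# elements = partition_pdf("example-docs/layout-parser-paper-fast.pdf", **remote_kwargs)
```

Dropping the token when ``url`` is ``None`` keeps credentials out of calls that never leave the machine.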
``partition_image``
---------------------
The ``partition_image`` function has the same API as ``partition_pdf``, which is documented above.
The only difference is that ``partition_image`` does not need to convert a PDF to an image
prior to processing. The ``partition_image`` function supports ``.png`` and ``.jpg`` files.
Examples:
.. code:: python
from unstructured.partition.image import partition_image
# Returns a List[Element] present in the parsed image
elements = partition_image("example-docs/layout-parser-paper-fast.jpg", url=None)
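Because the two functions share an API, a caller might pick between them by file extension. The sketch below assumes that approach; ``choose_partitioner`` and ``IMAGE_SUFFIXES`` are hypothetical names, and the extension list only mirrors the ``.png`` and ``.jpg`` support stated above.

```python
from pathlib import Path

# Image extensions the text above says partition_image supports.
IMAGE_SUFFIXES = {".png", ".jpg"}


def choose_partitioner(filename: str) -> str:
    """Hypothetical helper: name the partition function suited to a file."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".pdf":
        return "partition_pdf"
    if suffix in IMAGE_SUFFIXES:
        return "partition_image"
    # Anything else is outside what this section documents.
    raise ValueError(f"Unsupported file type: {suffix!r}")


pdf_choice = choose_partitioner("example-docs/layout-parser-paper-fast.pdf")
image_choice = choose_partitioner("example-docs/layout-parser-paper-fast.jpg")
```

Returning the function name rather than the function itself keeps the sketch runnable without ``unstructured`` installed; real code would import and call the chosen function directly.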
``partition_email``