doc: add pdf extra note (#1165)

2025-10-05 13:24:44 +00:00 · 2023-08-22 11:20:26 -07:00 · ab7fafcb41
commit ab7fafcb41
parent 4114022d9d
3 changed files with 6 additions and 7 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,4 +1,4 @@
-## 0.10.5-dev3
+## 0.10.5-dev4

 ### Enhancements
 * Create new CI Pipelines
@@ -7,6 +7,7 @@
 * `partition` raises and error and tells the user to install the appropriate extra if a filetype
  is detected that is missing dependencies.
 * Add custom errors to ingest
+* Add notes on extra installs to docs


 ## 0.10.3
--- a/docs/source/introduction/getting_started.rst
+++ b/docs/source/introduction/getting_started.rst
@@ -58,9 +58,7 @@ The example documents in this section come from the
 directory in the ``unstructured`` repo.

 Before running the code in this make sure you've installed the ``unstructured`` library
-and all dependencies using the instructions in the **Quick Start** section.
-
-
+and all dependencies using the instructions in the `Quick Start <https://unstructured-io.github.io/unstructured/installing.html#quick-start>`_ section.

 Partitioning a document
 ~~~~~~~~~~~~~~~~~~~~~~~
@@ -164,7 +162,7 @@ of the table will be available in the element metadata under ``element.metadata.
 table extraction is available, the ``partition`` function will extract tables automatically if they are present.
 For PDFs and images, table extraction requires a relatively expensive call to a table recognition model, and so for those
 document types table extraction is an option you need to enable. If you would like to extract tables for PDFs or images,
-pass in ``infer_table_structured=True``. Here is an example:
+pass in ``infer_table_structured=True``. Here is an example (Note: this example requires the ``pdf`` extra. This can be installed with ``pip install "unstructured[pdf]"``):

 .. code:: python

@@ -257,7 +255,7 @@ looks like the following:
    from unstructured.partition.auto import partition
    from unstructured.staging.base import elements_to_json

-    input_filename = "example-10k.html"
+    input_filename = "example-docs/example-10k.html"
    output_filename = "outputs.json"

    elements = partition(filename=input_filename)
--- a/unstructured/version.py
+++ b/unstructured/version.py
@@ -1 +1 @@
-__version__ = "0.10.5-dev3"  # pragma: no cover
+__version__ = "0.10.5-dev4"  # pragma: no cover
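
The docs note added in this commit tells readers to run ``pip install "unstructured[pdf]"`` before trying the table-extraction example. As a rough illustration of that idea, the sketch below checks whether a few modules can be imported and, if not, produces the same kind of install hint. It is not code from the ``unstructured`` repo: the helper names ``missing_modules`` and ``install_hint`` are hypothetical, and the module list for the ``pdf`` extra is an assumption made for the example.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def install_hint(extra, module_names):
    """Return an install hint for the given extra, or None if nothing is missing."""
    missing = missing_modules(module_names)
    if not missing:
        return None
    return f'Missing {", ".join(missing)}; try: pip install "unstructured[{extra}]"'


# Assumption for illustration only: modules the "pdf" extra is expected to provide.
ASSUMED_PDF_MODULES = ["pdfminer", "pdf2image"]

if __name__ == "__main__":
    hint = install_hint("pdf", ASSUMED_PDF_MODULES)
    print(hint or "pdf extra dependencies appear to be installed")
```

Only top-level module names are passed to ``importlib.util.find_spec`` here, since a dotted name can raise ``ModuleNotFoundError`` when its parent package is absent.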