unstructured/docs/source/apis/usage_methods.rst
Filip Knefel bdfd975115
chore: change table extraction defaults (#2588)
Change default values for table extraction - works in pair with
[this](https://github.com/Unstructured-IO/unstructured-api/pull/370)
`unstructured-api` PR

We want to move away from `pdf_infer_table_structure` parameter, in this
PR:
- We change how it's treated wrt `skip_infer_table_types` parameter.
Whether to extract tables from pdf now follows from the rule:
`pdf_infer_table_structure && "pdf" not in skip_infer_table_types`
- We set it to `pdf_infer_table_structure=True` and
`skip_infer_table_types=[]` by default
- We remove it from the examples in documentation
- We describe it as deprecated in favor of `skip_infer_table_types` in
documentation

More detailed description of how we want parameters to interact
- if `pdf_infer_table_structure` is False tables will never be extracted
from pdf
- if `pdf_infer_table_structure` is True tables will be extracted from
pdf unless it's skipped via `skip_infer_table_types`
- by default `pdf_infer_table_structure=True` and
`skip_infer_table_types=[]`

---------

Co-authored-by: Filip Knefel <filip@unstructured.io>
Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: ds-filipknefel <ds-filipknefel@users.noreply.github.com>
Co-authored-by: Ronny H <138828701+ron-unstructured@users.noreply.github.com>
2024-03-22 10:08:49 +00:00


Accessing Unstructured API
==========================
Method 1: Partition via API (``partition_via_api``)
---------------------------------------------------
- **Functionality**: Automates the partitioning of documents using the hosted or locally hosted Unstructured API.
- **Key Features**:

  - API Key Authentication.
  - Automatic or explicit MIME type handling.

- **Usage Examples**:

  - **Basic Use Case**::

      from unstructured.partition.api import partition_via_api

      filename = "example-docs/eml/fake-email.eml"

      elements = partition_via_api(filename=filename, api_key="MY_API_KEY", content_type="message/rfc822")

  - **Advanced Settings**::

      from unstructured.partition.api import partition_via_api

      filename = "example-docs/DA-1p.pdf"

      elements = partition_via_api(
          filename=filename, api_key="MY_API_KEY", strategy="auto"
      )

  - **Self-Hosting or Local API**::

      from unstructured.partition.api import partition_via_api

      filename = "example-docs/eml/fake-email.eml"

      elements = partition_via_api(
          filename=filename, api_url="http://localhost:5000/general/v0/general"
      )

- **More Details**: For comprehensive information, visit the `Partition via API Documentation <https://unstructured-io.github.io/unstructured/core/partition.html#partition-via-api>`_.

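
The usage examples above pass ``content_type``, ``api_key``, and ``api_url`` by hand. As a minimal sketch, the keyword arguments could be assembled in one place. The helper names ``build_partition_kwargs`` and ``partition_file`` are hypothetical, the ``UNSTRUCTURED_API_KEY`` environment variable is an assumed convention rather than something the library reads itself, and the MIME type guess comes from Python's standard ``mimetypes`` module, not from ``unstructured``.

```python
import mimetypes
import os


def build_partition_kwargs(filename, api_url=None, **extra):
    """Hypothetical helper: collect keyword arguments for partition_via_api."""
    kwargs = {"filename": filename}
    kwargs.update(extra)

    # Guess the MIME type from the file extension; an explicit
    # content_type passed by the caller always takes precedence.
    guessed_type, _ = mimetypes.guess_type(filename)
    if guessed_type is not None:
        kwargs.setdefault("content_type", guessed_type)

    # Assumed convention: read the key from the environment so it is
    # not hard-coded in source files.
    api_key = os.environ.get("UNSTRUCTURED_API_KEY")
    if api_key:
        kwargs.setdefault("api_key", api_key)

    # Only override the endpoint when a self-hosted URL is supplied.
    if api_url is not None:
        kwargs["api_url"] = api_url
    return kwargs


def partition_file(filename, **options):
    """Hypothetical wrapper: call partition_via_api with the assembled arguments."""
    # Imported lazily so the helper above works without unstructured installed.
    from unstructured.partition.api import partition_via_api

    return partition_via_api(**build_partition_kwargs(filename, **options))
```

With this sketch, ``partition_file("example-docs/DA-1p.pdf", strategy="auto")`` would target the hosted API, while adding ``api_url="http://localhost:5000/general/v0/general"`` would target a local instance.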
Method 2: Local Deployment Using ``unstructured-api`` Library
-------------------------------------------------------------
- **Environment Setup**:

  - Use ``pyenv`` and ``virtualenv`` for environment management.
  - Install dependencies as per OS requirements.

- **Running the Application**:

  - Run ``make install`` to install dependencies.
  - Start with ``make run-jupyter`` for Jupyter Notebook or ``make run-web-app`` for the FastAPI web app.

- **Using the API Locally**:

  - Example API Call::

      curl -X 'POST' \
        'http://localhost:8000/general/v0/general' \
        -H 'accept: application/json' \
        -H 'Content-Type: multipart/form-data' \
        -F 'files=@sample-docs/family-day.eml' \
        | jq -C . | less -R

- **Additional Features**:

  - Parallel processing for PDFs, configured through environment variables.
  - Server load management with ``UNSTRUCTURED_MEMORY_FREE_MINIMUM_MB``.

- **Using Docker Image**: Docker commands for pulling and running the container.
- **More Details**: Check out the `unstructured-api GitHub Repository <https://github.com/Unstructured-IO/unstructured-api>`_ for further information.

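
The ``curl`` call above can also be issued from Python. The following is a rough sketch using only the standard library. The function names ``build_multipart`` and ``post_document`` are hypothetical, and the code assumes a local server is listening on the same URL as in the ``curl`` example and accepts the file under the ``files`` form field.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field, filename, payload, boundary):
    """Encode a single file as a multipart/form-data request body."""
    # Fall back to a generic type when the extension is not recognized.
    part_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {part_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail


def post_document(path, url="http://localhost:8000/general/v0/general"):
    """Send one document to a locally running API and return the parsed JSON."""
    file_path = Path(path)
    boundary = uuid.uuid4().hex
    body = build_multipart("files", file_path.name, file_path.read_bytes(), boundary)
    request = urllib.request.Request(
        url,
        data=body,
        headers={
            "accept": "application/json",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Calling ``post_document("sample-docs/family-day.eml")`` would mirror the ``curl`` command, minus the ``jq`` and ``less`` formatting steps.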
Method 3: Accessing via Swagger UI
----------------------------------
- **Procedure**:

  1. Visit the Swagger UI Documentation: `Swagger UI <https://api.unstructured.io/general/docs#/default/pipeline_1_general_v0_general_post>`_.
  2. Click "Try it out" for interactive testing.
  3. Enter your API key in the "unstructured-api-key" field.
  4. Enter parameters in "Request body".
  5. Click "Execute" to send the request.
  6. Download or view the JSON output.
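
Once the JSON output has been downloaded, a quick way to inspect it is to count the elements by type. The sketch below assumes the response is a JSON list of objects that each carry a ``type`` key and a ``text`` key. The function name ``summarize_elements`` is hypothetical and the sample payload is made up for illustration.

```python
import json
from collections import Counter


def summarize_elements(elements):
    """Count how many elements of each type appear in the API output."""
    return Counter(element.get("type", "Unknown") for element in elements)


# Made-up sample standing in for a downloaded response file.
sample_json = """
[
  {"type": "Title", "text": "Quarterly Report"},
  {"type": "NarrativeText", "text": "Revenue grew this quarter."},
  {"type": "NarrativeText", "text": "Costs were flat."}
]
"""

summary = summarize_elements(json.loads(sample_json))
for element_type, count in sorted(summary.items()):
    print(f"{element_type}: {count}")
```

For a real download, ``json.loads(sample_json)`` would be replaced by ``json.load`` on the saved file.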