
This PR adds documentation of the models supported by the `Unstructured` tool. The changes reflect the tool's capabilities, usage examples, and the process for integrating custom models. Sections:

- Detailed the basic usage of the `Unstructured` partition with the model name.
- Provided a list of available models in the `Unstructured` partition.
- Added instructions on using non-default models via three distinct methods.
- Explained leveraging models from the LayoutParser model zoo with `UnstructuredDetectronModel`.
- Guided users in integrating their custom object detection models using the `UnstructuredObjectDetectionModel` class.

Tested the docs build with:

> cd docs
> pip install -r requirements.txt
> make html
92 lines
4.2 KiB
ReStructuredText
.. role:: raw-html(raw)
   :format: html

Models
======

Depending on your needs, ``Unstructured`` provides OCR-based and Transformer-based models to detect elements in documents. These models are useful for detecting complex layouts and predicting element types.

**Basic usage:**

.. code:: python

  from unstructured.partition.auto import partition

  elements = partition(filename=filename, strategy='hi_res', model_name='chipper')

Notes:

* To use the detection model, set ``strategy='hi_res'``.
* When ``model_name`` is not defined, inference falls back to the default model.

:raw-html:`<br />`
**List of Available Models in the Partitions:**

* ``detectron2_onnx`` is a computer vision model by Facebook AI that provides object detection and segmentation algorithms with ONNX Runtime. It is the fastest model with the ``hi_res`` strategy.
* ``yolox`` is a single-stage, real-time object detector that modifies YOLOv3 with a DarkNet53 backbone.
* ``yolox_quantized``: runs faster than YoloX, with speed closer to Detectron2.
* ``chipper`` (beta version): the Chipper model is Unstructured's in-house image-to-text model based on transformer-based Visual Document Understanding (VDU) models.


Using a Non-Default Model
^^^^^^^^^^^^^^^^^^^^^^^^^

``Unstructured`` downloads the model specified in the ``UNSTRUCTURED_HI_RES_MODEL_NAME`` environment variable. If the variable is not defined, it downloads the default model.

There are three ways to use a non-default model:

1. Store the model name in an environment variable.

.. code:: python

  import os
  from unstructured.partition.pdf import partition_pdf

  os.environ["UNSTRUCTURED_HI_RES_MODEL_NAME"] = "yolox"
  out_yolox = partition_pdf("example-docs/layout-parser-paper-fast.pdf", strategy="hi_res")
|
||
|
||
2. Pass the model name in the ``partition`` function.
|
||
|
||
.. code:: python
|
||
|
||
filename = "example-docs/layout-parser-paper-fast.pdf"
|
||
elements = partition(filename=filename, strategy='hi_res', model_name='yolox')
|
||
|

3. Use the `unstructured-inference <url_>`_ library directly.

.. _url: https://github.com/Unstructured-IO/unstructured-inference

.. code:: python

  from unstructured_inference.models.base import get_model
  from unstructured_inference.inference.layout import DocumentLayout

  model = get_model("yolox")
  layout = DocumentLayout.from_file("sample-docs/layout-parser-paper.pdf", detection_model=model)
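
The resulting ``DocumentLayout`` can then be inspected page by page. A brief sketch, assuming the ``pages`` and ``elements`` attributes of the ``unstructured-inference`` API:

.. code:: python

  # Each page of the layout holds the elements the detection model found.
  for page in layout.pages:
      for element in page.elements:
          print(element)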


Bring Your Own Models
^^^^^^^^^^^^^^^^^^^^^

**Utilizing the Layout Detection Model Zoo**

The `LayoutParser <layout_>`_ library provides various pre-trained models in its `model zoo <modelzoo_>`_ for document layout analysis. Here's a guide to leveraging this feature using the ``UnstructuredDetectronModel`` class in the ``unstructured-inference`` library.

The ``UnstructuredDetectronModel`` class in ``unstructured_inference.models.detectron2`` uses the ``faster_rcnn_R_50_FPN_3x`` model pretrained on ``DocLayNet`` by default, but any model in the model zoo can be used by passing different construction parameters. ``UnstructuredDetectronModel`` is a light wrapper around LayoutParser's ``Detectron2LayoutModel`` object and accepts the same arguments. An illustrative sketch follows the link targets below.

.. _modelzoo: https://layout-parser.readthedocs.io/en/latest/notes/modelzoo.html

.. _layout: https://layout-parser.readthedocs.io/en/latest/api_doc/models.html#layoutparser.models.Detectron2LayoutModel
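
For example, a model-zoo checkpoint can be selected through the same arguments ``Detectron2LayoutModel`` takes. This is a minimal sketch, not the library's documented usage: the ``lp://`` config path and label map follow LayoutParser's PubLayNet conventions, and the exact ``initialize`` signature should be verified against your installed version:

.. code:: python

  from unstructured_inference.models.detectron2 import UnstructuredDetectronModel

  model = UnstructuredDetectronModel()
  # Arguments mirror layoutparser.models.Detectron2LayoutModel
  # (an assumption based on the wrapper accepting the same arguments).
  model.initialize(
      config_path="lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config",
      label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
      extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
  )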

**Using Your Own Object Detection Model**

To integrate your custom detection and extraction models into the ``unstructured_inference`` pipeline, start by wrapping your model in the ``UnstructuredObjectDetectionModel`` class. This class acts as an intermediary between your detection model and the Unstructured workflow.

Ensure your ``UnstructuredObjectDetectionModel`` subclass implements two essential methods:

1. The ``predict`` method, which accepts a ``PIL.Image.Image`` and returns a list of ``LayoutElement`` objects, communicating your model's results to the pipeline.
2. The ``initialize`` method, which loads and prepares your model for inference so it is ready for incoming tasks.

It's important that your model's outputs, specifically from the ``predict`` method, integrate smoothly with the ``DocumentLayout`` class. A minimal subclass sketch follows.
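
The sketch below is illustrative only: the import paths, the ``LayoutElement`` constructor, and ``load_my_model`` are assumptions to be checked against your installed ``unstructured-inference`` version.

.. code:: python

  from typing import List

  from PIL import Image

  # Assumed import locations; verify against your installed version.
  from unstructured_inference.inference.layoutelement import LayoutElement
  from unstructured_inference.models.unstructuredmodel import (
      UnstructuredObjectDetectionModel,
  )


  class MyDetectionModel(UnstructuredObjectDetectionModel):
      def initialize(self, model_path: str):
          # Load weights and prepare the model for inference.
          self.model = load_my_model(model_path)  # hypothetical loader

      def predict(self, x: Image.Image) -> List[LayoutElement]:
          detections = self.model(x)  # hypothetical raw detections
          # Translate raw detections into LayoutElements. The constructor
          # arguments shown here are an assumption and may differ by version.
          return [
              LayoutElement(d.x1, d.y1, d.x2, d.y2, text=None, type=d.label)
              for d in detections
          ]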