mirror of
https://github.com/PaddlePaddle/PaddleOCR.git
synced 2025-12-28 23:48:43 +00:00
Revise the English document (#15218)
parent
48fb815890
commit
105ae361ea
@ -545,6 +545,24 @@ paddleocr ocr -i ./general_ocr_002.png --ocr_version PP-OCRv4
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td><code>input</code></td>
|
||||
<td>Input data (required). Supports:
|
||||
<ul>
|
||||
<li><b>Python Var</b>: e.g., <code>numpy.ndarray</code> image data;</li>
|
||||
<li><b>str</b>: Local file path (e.g., <code>/root/data/img.jpg</code>), URL (e.g., <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_doc_preprocessor_002.png">example</a>), or directory (e.g., <code>/root/data/</code>);</li>
|
||||
<li><b>List</b>: List of inputs, e.g., <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>.</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>Python Var|str|list</code></td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>save_path</code></td>
|
||||
<td>Path to save inference results. If <code>None</code>, results are not saved locally.</td>
|
||||
<td><code>str</code></td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>doc_orientation_classify_model_name</code></td>
|
||||
<td>Name of the document orientation classification model. If <code>None</code>, the default pipeline model is used.</td>
|
||||
<td><code>str</code></td>
|
||||
@ -744,24 +762,6 @@ paddleocr ocr -i ./general_ocr_002.png --ocr_version PP-OCRv4
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>input</code></td>
|
||||
<td>Input data (required). Supports:
|
||||
<ul>
|
||||
<li><b>Python Var</b>: e.g., <code>numpy.ndarray</code> image data;</li>
|
||||
<li><b>str</b>: Local file path (e.g., <code>/root/data/img.jpg</code>), URL (e.g., <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_doc_preprocessor_002.png">example</a>), or directory (e.g., <code>/root/data/</code>);</li>
|
||||
<li><b>List</b>: List of inputs, e.g., <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>.</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>Python Var|str|list</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>save_path</code></td>
|
||||
<td>Path to save inference results. If <code>None</code>, results are not saved locally.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>device</code></td>
|
||||
<td>Device for inference. Supports:
|
||||
<ul>
|
||||
@ -813,6 +813,12 @@ paddleocr ocr -i ./general_ocr_002.png --ocr_version PP-OCRv4
|
||||
<td><code>int</code></td>
|
||||
<td><code>8</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>paddlex_config</code></td>
|
||||
<td>Path to PaddleX pipeline configuration file.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</details>
|
||||
@ -1133,6 +1139,12 @@ The Python script above performs the following steps:
|
||||
<td><code>int</code></td>
|
||||
<td><code>8</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>paddlex_config</code></td>
|
||||
<td>Path to PaddleX pipeline configuration file.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</details>
|
||||
@ -1158,7 +1170,7 @@ The Python script above performs the following steps:
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>Python Var|str|list</code></td>
|
||||
<td><code>None</code></td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>device</code></td>
|
||||
|
||||
@ -149,6 +149,24 @@ paddleocr doc_preprocessor -i ./doc_test_rotated.jpg --device gpu
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td><code>input</code></td>
|
||||
<td>The data to be predicted, supporting multiple input types. This parameter is required.
|
||||
<ul>
|
||||
<li><b>Python Var</b>: For example, image data represented as <code>numpy.ndarray</code>.</li>
|
||||
<li><b>str</b>: The local path of an image or PDF file, e.g., <code>/root/data/img.jpg</code>; <b>a URL</b> of an image or PDF file, e.g., <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_doc_preprocessor_002.png">example</a>; or <b>a local directory</b> containing the images to be predicted, e.g., <code>/root/data/</code> (PDF files inside a directory are not currently supported; specify each PDF by its exact file path).</li>
|
||||
<li><b>List</b>: The list elements should be of the above types, such as <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code>.</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>Python Var|str|list</code></td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>save_path</code></td>
|
||||
<td>Specify the path to save the inference result file. If set to <code>None</code>, the inference result will not be saved locally.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>doc_orientation_classify_model_name</code></td>
|
||||
<td>The name of the document orientation classification model. If set to <code>None</code>, the pipeline's default model will be used.</td>
|
||||
<td><code>str</code></td>
|
||||
@ -185,24 +203,6 @@ paddleocr doc_preprocessor -i ./doc_test_rotated.jpg --device gpu
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>input</code></td>
|
||||
<td>The data to be predicted, supporting multiple input types. This parameter is required.
|
||||
<ul>
|
||||
<li><b>Python Var</b>: For example, image data represented as <code>numpy.ndarray</code>.</li>
|
||||
<li><b>str</b>: The local path of an image or PDF file, e.g., <code>/root/data/img.jpg</code>; <b>a URL</b> of an image or PDF file, e.g., <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_doc_preprocessor_002.png">example</a>; or <b>a local directory</b> containing the images to be predicted, e.g., <code>/root/data/</code> (PDF files inside a directory are not currently supported; specify each PDF by its exact file path).</li>
|
||||
<li><b>List</b>: The list elements should be of the above types, such as <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code>.</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>Python Var|str|list</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>save_path</code></td>
|
||||
<td>Specify the path to save the inference result file. If set to <code>None</code>, the inference result will not be saved locally.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>device</code></td>
|
||||
<td>The device used for inference. Supports specifying a specific card number.
|
||||
<ul>
|
||||
@ -254,6 +254,12 @@ paddleocr doc_preprocessor -i ./doc_test_rotated.jpg --device gpu
|
||||
<td><code>int</code></td>
|
||||
<td><code>8</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>paddlex_config</code></td>
|
||||
<td>Path to PaddleX pipeline configuration file.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</details>
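
As a rough illustration of the input forms listed in the table above, the sketch below labels a value as a URL, a directory, a PDF file, an image file, a list, or an in-memory Python variable. `classify_input` is a hypothetical helper written for this example; it is not part of PaddleOCR, and the pipeline's own input handling may differ.

```python
import os
from urllib.parse import urlparse


def classify_input(item):
    """Label which documented input form `item` corresponds to.

    Hypothetical helper for illustration only (not a PaddleOCR API).
    """
    # A list is classified element by element.
    if isinstance(item, (list, tuple)):
        return [classify_input(element) for element in item]
    # Anything that is not a string is treated as in-memory data,
    # for example a numpy.ndarray holding image pixels.
    if not isinstance(item, str):
        return "python_var"
    # Strings with an http(s) scheme are treated as URLs.
    if urlparse(item).scheme in ("http", "https"):
        return "url"
    # A trailing separator or an existing directory means "directory".
    if item.endswith(("/", os.sep)) or os.path.isdir(item):
        return "directory"
    # PDFs have to be passed as individual file paths.
    if item.lower().endswith(".pdf"):
        return "pdf_file"
    return "image_file"


if __name__ == "__main__":
    for sample in ["/root/data/img.jpg", "/root/data/", ["/root/data1", "/root/data2"]]:
        print(sample, "->", classify_input(sample))
```

Separating the PDF case reflects the note in the table: a directory is scanned for images only, so each PDF needs its own path.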
|
||||
@ -389,6 +395,12 @@ In the above Python script, the following steps are executed:
|
||||
<td><code>int</code></td>
|
||||
<td><code>8</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>paddlex_config</code></td>
|
||||
<td>Path to PaddleX pipeline configuration file.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
@ -417,7 +429,7 @@ The following are the parameters and their descriptions of the `predict()` metho
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>Python Var|str|list</code></td>
|
||||
<td><code>None</code></td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>device</code></td>
|
||||
|
||||
@ -12,7 +12,7 @@ The Document Understanding Pipeline is an advanced document processing technolog
|
||||
|
||||
<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/pipelines/doc_understanding/doc_understanding.png">
|
||||
|
||||
<b>The general document image preprocessing pipeline includes the following module. Each module can be trained and inferred independently and contains multiple models. For more details, click the corresponding module to view the documentation.</b>
|
||||
<b>The document understanding pipeline includes the following module. Each module can be trained and inferred independently and contains multiple models. For more details, click the corresponding module to view the documentation.</b>
|
||||
|
||||
- [Document-like Vision Language Model Module](../module_usage/doc_vlm.md)
|
||||
|
||||
@ -77,6 +77,22 @@ paddleocr doc_understanding -i "{'image': 'https://paddle-model-ecology.bj.bcebo
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td><code>input</code></td>
|
||||
<td>Data to be predicted (required). Supports dictionary input.
|
||||
<ul>
|
||||
<li><b>Python Dict</b>: The input format for PP-DocBee is <code>{"image": "/path/to/image", "query": "user question"}</code>, where <code>image</code> is the input image and <code>query</code> is the user's question about it.</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>Python Dict</code></td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>save_path</code></td>
|
||||
<td>Specify the path for saving the inference result file. If set to <code>None</code>, the inference result will not be saved locally.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>doc_understanding_model_name</code></td>
|
||||
<td>The name of the document understanding model. If set to <code>None</code>, the default model of the pipeline will be used.</td>
|
||||
<td><code>str</code></td>
|
||||
@ -95,22 +111,6 @@ paddleocr doc_understanding -i "{'image': 'https://paddle-model-ecology.bj.bcebo
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>input</code></td>
|
||||
<td>Data to be predicted (required). Supports dictionary input.
|
||||
<ul>
|
||||
<li><b>Python Dict</b>: The input format for PP-DocBee is <code>{"image": "/path/to/image", "query": "user question"}</code>, where <code>image</code> is the input image and <code>query</code> is the user's question about it.</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>Python Dict</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>save_path</code></td>
|
||||
<td>Specify the path for saving the inference result file. If set to <code>None</code>, the inference result will not be saved locally.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>device</code></td>
|
||||
<td>The device used for inference. Supports specifying a specific card number.
|
||||
<ul>
|
||||
@ -162,6 +162,12 @@ paddleocr doc_understanding -i "{'image': 'https://paddle-model-ecology.bj.bcebo
|
||||
<td><code>int</code></td>
|
||||
<td><code>8</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>paddlex_config</code></td>
|
||||
<td>Path to PaddleX pipeline configuration file.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</details>
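
The dictionary input described in the table above can be assembled and checked before it is handed to the pipeline. The sketch below is a minimal example under stated assumptions: `build_doc_understanding_input` and `to_cli_argument` are hypothetical helpers, not PaddleOCR functions, and the single-quoted string form simply mirrors the quoting style visible in the `paddleocr doc_understanding -i` command shown in this document.

```python
def build_doc_understanding_input(image, query):
    """Return the {"image": ..., "query": ...} dictionary described above.

    Hypothetical helper for illustration only (not a PaddleOCR API).
    """
    if not isinstance(image, str) or not image.strip():
        raise ValueError("image must be a non-empty path or URL string")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty question string")
    return {"image": image, "query": query}


def to_cli_argument(payload):
    """Render the dictionary as a Python-literal string for the -i option.

    Assumption: the command line accepts a Python-style dictionary literal.
    """
    return repr(payload)


if __name__ == "__main__":
    payload = build_doc_understanding_input(
        "/path/to/image.png", "What is the title of this document?"
    )
    print(payload)
    print(to_cli_argument(payload))
```

Validating both keys up front gives a clear error message before any model is loaded.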
|
||||
@ -276,6 +282,12 @@ In the above Python script, the following steps are performed:
|
||||
<td><code>int</code></td>
|
||||
<td><code>8</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>paddlex_config</code></td>
|
||||
<td>Path to PaddleX pipeline configuration file.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
@ -302,7 +314,7 @@ Below are the parameters and their descriptions for the `predict()` method:
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>Python Dict</code></td>
|
||||
<td><code>None</code></td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>device</code></td>
|
||||
|
||||
@ -10,7 +10,7 @@ comments: true
|
||||
|
||||
<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/pipelines/doc_understanding/doc_understanding.png">
|
||||
|
||||
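<b>The general document understanding pipeline includes the following 1 module. Each module can be trained and run for inference independently and contains multiple models. For details, click the corresponding module to view its documentation.</b>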
|
||||
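<b>The document understanding pipeline includes the following 1 module. Each module can be trained and run for inference independently and contains multiple models. For details, click the corresponding module to view its documentation.</b>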
|
||||
|
||||
- [Document-like Vision Language Model Module](../module_usage/doc_vlm.md)
|
||||
|
||||
|
||||
@ -415,6 +415,25 @@ paddleocr formula_recognition_pipeline -i ./general_formula_recognition_001.png
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td><code>input</code></td>
|
||||
<td>Data to be predicted (required). Supports multiple input types.
|
||||
<ul>
|
||||
<li><b>Python Var</b>: Image data represented by <code>numpy.ndarray</code></li>
|
||||
<li><b>str</b>: Local path of image or PDF file, e.g., <code>/root/data/img.jpg</code>; <b>URL link</b>, e.g., network URL of image or PDF file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/demo_image/pipelines/general_formula_recognition_001.png">Example</a>; <b>Local directory</b>, the directory should contain images to be predicted, e.g., local path: <code>/root/data/</code> (currently does not support prediction of PDF files in directories; PDF files must be specified with a specific file path)</li>
|
||||
<li><b>List</b>: Elements of the list must be of the above types, e.g., <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code></li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>Python Var|str|list</code></td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>save_path</code></td>
|
||||
<td>
|
||||
Specify the path to save the inference results file. If set to <code>None</code>, the inference results will not be saved locally.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>doc_orientation_classify_model_name</code></td>
|
||||
<td>
|
||||
The name of the document orientation classification model. If set to <code>None</code>, the pipeline's default model will be used.</td>
|
||||
@ -454,6 +473,19 @@ The name of the document orientation classification model. If set to <code>None<
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>use_doc_orientation_classify</code></td>
|
||||
<td>Whether to load the document orientation classification module. If set to <code>None</code>, the parameter will default to the value initialized in the pipeline, which is <code>True</code>.</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>use_doc_unwarping</code></td>
|
||||
<td>
|
||||
Whether to load the text image unwarping module. If set to <code>None</code>, the parameter will default to the value initialized in the pipeline, which is <code>True</code>.</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>layout_detection_model_name</code></td>
|
||||
<td>
|
||||
The name of the layout detection model. If set to <code>None</code>, the pipeline's default model will be used.</td>
|
||||
@ -520,19 +552,6 @@ The scaling factor for the side length of the detection boxes in layout region d
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>use_doc_orientation_classify</code></td>
|
||||
<td>Whether to load the document orientation classification module. If set to <code>None</code>, the parameter will default to the value initialized in the pipeline, which is <code>True</code>.</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>use_doc_unwarping</code></td>
|
||||
<td>
|
||||
Whether to load the text image unwarping module. If set to <code>None</code>, the parameter will default to the value initialized in the pipeline, which is <code>True</code>.</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>use_layout_detection</code></td>
|
||||
<td>
|
||||
Whether to load the layout detection module. If set to <code>None</code>, the parameter will default to the value initialized in the pipeline, which is <code>True</code>.</td>
|
||||
@ -561,25 +580,6 @@ The name of the formula recognition model. If set to <code>None</code>, the defa
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>input</code></td>
|
||||
<td>Data to be predicted (required). Supports multiple input types.
|
||||
<ul>
|
||||
<li><b>Python Var</b>: Image data represented by <code>numpy.ndarray</code></li>
|
||||
<li><b>str</b>: Local path of image or PDF file, e.g., <code>/root/data/img.jpg</code>; <b>URL link</b>, e.g., network URL of image or PDF file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/demo_image/pipelines/general_formula_recognition_001.png">Example</a>; <b>Local directory</b>, the directory should contain images to be predicted, e.g., local path: <code>/root/data/</code> (currently does not support prediction of PDF files in directories; PDF files must be specified with a specific file path)</li>
|
||||
<li><b>List</b>: Elements of the list must be of the above types, e.g., <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code></li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>Python Var|str|list</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>save_path</code></td>
|
||||
<td>
|
||||
Specify the path to save the inference results file. If set to <code>None</code>, the inference results will not be saved locally.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>device</code></td>
|
||||
<td>The device used for inference. You can specify a particular card number.
|
||||
<ul>
|
||||
@ -633,6 +633,12 @@ The number of threads to use when performing inference on the CPU.</td>
|
||||
<td><code>int</code></td>
|
||||
<td><code>8</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>paddlex_config</code></td>
|
||||
<td>Path to PaddleX pipeline configuration file.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</details>
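
Several parameters in the table above share one rule: passing `None` means "use the value initialized in the pipeline", while any other value, including `False`, is an explicit override. The sketch below illustrates that rule only. `resolve_option`, `effective_settings`, and `PIPELINE_DEFAULTS` are hypothetical names for this example; the `True` defaults are the ones stated in the table, and the pipeline's real resolution logic is not shown in this document.

```python
# Defaults as stated in the parameter table; the dictionary itself is
# an illustrative stand-in for the pipeline's initialized values.
PIPELINE_DEFAULTS = {
    "use_doc_orientation_classify": True,
    "use_doc_unwarping": True,
    "use_layout_detection": True,
}


def resolve_option(value, pipeline_default):
    """Return the pipeline default only when the caller passed None."""
    # Compare against None explicitly: a plain truthiness test would
    # wrongly replace an intentional False with the default.
    return pipeline_default if value is None else value


def effective_settings(**overrides):
    """Merge caller overrides with the pipeline defaults."""
    unknown = set(overrides) - set(PIPELINE_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown option(s): {sorted(unknown)}")
    return {
        name: resolve_option(overrides.get(name), default)
        for name, default in PIPELINE_DEFAULTS.items()
    }


if __name__ == "__main__":
    # Only unwarping is switched off; the other options keep their defaults.
    print(effective_settings(use_doc_unwarping=False))
```

The explicit `is None` check is the design point: it keeps `False` usable as a deliberate way to switch a module off.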
|
||||
@ -729,6 +735,18 @@ In the above Python script, the following steps are executed:
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>use_doc_orientation_classify</code></td>
|
||||
<td>Whether to load the document orientation classification module. If set to <code>None</code>, the parameter will default to the value initialized in the pipeline, which is <code>True</code>.</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>use_doc_unwarping</code></td>
|
||||
<td>Whether to load the text image unwarping module. If set to <code>None</code>, the parameter will default to the value initialized in the pipeline, which is <code>True</code>.</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>layout_detection_model_name</code></td>
|
||||
<td>The name of the layout detection model. If set to <code>None</code>, the pipeline's default model will be used.</td>
|
||||
<td><code>str</code></td>
|
||||
@ -790,18 +808,6 @@ In the above Python script, the following steps are executed:
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>use_doc_orientation_classify</code></td>
|
||||
<td>Whether to load the document orientation classification module. If set to <code>None</code>, the parameter will default to the value initialized in the pipeline, which is <code>True</code>.</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>use_doc_unwarping</code></td>
|
||||
<td>Whether to load the text image unwarping module. If set to <code>None</code>, the parameter will default to the value initialized in the pipeline, which is <code>True</code>.</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>use_layout_detection</code></td>
|
||||
<td>Whether to load the layout detection module. If set to <code>None</code>, the parameter will default to the value initialized in the pipeline, which is <code>True</code>.</td>
|
||||
<td><code>bool</code></td>
|
||||
@ -878,6 +884,12 @@ In the above Python script, the following steps are executed:
|
||||
<td><code>int</code></td>
|
||||
<td><code>8</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>paddlex_config</code></td>
|
||||
<td>Path to PaddleX pipeline configuration file.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
@ -907,7 +919,7 @@ Here are the parameters of the `predict()` method and their descriptions:
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>Python Var|str|list</code></td>
|
||||
<td><code>None</code></td>
|
||||
<td></td>
|
||||
<tr>
|
||||
<td><code>device</code></td>
|
||||
<td>The parameters are the same as those used during instantiation.</td>
|
||||
|
||||
@ -569,7 +569,7 @@ The ultra-lightweight cyrillic alphabet recognition model trained based on the P
|
||||
|
||||
Before using the seal text recognition pipeline locally, make sure you have installed the wheel package by following the [installation tutorial](../installation.md). Once installation is complete, you can try the pipeline from the command line or integrate it into Python code.
|
||||
|
||||
### 2.1 Experience via the Command Line
|
||||
### 2.1 Command Line Experience
|
||||
|
||||
You can quickly try the `seal_recognition` pipeline with a single command:
|
||||
|
||||
@ -586,6 +586,305 @@ paddleocr seal_recognition -i ./seal_text_det.png --use_doc_unwarping True
|
||||
paddleocr seal_recognition -i ./seal_text_det.png --device gpu
|
||||
```
|
||||
|
||||
<details><summary><b>The command line supports more parameter settings. Click to expand for detailed explanations of command line parameters.</b></summary>
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Parameter</th>
|
||||
<th>Description</th>
|
||||
<th>Parameter Type</th>
|
||||
<th>Default Value</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td><code>input</code></td>
|
||||
<td>Data to be predicted (required). Supports multiple input types.
|
||||
<ul>
|
||||
<li><b>Python Var</b>: Image data represented by <code>numpy.ndarray</code></li>
|
||||
<li><b>str</b>: Local path of image or PDF file, e.g., <code>/root/data/img.jpg</code>; <b>URL link</b>, e.g., network URL of image or PDF file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/seal_text_det.png">Example</a>; <b>Local directory</b>, the directory should contain images to be predicted, e.g., local path: <code>/root/data/</code> (currently does not support prediction of PDF files in directories; PDF files must be specified with a specific file path)</li>
|
||||
<li><b>List</b>: Elements of the list must be of the above types, e.g., <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code></li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>Python Var|str|list</code></td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>save_path</code></td>
|
||||
<td>
|
||||
Specify the path to save the inference results file. If set to <code>None</code>, the inference results will not be saved locally.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>doc_orientation_classify_model_name</code></td>
|
||||
<td>
|
||||
The name of the document orientation classification model. If set to <code>None</code>, the pipeline's default model will be used.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>doc_orientation_classify_model_dir</code></td>
|
||||
<td>The directory path of the document orientation classification model. If set to <code>None</code>, the official model will be downloaded.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>doc_unwarping_model_name</code></td>
|
||||
<td>The name of the text image unwarping model. If set to <code>None</code>, the pipeline's default model will be used.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>doc_unwarping_model_dir</code></td>
|
||||
<td> The directory path of the text image unwarping model. If set to <code>None</code>, the official model will be downloaded.
|
||||
</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>layout_detection_model_name</code></td>
|
||||
<td>
|
||||
The name of the layout detection model. If set to <code>None</code>, the pipeline's default model will be used.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>layout_detection_model_dir</code></td>
|
||||
<td> The directory path of the layout detection model. If set to <code>None</code>, the official model will be downloaded.
|
||||
</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>seal_text_detection_model_name</code></td>
|
||||
<td>The name of the seal text detection model. If set to <code>None</code>, the pipeline's default model will be used.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>seal_text_detection_model_dir</code></td>
|
||||
<td>The directory path of the seal text detection model. If set to <code>None</code>, the official model will be downloaded.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>text_recognition_model_name</code></td>
|
||||
<td>Name of the text recognition model. If <code>None</code>, the default pipeline model is used.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>text_recognition_model_dir</code></td>
|
||||
<td>Directory path of the text recognition model. If <code>None</code>, the official model is downloaded.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>text_recognition_batch_size</code></td>
|
||||
<td>Batch size for the text recognition model. If <code>None</code>, defaults to <code>1</code>.</td>
|
||||
<td><code>int</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>use_doc_orientation_classify</code></td>
|
||||
<td>Whether to enable document orientation classification. If <code>None</code>, defaults to pipeline initialization value (<code>True</code>).</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>use_doc_unwarping</code></td>
|
||||
<td>Whether to enable text image correction. If <code>None</code>, defaults to pipeline initialization value (<code>True</code>).</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>use_layout_detection</code></td>
|
||||
<td>
|
||||
Whether to load the layout detection module. If set to <code>None</code>, the parameter will default to the value initialized in the pipeline, which is <code>True</code>.</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>layout_threshold</code></td>
|
||||
<td>Threshold for layout detection, used to filter out predictions with low confidence.
|
||||
<ul>
|
||||
<li><b>float</b>, such as 0.2, indicates filtering out all bounding boxes with a confidence score less than 0.2.</li>
|
||||
<li><b>Dictionary</b>, with <b>int</b> keys representing <code>cls_id</code> and <b>float</b> values as thresholds. For example, <code>{0: 0.45, 2: 0.48, 7: 0.4}</code> indicates applying a threshold of 0.45 for class ID 0, 0.48 for class ID 2, and 0.4 for class ID 7</li>
|
||||
<li><b>None</b>: If not specified, the default PaddleX official model configuration will be used</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>float|dict</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>layout_nms</code></td>
|
||||
<td>Whether to use NMS (Non-Maximum Suppression) post-processing for layout region detection to filter out overlapping boxes. If set to <code>None</code>, the default configuration of the official model will be used.</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>layout_unclip_ratio</code></td>
|
||||
<td>The scaling factor for the side length of the detection boxes in layout region detection.
|
||||
<ul>
|
||||
<li><b>float</b>: A positive float number, e.g., 1.1, indicating that the center of the bounding box remains unchanged while the width and height are both scaled up by a factor of 1.1</li>
|
||||
<li><b>List</b>: e.g., [1.2, 1.5], indicating that the center of the bounding box remains unchanged while the width is scaled up by a factor of 1.2 and the height by a factor of 1.5</li>
|
||||
<li><b>None</b>: If not specified, the default PaddleX official model configuration will be used</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>float|list</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>layout_merge_bboxes_mode</code></td>
|
||||
<td>The merging mode for the detection boxes output by the model in layout region detection.
|
||||
<ul>
|
||||
<li><b>large</b>: When set to "large", only the largest outer bounding box will be retained for overlapping bounding boxes, and the inner overlapping boxes will be removed.</li>
|
||||
<li><b>small</b>: When set to "small", only the smallest inner bounding boxes will be retained for overlapping bounding boxes, and the outer overlapping boxes will be removed.</li>
|
||||
<li><b>union</b>: No filtering of bounding boxes will be performed, and both inner and outer boxes will be retained.</li>
|
||||
<li><b>None</b>: If not specified, the default PaddleX official model configuration will be used</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>seal_det_limit_side_len</code></td>
|
||||
<td>The side length limit for seal detection images.
<ul>
<li><b>int</b>: Any integer greater than <code>0</code>;</li>
<li><b>None</b>: If set to <code>None</code>, the value initialized by the pipeline (<code>960</code>) is used.</li>
</ul>
</td>
<td><code>int|None</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>seal_det_limit_type</code></td>
|
||||
<td>The type of side length limit for seal detection images.
<ul>
<li><b>str</b>: Supports <code>min</code> and <code>max</code>. <code>min</code> ensures that the shortest side of the image is not less than <code>seal_det_limit_side_len</code>; <code>max</code> ensures that the longest side of the image is not greater than <code>seal_det_limit_side_len</code>.</li>
<li><b>None</b>: If set to <code>None</code>, the value initialized by the pipeline (<code>max</code>) is used.</li>
</ul>
</td>
<td><code>str|None</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>seal_det_thresh</code></td>
|
||||
<td>The pixel threshold for detection. In the output probability map, pixels with scores greater than this threshold are considered seal pixels.
|
||||
<ul>
|
||||
<li><b>float</b>: Any floating-point number greater than <code>0</code>;</li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the value initialized by the pipeline (<code>0.3</code>) is used.</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>float|None</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>seal_det_box_thresh</code></td>
|
||||
<td>The bounding box threshold for detection. When the average score of all pixels within a detected bounding box is greater than this threshold, the result is considered a seal region.
|
||||
<ul>
|
||||
<li><b>float</b>: Any floating-point number greater than <code>0</code>;</li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the value initialized by the pipeline (<code>0.6</code>) is used.</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>float|None</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>seal_det_unclip_ratio</code></td>
|
||||
<td>The expansion coefficient for seal detection. It is used to expand the detected seal region; the larger the value, the larger the expanded area.
|
||||
<ul>
|
||||
<li><b>float</b>: Any floating-point number greater than <code>0</code>;</li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the value initialized by the pipeline (<code>2.0</code>) is used.</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>float|None</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>seal_rec_score_thresh</code></td>
|
||||
<td>The seal recognition threshold. Text results with scores greater than this threshold will be retained.
|
||||
<ul>
|
||||
<li><b>float</b>: Any floating-point number greater than <code>0</code>;</li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the value initialized by the pipeline (<code>0.0</code>) is used, i.e., no threshold is applied.</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>float|None</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>device</code></td>
|
||||
<td>The device used for inference. Supports specifying a specific card number.
|
||||
<ul>
|
||||
<li><b>CPU</b>: For example, <code>cpu</code> indicates using the CPU for inference.</li>
|
||||
<li><b>GPU</b>: For example, <code>gpu:0</code> indicates using the first GPU for inference.</li>
|
||||
<li><b>NPU</b>: For example, <code>npu:0</code> indicates using the first NPU for inference.</li>
|
||||
<li><b>XPU</b>: For example, <code>xpu:0</code> indicates using the first XPU for inference.</li>
|
||||
<li><b>MLU</b>: For example, <code>mlu:0</code> indicates using the first MLU for inference.</li>
|
||||
<li><b>DCU</b>: For example, <code>dcu:0</code> indicates using the first DCU for inference.</li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the parameter value initialized by the pipeline will be used by default. During initialization, the local GPU 0 device will be prioritized; if not available, the CPU device will be used.</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>enable_hpi</code></td>
|
||||
<td>Whether to enable high-performance inference.</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>False</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>use_tensorrt</code></td>
|
||||
<td>Whether to use TensorRT for inference acceleration.</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>False</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>min_subgraph_size</code></td>
|
||||
<td>The minimum subgraph size, used to optimize the computation of model subgraphs.</td>
|
||||
<td><code>int</code></td>
|
||||
<td><code>3</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>precision</code></td>
|
||||
<td>The computational precision, such as fp32, fp16.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>fp32</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>enable_mkldnn</code></td>
|
||||
<td>Whether to enable the MKL-DNN acceleration library. If set to <code>None</code>, it will be enabled by default.</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>cpu_threads</code></td>
|
||||
<td>The number of threads used for inference on the CPU.</td>
|
||||
<td><code>int</code></td>
|
||||
<td><code>8</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>paddlex_config</code></td>
|
||||
<td>Path to PaddleX pipeline configuration file.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</details>
|
||||
<br />
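
The interaction between `seal_det_limit_side_len` and `seal_det_limit_type` described in the table can be sketched as below. This is only an illustration of the stated rule, not the pipeline's actual preprocessing code: the helper name `limit_side` is hypothetical, and any additional rounding or padding that the real implementation may apply is deliberately left out.

```python
def limit_side(height, width, limit_side_len, limit_type):
    """Return (new_height, new_width) after applying a side length limit.

    Hypothetical helper for illustration only.
    - "min": upscale so the shortest side is at least limit_side_len.
    - "max": downscale so the longest side is at most limit_side_len.
    """
    if limit_type == "min":
        shortest = min(height, width)
        scale = limit_side_len / shortest if shortest < limit_side_len else 1.0
    elif limit_type == "max":
        longest = max(height, width)
        scale = limit_side_len / longest if longest > limit_side_len else 1.0
    else:
        raise ValueError("limit_type must be 'min' or 'max'")
    # Both sides share one scale factor, so the aspect ratio is preserved.
    return round(height * scale), round(width * scale)


print(limit_side(368, 1000, 736, "min"))
print(limit_side(1000, 2000, 960, "max"))
```

An image that already satisfies the limit is returned unchanged, which is why the scale factor falls back to `1.0` in both branches.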
|
||||
|
||||
|
||||
|
||||
After running, the results will be printed to the terminal, as follows:
|
||||
@ -712,7 +1011,7 @@ In the above Python script, the following steps were executed:
|
||||
<li><b>List</b>: Elements of the list must be of the above types, e.g., <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code></li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>None</code></td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>device</code></td>
|
||||
@ -989,594 +1288,6 @@ In the above Python script, the following steps were executed:
|
||||
- The prediction results obtained through the `json` attribute are of dict type, with content consistent with what is saved by calling the `save_to_json()` method.
|
||||
- The prediction results returned by the `img` attribute are of dict type. The keys are `layout_det_res`, `seal_res_region1`, and `preprocessed_img`, corresponding to three `Image.Image` objects: one for visualizing layout detection, one for visualizing seal text recognition results, and one for visualizing image preprocessing. If the image preprocessing sub-module is not used, `preprocessed_img` will not be included in the dictionary. If the layout region detection module is not used, `layout_det_res` will not be included.
|
||||
|
||||
Additionally, you can obtain the configuration file for the seal text recognition pipeline and load the configuration file for prediction. You can execute the following command to save the results in `my_path`:
|
||||
|
||||
```
|
||||
paddlex --get_pipeline_config seal_recognition --save_path ./my_path
|
||||
```
|
||||
|
||||
If you have obtained the configuration file, you can customize the settings for the seal text recognition pipeline by simply modifying the `pipeline` parameter value in the `create_pipeline` method to the path of the pipeline configuration file. The example is as follows:
|
||||
|
||||
```python
|
||||
from paddlex import create_pipeline
|
||||
pipeline = create_pipeline(pipeline="./my_path/seal_recognition.yaml")
|
||||
output = pipeline.predict("seal_text_det.png")
|
||||
for res in output:
|
||||
    res.print()  # Print the structured prediction output
|
||||
    res.save_to_img("./output/")  # Save the visualized results
|
||||
    res.save_to_json("./output/")  # Save the prediction results as a JSON file
|
||||
```
|
||||
|
||||
(1) Instantiate the seal text recognition pipeline object through `SealRecognition()`. The specific parameter descriptions are as follows:
|
||||
|
||||
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Parameter</th>
|
||||
<th>Description</th>
|
||||
<th>Type</th>
|
||||
<th>Default Value</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td><code>doc_orientation_classify_model_name</code></td>
|
||||
<td>Name of the document orientation classification model. If set to <code>None</code>, the default model will be used.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>doc_orientation_classify_model_dir</code></td>
|
||||
<td>Directory path of the document orientation classification model. If set to <code>None</code>, the official model will be downloaded.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>doc_unwarping_model_name</code></td>
|
||||
<td>Name of the text image correction model. If set to <code>None</code>, the default model will be used.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>doc_unwarping_model_dir</code></td>
|
||||
<td>Directory path of the text image correction model. If set to <code>None</code>, the official model will be downloaded.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>layout_detection_model_name</code></td>
|
||||
<td>Name of the layout detection model. If set to <code>None</code>, the default model will be used.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>layout_detection_model_dir</code></td>
|
||||
<td>Directory path of the layout detection model. If set to <code>None</code>, the official model will be downloaded.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>seal_text_detection_model_name</code></td>
|
||||
<td>Name of the seal text detection model. If set to <code>None</code>, the default model will be used.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>seal_text_detection_model_dir</code></td>
|
||||
<td>Directory path of the seal text detection model. If set to <code>None</code>, the official model will be downloaded.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>text_recognition_model_name</code></td>
|
||||
<td>Name of the text recognition model. If set to <code>None</code>, the default model will be used.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>text_recognition_model_dir</code></td>
|
||||
<td>Directory path of the text recognition model. If set to <code>None</code>, the official model will be downloaded.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>text_recognition_batch_size</code></td>
|
||||
<td>Batch size for the text recognition model. If set to <code>None</code>, the default batch size is <code>1</code>.</td>
|
||||
<td><code>int</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>use_doc_orientation_classify</code></td>
|
||||
<td>Whether to load the document orientation classification module. If set to <code>None</code>, the default value initialized by the pipeline (<code>True</code>) will be used.</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>use_doc_unwarping</code></td>
|
||||
<td>Whether to load the text image correction module. If set to <code>None</code>, the default value initialized by the pipeline (<code>True</code>) will be used.</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>use_layout_detection</code></td>
|
||||
<td>Whether to load the layout detection module. If set to <code>None</code>, the default value initialized by the pipeline (<code>True</code>) will be used.</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>layout_threshold</code></td>
|
||||
<td>Layout detection confidence threshold; only scores greater than this threshold will be output.
|
||||
<ul>
|
||||
<li><b>float</b>: Any floating point number greater than <code>0</code></li>
|
||||
<li><b>dict</b>: Keys are int category IDs, values are any floating point number greater than <code>0</code></li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the default value initialized by the pipeline (<code>0.5</code>) will be used</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>float|dict</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>layout_nms</code></td>
|
||||
<td>Whether to use post-processing NMS in layout detection. If set to <code>None</code>, the default value initialized by the pipeline (<code>True</code>) will be used.</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>layout_unclip_ratio</code></td>
|
||||
<td>Scale factor for the sides of the detection box.
|
||||
<ul>
|
||||
<li><b>float</b>: A floating point number greater than 0, e.g., 1.1, indicates that the width and height of the detection box output by the model will be expanded by 1.1 times, keeping the center unchanged</li>
|
||||
<li><b>list</b>: e.g., [1.2, 1.5], indicates that the width will be expanded by 1.2 times and the height by 1.5 times, keeping the center unchanged</li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the default value initialized by the pipeline (1.0) will be used</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>float|list</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>layout_merge_bboxes_mode</code></td>
|
||||
<td>Mode for merging detection boxes output by the model in layout detection.
|
||||
<ul>
|
||||
<li><b>large</b>: When set to large, only the largest external box will be retained for overlapping detection boxes, and overlapping internal boxes will be deleted</li>
|
||||
<li><b>small</b>: When set to small, only the small internal box will be retained, and overlapping external boxes will be deleted</li>
|
||||
<li><b>union</b>: No filtering will be performed, and both internal and external boxes will be retained</li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the default value initialized by the pipeline (<code>large</code>) will be used</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>seal_det_limit_side_len</code></td>
|
||||
<td>Image side length limit for seal text detection.
|
||||
<ul>
|
||||
<li><b>int</b>: Any integer greater than <code>0</code></li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the default value initialized by the pipeline (<code>736</code>) will be used</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>int</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>seal_det_limit_type</code></td>
|
||||
<td>Image side length limit type for seal text detection.
|
||||
<ul>
|
||||
<li><b>str</b>: Supports <code>min</code> and <code>max</code>; <code>min</code> ensures that the shortest side of the image is not less than <code>seal_det_limit_side_len</code>, while <code>max</code> ensures that the longest side of the image is not greater than <code>seal_det_limit_side_len</code></li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the default value initialized by the pipeline (<code>min</code>) will be used</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>seal_det_thresh</code></td>
|
||||
<td>Detection pixel threshold; only pixels with scores greater than this threshold in the output probability map will be considered as text pixels.
|
||||
<ul>
|
||||
<li><b>float</b>: Any floating point number greater than <code>0</code></li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the default value initialized by the pipeline (<code>0.2</code>) will be used</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>float</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>seal_det_box_thresh</code></td>
|
||||
<td>Detection box threshold; when the average score of all pixels within a detection result box is greater than this threshold, the result is considered a text area.
|
||||
<ul>
|
||||
<li><b>float</b>: Any floating point number greater than <code>0</code></li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the default value initialized by the pipeline (<code>0.6</code>) will be used</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>float</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>seal_det_unclip_ratio</code></td>
|
||||
<td>Seal text detection expansion coefficient; the larger the value, the greater the expansion area.
|
||||
<ul>
|
||||
<li><b>float</b>: Any floating point number greater than <code>0</code></li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the default value initialized by the pipeline (<code>0.5</code>) will be used</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>float</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>seal_rec_score_thresh</code></td>
|
||||
<td>Text recognition threshold; text results with scores greater than this threshold will be retained.
|
||||
<ul>
|
||||
<li><b>float</b>: Any floating point number greater than <code>0</code></li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the default value initialized by the pipeline (<code>0.0</code>) will be used, meaning no threshold is set</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>float</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>device</code></td>
|
||||
<td>The device used for inference. Supports specifying a specific card number.
|
||||
<ul>
|
||||
<li><b>CPU</b>: e.g., <code>cpu</code> indicates using the CPU for inference</li>
|
||||
<li><b>GPU</b>: e.g., <code>gpu:0</code> indicates using the first GPU for inference</li>
|
||||
<li><b>NPU</b>: e.g., <code>npu:0</code> indicates using the first NPU for inference</li>
|
||||
<li><b>XPU</b>: e.g., <code>xpu:0</code> indicates using the first XPU for inference</li>
|
||||
<li><b>MLU</b>: e.g., <code>mlu:0</code> indicates using the first MLU for inference</li>
|
||||
<li><b>DCU</b>: e.g., <code>dcu:0</code> indicates using the first DCU for inference</li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the default value initialized by the pipeline will be used: local GPU 0 is preferred if available; otherwise the CPU is used</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>enable_hpi</code></td>
|
||||
<td>Whether to enable high-performance inference.</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>False</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>use_tensorrt</code></td>
|
||||
<td>Whether to use TensorRT for inference acceleration.</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>False</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>min_subgraph_size</code></td>
|
||||
<td>Minimum subgraph size, used for optimizing model subgraph computation.</td>
|
||||
<td><code>int</code></td>
|
||||
<td><code>3</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>precision</code></td>
|
||||
<td>Computation precision, such as fp32 or fp16.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>fp32</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>enable_mkldnn</code></td>
|
||||
<td>Whether to enable the MKL-DNN acceleration library. If set to <code>None</code>, it will be enabled by default.</td>
|
||||
<td><code>bool</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>cpu_threads</code></td>
|
||||
<td>The number of threads to use when performing inference on the CPU.</td>
|
||||
<td><code>int</code></td>
|
||||
<td><code>8</code></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
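
The `layout_unclip_ratio` behavior described in the table (a single float scales both sides, a two-element list scales width and height separately, and the box center stays fixed) can be sketched as follows. The function name `unclip_box` and the `(x1, y1, x2, y2)` box layout are assumptions made for this illustration; the pipeline's internal representation may differ.

```python
def unclip_box(box, ratio):
    """Scale a box around its center.

    Hypothetical helper for illustration only.
    box:   (x1, y1, x2, y2) corner coordinates (assumed layout).
    ratio: a float applied to both sides, or [width_ratio, height_ratio].
    """
    x1, y1, x2, y2 = box
    if isinstance(ratio, (int, float)):
        width_ratio = height_ratio = float(ratio)
    else:
        width_ratio, height_ratio = ratio
    # The center is computed first so that it stays unchanged after scaling.
    center_x, center_y = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = (x2 - x1) / 2 * width_ratio
    half_h = (y2 - y1) / 2 * height_ratio
    return (center_x - half_w, center_y - half_h,
            center_x + half_w, center_y + half_h)


print(unclip_box((0, 0, 10, 20), 1.1))
print(unclip_box((0, 0, 10, 20), [1.2, 1.5]))
```

A ratio of `1.0` returns the box unchanged, which matches the default noted in the table.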
|
||||
|
||||
|
||||
(2) Call the `predict()` method of the seal text recognition pipeline object to run inference. This method returns a list of results.
|
||||
|
||||
Additionally, the pipeline also offers the `predict_iter()` method. Both methods accept the same parameters and return the same results; the difference is that `predict_iter()` returns a `generator`, which lets you process and retrieve prediction results step by step. This is suitable for large datasets or for cases where you want to save memory. Choose either method based on your needs.
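
The difference between a list-returning and a generator-returning call can be illustrated without the library. The two functions below are hypothetical stand-ins that only mimic the eager versus lazy pattern; they do not run any model and are not the pipeline's real methods.

```python
from typing import Dict, Iterator, List


def fake_predict_iter(inputs: List[str]) -> Iterator[Dict[str, str]]:
    """Stand-in for a generator-style API: yields one result at a time."""
    for path in inputs:
        # A real pipeline would run inference here; this only echoes the path.
        yield {"input_path": path}


def fake_predict(inputs: List[str]) -> List[Dict[str, str]]:
    """Stand-in for a list-style API: all results are materialized at once."""
    return list(fake_predict_iter(inputs))


paths = ["a.png", "b.png", "c.png"]

# Eager: every result is held in memory before the loop starts.
all_results = fake_predict(paths)

# Lazy: only the first result is computed when next() is called.
first_result = next(fake_predict_iter(paths))

print(len(all_results), first_result)
```

With the lazy form, results that are never requested are never computed, which is where the memory saving for large inputs comes from.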
|
||||
|
||||
Below are the parameters for the `predict()` method and their descriptions:
|
||||
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Parameter</th>
|
||||
<th>Description</th>
|
||||
<th>Type</th>
|
||||
<th>Options</th>
|
||||
<th>Default Value</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tr>
|
||||
<td><code>input</code></td>
|
||||
<td>Data to be predicted, supports multiple input types (required)</td>
|
||||
<td><code>Python Var|str|list</code></td>
|
||||
<td>
|
||||
<ul>
|
||||
<li><b>Python Var</b>: Image data represented by <code>numpy.ndarray</code></li>
|
||||
<li><b>str</b>: Local path of an image or PDF file, e.g., <code>/root/data/img.jpg</code>; <b>URL link</b>, e.g., the network URL of an image or PDF file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/seal_text_det.png">Example</a>; <b>Local directory</b>, containing images to be predicted, e.g., <code>/root/data/</code> (currently does not support prediction of PDF files in directories; PDF files must be specified with an exact file path)</li>
|
||||
<li><b>List</b>: Elements of the list must be of the above types, e.g., <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code></li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>device</code></td>
|
||||
<td>Inference device for the pipeline</td>
|
||||
<td><code>str|None</code></td>
|
||||
<td>
|
||||
<ul>
|
||||
<li><b>CPU</b>: e.g., <code>cpu</code> for CPU inference;</li>
|
||||
<li><b>GPU</b>: e.g., <code>gpu:0</code> for inference using the first GPU;</li>
|
||||
<li><b>NPU</b>: e.g., <code>npu:0</code> for inference using the first NPU;</li>
|
||||
<li><b>XPU</b>: e.g., <code>xpu:0</code> for inference using the first XPU;</li>
|
||||
<li><b>MLU</b>: e.g., <code>mlu:0</code> for inference using the first MLU;</li>
|
||||
<li><b>DCU</b>: e.g., <code>dcu:0</code> for inference using the first DCU;</li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the default value from the pipeline initialization will be used. During initialization, the local GPU device 0 will be prioritized; if unavailable, the CPU device will be used.</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>use_doc_orientation_classify</code></td>
|
||||
<td>Whether to use the document orientation classification module</td>
|
||||
<td><code>bool|None</code></td>
|
||||
<td>
|
||||
<ul>
|
||||
<li><b>bool</b>: <code>True</code> or <code>False</code>;</li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the default value from the pipeline initialization will be used, initialized as <code>True</code>.</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>use_doc_unwarping</code></td>
|
||||
<td>Whether to use the document unwarping module</td>
|
||||
<td><code>bool|None</code></td>
|
||||
<td>
|
||||
<ul>
|
||||
<li><b>bool</b>: <code>True</code> or <code>False</code>;</li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the default value from the pipeline initialization will be used, initialized as <code>True</code>.</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>use_layout_detection</code></td>
|
||||
<td>Whether to use the layout detection module</td>
|
||||
<td><code>bool|None</code></td>
|
||||
<td>
|
||||
<ul>
|
||||
<li><b>bool</b>: <code>True</code> or <code>False</code>;</li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the default value from the pipeline initialization will be used, initialized as <code>True</code>.</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>layout_threshold</code></td>
|
||||
<td>Confidence threshold for layout detection; only scores above this threshold will be output</td>
|
||||
<td><code>float|dict|None</code></td>
|
||||
<td>
|
||||
<ul>
|
||||
<li><b>float</b>: Any float greater than <code>0</code></li>
|
||||
<li><b>dict</b>: Key is the int category ID, value is any float greater than <code>0</code></li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the default value from the pipeline initialization will be used, initialized as <code>0.5</code></li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>layout_nms</code></td>
|
||||
<td>Whether to use Non-Maximum Suppression (NMS) for layout detection post-processing</td>
|
||||
<td><code>bool|None</code></td>
|
||||
<td>
|
||||
<ul>
|
||||
<li><b>bool</b>: <code>True</code> or <code>False</code>;</li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the default value from the pipeline initialization will be used, initialized as <code>True</code>.</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>layout_unclip_ratio</code></td>
|
||||
<td>Expansion ratio of detection box edges; if not specified, the default value from the PaddleX official model configuration will be used</td>
|
||||
<td><code>float|list|None</code></td>
|
||||
<td>
|
||||
<ul>
|
||||
<li><b>float</b>: Any float greater than 0, e.g., 1.1, which means expanding the width and height of the detection box by 1.1 times while keeping the center unchanged</li>
|
||||
<li><b>list</b>: e.g., [1.2, 1.5], which means expanding the width of the detection box by 1.2 times and the height by 1.5 times while keeping the center unchanged</li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the default value from the pipeline initialization will be used, initialized as 1.0</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>layout_merge_bboxes_mode</code></td>
|
||||
<td>Merging mode for detection boxes in layout detection output; if not specified, the default value from the PaddleX official model configuration will be used</td>
|
||||
<td><code>str|None</code></td>
|
||||
<td>
|
||||
<ul>
|
||||
<li><b>large</b>: When set to <code>large</code>, only the largest external box will be retained for overlapping detection boxes, and the internal overlapping boxes will be removed.</li>
|
||||
<li><b>small</b>: When set to <code>small</code>, only the smallest internal box will be retained for overlapping detection boxes, and the external overlapping boxes will be removed.</li>
|
||||
<li><b>union</b>: No filtering of boxes will be performed; both internal and external boxes will be retained.</li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the default value from the pipeline initialization will be used, initialized as <code>large</code>.</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>seal_det_limit_side_len</code></td>
|
||||
<td>Side length limit for seal text detection</td>
|
||||
<td><code>int|None</code></td>
|
||||
<td>
|
||||
<ul>
|
||||
<li><b>int</b>: Any integer greater than <code>0</code></li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the default value from the pipeline initialization will be used, initialized as <code>736</code></li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>seal_rec_score_thresh</code></td>
|
||||
<td>Text recognition threshold; text results with scores above this threshold will be retained</td>
|
||||
<td><code>float|None</code></td>
|
||||
<td>
|
||||
<ul>
|
||||
<li><b>float</b>: Any float greater than <code>0</code></li>
|
||||
<li><b>None</b>: If set to <code>None</code>, the default value from the pipeline initialization will be used, initialized as <code>0.0</code>. This means no threshold is applied.</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
</table>
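
The two accepted forms of `layout_threshold` (a single float, or a dict keyed by integer category ID) can be sketched as a simple filter. This is an illustration of the rule stated in the table, not the pipeline's implementation: the function name `filter_by_threshold`, the `score` field, and the fallback of `0.5` for category IDs missing from the dict are all assumptions.

```python
def filter_by_threshold(boxes, threshold, fallback=0.5):
    """Keep boxes whose score is greater than the applicable threshold.

    Hypothetical helper for illustration only.
    boxes:     list of dicts with "cls_id" (int) and "score" (float).
    threshold: a float for all classes, or {cls_id: float} per class.
    fallback:  assumed threshold for class IDs absent from the dict.
    """
    kept = []
    for box in boxes:
        if isinstance(threshold, dict):
            limit = threshold.get(box["cls_id"], fallback)
        else:
            limit = threshold
        if box["score"] > limit:
            kept.append(box)
    return kept


detections = [
    {"cls_id": 0, "score": 0.45},
    {"cls_id": 1, "score": 0.45},
    {"cls_id": 1, "score": 0.90},
]

print(filter_by_threshold(detections, 0.5))
print(filter_by_threshold(detections, {0: 0.3, 1: 0.8}))
```

The dict form is useful when one category produces many low-confidence false positives while another needs a lower bar to be detected at all.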
|
||||
|
||||
(3) Process the prediction results. The prediction result for each sample is of `dict` type and supports operations such as printing, saving as an image, and saving as a `json` file:
|
||||
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Method</th>
|
||||
<th>Description</th>
|
||||
<th>Parameter</th>
|
||||
<th>Parameter Type</th>
|
||||
<th>Parameter Description</th>
|
||||
<th>Default Value</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tr>
|
||||
<td rowspan="3"><code>print()</code></td>
|
||||
<td rowspan="3">Print results to the terminal</td>
|
||||
<td><code>format_json</code></td>
|
||||
<td><code>bool</code></td>
|
||||
<td>Whether to format the output content using <code>JSON</code> indentation</td>
|
||||
<td><code>True</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>indent</code></td>
|
||||
<td><code>int</code></td>
|
||||
<td>Specify the indentation level to beautify the output <code>JSON</code> data for better readability, effective only when <code>format_json</code> is <code>True</code></td>
|
||||
<td>4</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>ensure_ascii</code></td>
|
||||
<td><code>bool</code></td>
|
||||
<td>Control whether to escape non-<code>ASCII</code> characters to <code>Unicode</code>. When set to <code>True</code>, all non-<code>ASCII</code> characters will be escaped; <code>False</code> will retain the original characters, effective only when <code>format_json</code> is <code>True</code></td>
|
||||
<td><code>False</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td rowspan="3"><code>save_to_json()</code></td>
|
||||
<td rowspan="3">Save results as a json file</td>
|
||||
<td><code>save_path</code></td>
|
||||
<td><code>str</code></td>
|
||||
<td>The file path to save the results. When it is a directory, the saved file name will be consistent with the input file type</td>
|
||||
<td>None</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>indent</code></td>
|
||||
<td><code>int</code></td>
|
||||
<td>Specify the indentation level to beautify the output <code>JSON</code> data for better readability, effective only when <code>format_json</code> is <code>True</code></td>
|
||||
<td>4</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>ensure_ascii</code></td>
|
||||
<td><code>bool</code></td>
|
||||
<td>Control whether to escape non-<code>ASCII</code> characters to <code>Unicode</code>. When set to <code>True</code>, all non-<code>ASCII</code> characters will be escaped; <code>False</code> will retain the original characters, effective only when <code>format_json</code> is <code>True</code></td>
|
||||
<td><code>False</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>save_to_img()</code></td>
|
||||
<td>Save results as an image file</td>
|
||||
<td><code>save_path</code></td>
|
||||
<td><code>str</code></td>
|
||||
<td>The file path to save the results, supports directory or file path</td>
|
||||
<td>None</td>
|
||||
</tr>
|
||||
</table>
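
The interaction of `format_json`, `indent`, and `ensure_ascii` can be illustrated with the standard `json` module. This is a minimal sketch, not PaddleOCR's implementation: the `render` helper and the sample dictionary are hypothetical, and it assumes that disabling `format_json` falls back to a plain string representation.

```python
import json

# Hypothetical result fragment; the keys mirror documented fields,
# the values are made up for illustration.
result = {
    "input_path": "seal_text_det.png",
    "rec_texts": ["合同专用章"],
    "rec_scores": [0.98],
}


def render(res, format_json=True, indent=4, ensure_ascii=False):
    """Approximate how the three documented options interact (an assumption)."""
    if not format_json:
        # indent and ensure_ascii have no effect in this branch
        return str(res)
    return json.dumps(res, indent=indent, ensure_ascii=ensure_ascii)


pretty = render(result)                      # indented, original characters kept
escaped = render(result, ensure_ascii=True)  # non-ASCII characters become \uXXXX escapes
plain = render(result, format_json=False)    # unformatted fallback

print(pretty)
```

With `ensure_ascii=False` the recognized text stays readable in the terminal, which is presumably why it is the documented default.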
|
||||
|
||||
<b>Note:</b> The parameters in the configuration file are the pipeline initialization parameters. To change the initialization parameters of the seal text recognition pipeline, modify the parameters in the configuration file and load that file for prediction. CLI prediction also supports passing in a configuration file: specify its path with `--pipeline`.
|
||||
|
||||
- Calling the `print()` method prints the results to the terminal. The printed content is explained as follows:

    - `input_path`: `(str)` The input path of the image to be predicted.

    - `model_settings`: `(Dict[str, bool])` The model parameters required for pipeline configuration.

        - `use_doc_preprocessor`: `(bool)` Controls whether to enable the document preprocessing sub-pipeline.
        - `use_layout_detection`: `(bool)` Controls whether to enable the layout detection sub-module.

    - `layout_det_res`: `(Dict[str, Union[List[numpy.ndarray], List[float]]])` The output result of the layout detection sub-module. Only exists when `use_layout_detection=True`.

        - `input_path`: `(Union[str, None])` The image path accepted by the layout detection module. Saved as `None` when the input is a `numpy.ndarray`.
        - `page_index`: `(Union[int, None])` The current page number if the input is a PDF file; otherwise `None`.
        - `boxes`: `(List[Dict])` A list of detected seal regions, with each element containing the following fields:
            - `cls_id`: `(int)` The class ID of the detected seal region.
            - `score`: `(float)` The confidence score of the detected region.
            - `coordinate`: `(List[float])` The coordinates of the detection box in the order x1, y1, x2, y2: the x- and y-coordinates of the top-left corner, followed by the x- and y-coordinates of the bottom-right corner.

    - `seal_res_list`: `(List[Dict])` A list of seal text recognition results, with each element containing the following fields:

        - `input_path`: `(Union[str, None])` The image path accepted by the seal text recognition pipeline. Saved as `None` when the input is a `numpy.ndarray`.
        - `page_index`: `(Union[int, None])` The current page number if the input is a PDF file; otherwise `None`.
        - `model_settings`: `(Dict[str, bool])` The model configuration parameters for the seal text recognition pipeline.
            - `use_doc_preprocessor`: `(bool)` Controls whether to enable the document preprocessing sub-pipeline.
            - `use_textline_orientation`: `(bool)` Controls whether to enable the text line orientation classification sub-module.

    - `doc_preprocessor_res`: `(Dict[str, Union[str, Dict[str, bool], int]])` The output result of the document preprocessing sub-pipeline. Only exists when `use_doc_preprocessor=True`.

        - `input_path`: `(Union[str, None])` The image path accepted by the document preprocessing sub-pipeline. Saved as `None` when the input is a `numpy.ndarray`.
        - `model_settings`: `(Dict)` The model configuration parameters for the preprocessing sub-pipeline.
            - `use_doc_orientation_classify`: `(bool)` Controls whether to enable document orientation classification.
            - `use_doc_unwarping`: `(bool)` Controls whether to enable document unwarping.
        - `angle`: `(int)` The predicted result of document orientation classification. When enabled, it takes a value in [0, 1, 2, 3], corresponding to [0°, 90°, 180°, 270°]; when disabled, it is -1.

    - `dt_polys`: `(List[numpy.ndarray])` A list of polygon boxes for seal text detection. Each box is a numpy array of vertex coordinates with shape (n, 2).

    - `dt_scores`: `(List[float])` A list of confidence scores for text detection boxes.

    - `text_det_params`: `(Dict[str, Union[int, float, str]])` Configuration parameters for the text detection module.

        - `limit_side_len`: `(int)` The side length limit value during image preprocessing.
        - `limit_type`: `(str)` The handling method for side length limits.
        - `thresh`: `(float)` The confidence threshold for text pixel classification.
        - `box_thresh`: `(float)` The confidence threshold for text detection boxes.
        - `unclip_ratio`: `(float)` The expansion ratio for text detection boxes.
        - `text_type`: `(str)` The type of seal text detection, currently fixed as "seal".

    - `text_rec_score_thresh`: `(float)` The filtering threshold for text recognition results.

    - `rec_texts`: `(List[str])` A list of text recognition results, containing only texts with confidence scores above `text_rec_score_thresh`.

    - `rec_scores`: `(List[float])` A list of confidence scores for text recognition, filtered by `text_rec_score_thresh`.

    - `rec_polys`: `(List[numpy.ndarray])` A list of text detection boxes filtered by confidence score, in the same format as `dt_polys`.

    - `rec_boxes`: `(numpy.ndarray)` An array of rectangular bounding boxes for detection boxes; the seal recognition pipeline returns an empty array.
|
||||
|
||||
- Calling the `save_to_json()` method will save the above content to the specified `save_path`. If a directory is specified, the saved path will be `save_path/{your_img_basename}_res.json`. If a file is specified, it will be saved directly to that file. Since JSON files do not support saving numpy arrays, `numpy.array` types will be converted to list format.
|
||||
|
||||
- Calling the `save_to_img()` method will save the visualization results to the specified `save_path`. If a directory is specified, the saved path will be `save_path/{your_img_basename}_seal_res_region1.{your_img_extension}`. If a file is specified, it will be saved directly to that file. (The pipeline usually contains multiple result images, so it is not recommended to specify a specific file path directly, as multiple images will be overwritten, and only the last image will be retained.)
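
The two naming rules above can be sketched with `pathlib`. This is an illustrative approximation only: both helper functions are hypothetical, the file-versus-directory check (a non-empty suffix) is an assumption, and generalizing `_seal_res_region1` to `_{key}` for the other result images is also an assumption.

```python
from pathlib import Path


def json_output_path(save_path, input_path):
    """Return the JSON target: the file itself, or <dir>/<basename>_res.json."""
    target = Path(save_path)
    if target.suffix.lower() == ".json":  # assumption: a .json suffix means "file"
        return target
    return target / f"{Path(input_path).stem}_res.json"


def image_output_paths(save_path, input_path, keys):
    """Return one target path per result image key."""
    target = Path(save_path)
    source = Path(input_path)
    if target.suffix:
        # A single file path: every image maps to the same file,
        # so each write would overwrite the previous one.
        return {key: target for key in keys}
    return {key: target / f"{source.stem}_{key}{source.suffix}" for key in keys}


# Keys taken from the `img` attribute description in this document.
keys = ["layout_det_res", "seal_res_region1", "preprocessed_img"]
as_dir = image_output_paths("output", "seal_text_det.png", keys)
as_file = image_output_paths("output/result.png", "seal_text_det.png", keys)
json_path = json_output_path("output", "seal_text_det.png")
```

In the directory case every image gets its own file, while the file case collapses all images onto one path, which is why passing a directory is the safer choice for `save_to_img()`.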
|
||||
|
||||
* Additionally, the visualized images and the prediction results are available through attributes, as follows:
|
||||
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Attribute</th>
|
||||
<th>Description</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tr>
|
||||
<td rowspan="1"><code>json</code></td>
|
||||
<td rowspan="1">Get the prediction results in <code>json</code> format.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td rowspan="1"><code>img</code></td>
<td rowspan="1">Get the visualization results in <code>dict</code> format.</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
- The prediction results obtained through the `json` attribute are of dict type, with content consistent with what is saved by calling the `save_to_json()` method.
|
||||
- The prediction results returned by the `img` attribute are of dict type. The keys are `layout_det_res`, `seal_res_region1`, and `preprocessed_img`, corresponding to three `Image.Image` objects: one for visualizing layout detection, one for visualizing seal text recognition results, and one for visualizing image preprocessing. If the image preprocessing sub-module is not used, `preprocessed_img` will not be included in the dictionary. If the layout region detection module is not used, `layout_det_res` will not be included.
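
Because the `json` attribute returns a plain dict, the documented fields can be post-processed with ordinary Python. The sketch below is a hedged illustration: the helper names and the `sample` dictionary are hypothetical, the values are invented, and plain lists stand in for the arrays described above.

```python
# Documented mapping of the orientation class index to degrees.
ANGLE_TO_DEGREES = {0: 0, 1: 90, 2: 180, 3: 270}


def angle_in_degrees(angle):
    """Return the rotation in degrees, or None when classification is disabled (-1)."""
    return None if angle == -1 else ANGLE_TO_DEGREES[angle]


def box_size(coordinate):
    """Width and height of a box given as [x1, y1, x2, y2] (top-left, bottom-right)."""
    x1, y1, x2, y2 = coordinate
    return x2 - x1, y2 - y1


def texts_above(rec_texts, rec_scores, threshold):
    """Keep only texts whose recognition score reaches the threshold."""
    return [text for text, score in zip(rec_texts, rec_scores) if score >= threshold]


# Hypothetical excerpt of a result dict; all values are made up.
sample = {
    "layout_det_res": {
        "boxes": [{"cls_id": 0, "score": 0.97, "coordinate": [10.0, 20.0, 110.0, 70.0]}]
    },
    "doc_preprocessor_res": {"angle": 1},
    "rec_texts": ["合同专用章", "2024"],
    "rec_scores": [0.98, 0.41],
}

first_box = sample["layout_det_res"]["boxes"][0]
print(box_size(first_box["coordinate"]))
print(angle_in_degrees(sample["doc_preprocessor_res"]["angle"]))
print(texts_above(sample["rec_texts"], sample["rec_scores"], 0.5))
```

Since `layout_det_res` and `doc_preprocessor_res` only exist when the corresponding sub-modules are enabled, real code would likely read them with `dict.get()` rather than direct indexing.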
|
||||
|
||||
|
||||
## 3. Development Integration/Deployment
|
||||
If the pipeline meets your requirements for inference speed and accuracy, you can proceed directly with development integration/deployment.
|
||||
|
||||
|
||||
@ -690,7 +690,7 @@ paddleocr seal_recognition -i ./seal_text_det.png --device gpu
|
||||
<td>Data to be predicted (required); multiple input types are supported.
|
||||
<ul>
|
||||
<li><b>Python Var</b>: e.g., image data represented as <code>numpy.ndarray</code></li>
|
||||
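<li><b>str</b>: a local path to an image or PDF file, e.g., <code>/root/data/img.jpg</code>; <b>a URL</b> of an image or PDF file, e.g., <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_doc_preprocessor_002.png">example</a>; or <b>a local directory</b> containing the images to be predicted, e.g., <code>/root/data/</code> (directories containing PDF files are currently not supported; a PDF must be specified by its exact file path)</li>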
|
||||
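<li><b>str</b>: a local path to an image or PDF file, e.g., <code>/root/data/img.jpg</code>; <b>a URL</b> of an image or PDF file, e.g., <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/seal_text_det.png">example</a>; or <b>a local directory</b> containing the images to be predicted, e.g., <code>/root/data/</code> (directories containing PDF files are currently not supported; a PDF must be specified by its exact file path)</li>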
|
||||
<li><b>List</b>: the elements must be of the types above, e.g., <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code></li>
|
||||
</ul>
|
||||
</td>
|
||||
|
||||
@ -822,6 +822,24 @@ paddleocr table_recognition_v2 -i ./general_formula_recognition_001.png --device
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td><code>input</code></td>
|
||||
<td>Data to be predicted, supports multiple input types, required.
|
||||
<ul>
|
||||
<li><b>Python Var</b>: For example, image data represented as <code>numpy.ndarray</code>.</li>
|
||||
<li><b>str</b>: A local path to an image or PDF file, e.g., <code>/root/data/img.jpg</code>; <b>a URL</b> of an image or PDF file, e.g., <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_doc_preprocessor_002.png">example</a>; or <b>a local directory</b> containing the images to be predicted, e.g., <code>/root/data/</code> (directories containing PDF files are currently not supported; a PDF must be specified by its exact file path).</li>
|
||||
<li><b>List</b>: The elements of the list must be of the above types, such as <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code>.</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>Python Var|str|list</code></td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>save_path</code></td>
|
||||
<td>Specify the path to save the inference result file. If set to <code>None</code>, the inference result will not be saved locally.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>layout_detection_model_name</code></td>
|
||||
<td>Name of the layout detection model. If set to <code>None</code>, the default model of the pipeline will be used.</td>
|
||||
<td><code>str</code></td>
|
||||
@ -1038,24 +1056,6 @@ paddleocr table_recognition_v2 -i ./general_formula_recognition_001.png --device
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>input</code></td>
|
||||
<td>Data to be predicted, supports multiple input types, required.
|
||||
<ul>
|
||||
<li><b>Python Var</b>: For example, image data represented as <code>numpy.ndarray</code>.</li>
|
||||
<li><b>str</b>: A local path to an image or PDF file, e.g., <code>/root/data/img.jpg</code>; <b>a URL</b> of an image or PDF file, e.g., <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_doc_preprocessor_002.png">example</a>; or <b>a local directory</b> containing the images to be predicted, e.g., <code>/root/data/</code> (directories containing PDF files are currently not supported; a PDF must be specified by its exact file path).</li>
|
||||
<li><b>List</b>: The elements of the list must be of the above types, such as <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code>.</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>Python Var|str|list</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>save_path</code></td>
|
||||
<td>Specify the path to save the inference result file. If set to <code>None</code>, the inference result will not be saved locally.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>device</code></td>
|
||||
<td>The device used for inference. Supports specifying a specific card number.
|
||||
<ul>
|
||||
@ -1107,6 +1107,12 @@ paddleocr table_recognition_v2 -i ./general_formula_recognition_001.png --device
|
||||
<td><code>int</code></td>
|
||||
<td><code>8</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>paddlex_config</code></td>
|
||||
<td>Path to PaddleX pipeline configuration file.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</details>
|
||||
@ -1428,6 +1434,12 @@ In the above Python script, the following steps are performed:
|
||||
<td><code>int</code></td>
|
||||
<td><code>8</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>paddlex_config</code></td>
|
||||
<td>Path to PaddleX pipeline configuration file.</td>
|
||||
<td><code>str</code></td>
|
||||
<td><code>None</code></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
@ -1457,7 +1469,7 @@ The parameters and descriptions of the `predict()` method are as follows:
|
||||
</ul>
|
||||
</td>
|
||||
<td><code>Python Var|str|list</code></td>
|
||||
<td><code>None</code></td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>device</code></td>
|
||||
|
||||