update doc (#15331)

2025-11-03 03:09:16 +00:00 · 2025-05-26 20:34:25 +08:00 · 2025-05-26 20:34:25 +08:00 · b0b31c38ae
commit b0b31c38ae
parent 0292ba5166
4 changed files with 42 additions and 33 deletions
--- a/README.md
+++ b/README.md
@@ -156,14 +156,6 @@ retriever_config = {
    "api_key": "api_key",  # your api_key
 }

-mllm_chat_bot_config = {
-    "module_name": "chat_bot",
-    "model_name": "PP-DocBee",
-    "base_url": "http://127.0.0.1:8080/",  # your local mllm service url
-    "api_type": "openai",
-    "api_key": "api_key",  # your api_key
-}
-
 pipeline = PPChatOCRv4Doc(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False
@@ -176,6 +168,25 @@ visual_predict_res = pipeline.visual_predict(
    use_table_recognition=True,
 )

+mllm_predict_info = None
+use_mllm = False
+# 如果使用多模态大模型，需要启动本地 mllm 服务，可以参考文档：https://github.com/PaddlePaddle/PaddleX/blob/release/3.0/docs/pipeline_usage/tutorials/vlm_pipelines/doc_understanding.md 进行部署，并更新 mllm_chat_bot_config 配置。
+if use_mllm:
+    mllm_chat_bot_config = {
+        "module_name": "chat_bot",
+        "model_name": "PP-DocBee",
+        "base_url": "http://127.0.0.1:8080/",  # your local mllm service url
+        "api_type": "openai",
+        "api_key": "api_key",  # your api_key
+    }
+
+    mllm_predict_res = pipeline.mllm_pred(
+        input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png",
+        key_list=["驾驶室准乘人数"],
+        mllm_chat_bot_config=mllm_chat_bot_config,
+    )
+    mllm_predict_info = mllm_predict_res["mllm_res"]
+
 visual_info_list = []
 for res in visual_predict_res:
    visual_info_list.append(res["visual_info"])
@@ -184,12 +195,6 @@ for res in visual_predict_res:
 vector_info = pipeline.build_vector(
    visual_info_list, flag_save_bytes_vector=True, retriever_config=retriever_config
 )
-mllm_predict_res = pipeline.mllm_pred(
-    input="vehicle_certificate-1.png",
-    key_list=["驾驶室准乘人数"],
-    mllm_chat_bot_config=mllm_chat_bot_config,
-)
-mllm_predict_info = mllm_predict_res["mllm_res"]
 chat_result = pipeline.chat(
    key_list=["驾驶室准乘人数"],
    visual_info=visual_info_list,
--- a/README_en.md
+++ b/README_en.md
@@ -172,17 +172,10 @@ retriever_config = {
    "api_key": "api_key",  # your api_key
 }

-mllm_chat_bot_config = {
-    "module_name": "chat_bot",
-    "model_name": "PP-DocBee",
-    "base_url": "http://127.0.0.1:8080/",  # your local mllm service url
-    "api_type": "openai",
-    "api_key": "api_key",  # your api_key
-}
-
 pipeline = PPChatOCRv4Doc(
    use_doc_orientation_classify=False,
-    use_doc_unwarping=False)
+    use_doc_unwarping=False
+)

 visual_predict_res = pipeline.visual_predict(
    input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png",
@@ -191,6 +184,25 @@ visual_predict_res = pipeline.visual_predict(
    use_table_recognition=True,
 )

+mllm_predict_info = None
+use_mllm = False
+# To use the multimodal large model, first start a local mllm service. See the documentation at https://github.com/PaddlePaddle/PaddleX/blob/release/3.0/docs/pipeline_usage/tutorials/vlm_pipelines/doc_understanding.md for deployment steps, then update the mllm_chat_bot_config configuration.
+if use_mllm:
+    mllm_chat_bot_config = {
+        "module_name": "chat_bot",
+        "model_name": "PP-DocBee",
+        "base_url": "http://127.0.0.1:8080/",  # your local mllm service url
+        "api_type": "openai",
+        "api_key": "api_key",  # your api_key
+    }
+
+    mllm_predict_res = pipeline.mllm_pred(
+        input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png",
+        key_list=["驾驶室准乘人数"],
+        mllm_chat_bot_config=mllm_chat_bot_config,
+    )
+    mllm_predict_info = mllm_predict_res["mllm_res"]
+
 visual_info_list = []
 for res in visual_predict_res:
    visual_info_list.append(res["visual_info"])
@@ -199,12 +211,6 @@ for res in visual_predict_res:
 vector_info = pipeline.build_vector(
    visual_info_list, flag_save_bytes_vector=True, retriever_config=retriever_config
 )
-mllm_predict_res = pipeline.mllm_pred(
-    input="vehicle_certificate-1.png",
-    key_list=["驾驶室准乘人数"],
-    mllm_chat_bot_config=mllm_chat_bot_config,
-)
-mllm_predict_info = mllm_predict_res["mllm_res"]
 chat_result = pipeline.chat(
    key_list=["驾驶室准乘人数"],
    visual_info=visual_info_list,
--- a/docs/version3.x/pipeline_usage/PP-StructureV3.en.md
+++ b/docs/version3.x/pipeline_usage/PP-StructureV3.en.md
@@ -6,7 +6,7 @@ comments: true

 ## 1. Introduction to PP-StructureV3 Production Line

-Layout analysis is a technique used to extract structured information from document images. It is primarily used to convert complex document layouts into machine-readable data formats. This technology has broad applications in document management, information extraction, and data digitization. Layout analysis combines Optical Character Recognition (OCR), image processing, and machine learning algorithms to identify and extract text blocks, titles, paragraphs, images, tables, and other layout elements from documents. This process generally includes three main steps: layout analysis, element analysis, and data formatting. The final result is structured document data, which enhances the efficiency and accuracy of data processing. <b>PP-StructureV3 improves upon the general layout analysis v1 production line by enhancing layout region detection, table recognition, and formula recognition. It also adds capabilities such as multi-column reading order recovery and result conversion to Markdown files. It performs excellently across various document types and can handle complex document data.<b>  This production line also provides flexible service deployment options, supporting invocation using multiple programming languages on various hardware. In addition, it offers secondary development capabilities, allowing you to train and fine-tune models on your own dataset and integrate the trained models seamlessly.
+Layout analysis is a technique used to extract structured information from document images. It is primarily used to convert complex document layouts into machine-readable data formats. This technology has broad applications in document management, information extraction, and data digitization. Layout analysis combines Optical Character Recognition (OCR), image processing, and machine learning algorithms to identify and extract text blocks, titles, paragraphs, images, tables, and other layout elements from documents. This process generally includes three main steps: layout analysis, element analysis, and data formatting. The final result is structured document data, which enhances the efficiency and accuracy of data processing. <b>PP-StructureV3 improves upon the general layout analysis v1 production line by enhancing layout region detection, table recognition, and formula recognition. It also adds capabilities such as multi-column reading order recovery, chart understanding, and result conversion to Markdown files. It performs excellently across various document types and can handle complex document data.</b>  This production line also provides flexible service deployment options, supporting invocation using multiple programming languages on various hardware. In addition, it offers secondary development capabilities, allowing you to train and fine-tune models on your own dataset and integrate the trained models seamlessly.

 <b>PP-StructureV3 includes the following six modules. Each module can be independently trained and inferred, and contains multiple models. Click the corresponding module for more documentation.</b>

@@ -16,7 +16,6 @@ Layout analysis is a technique used to extract structured information from docum
 - [Table Recognition Subline ](./table_recognition_v2.en.md) （Optional）
 - [Seal Recognition Subline](./seal_recognition.en.md) （Optional）
 - [Formula Recognition Subline](./formula_recognition.en.md) （Optional）
-- [Chart Parsing Module ]() （Optional）

 In this pipeline, you can choose the model to use based on the benchmark data below.

--- a/docs/version3.x/pipeline_usage/PP-StructureV3.md
+++ b/docs/version3.x/pipeline_usage/PP-StructureV3.md
@@ -6,7 +6,7 @@ comments: true

 ## 1. PP-StructureV3 产线介绍

-版面解析是一种从文档图像中提取结构化信息的技术，主要用于将复杂的文档版面转换为机器可读的数据格式。这项技术在文档管理、信息提取和数据数字化等领域具有广泛的应用。版面解析通过结合光学字符识别（OCR）、图像处理和机器学习算法，能够识别和提取文档中的文本块、标题、段落、图片、表格以及其他版面元素。此过程通常包括版面分析、元素分析和数据格式化三个主要步骤，最终生成结构化的文档数据，提升数据处理的效率和准确性。<b>PP-StructureV3 产线在通用版面解析v1产线的基础上，强化了版面区域检测、表格识别、公式识别的能力，增加了多栏阅读顺序的恢复能力、结果转换 Markdown 文件的能力，在多种文档数据中，表现优异，可以处理较复杂的文档数据。</b>本产线同时提供了灵活的服务化部署方式，支持在多种硬件上使用多种编程语言调用。不仅如此，本产线也提供了二次开发的能力，您可以基于本产线在您自己的数据集上训练调优，训练后的模型也可以无缝集成。
+版面解析是一种从文档图像中提取结构化信息的技术，主要用于将复杂的文档版面转换为机器可读的数据格式。这项技术在文档管理、信息提取和数据数字化等领域具有广泛的应用。版面解析通过结合光学字符识别（OCR）、图像处理和机器学习算法，能够识别和提取文档中的文本块、标题、段落、图片、表格以及其他版面元素。此过程通常包括版面分析、元素分析和数据格式化三个主要步骤，最终生成结构化的文档数据，提升数据处理的效率和准确性。<b>PP-StructureV3 产线在通用版面解析v1产线的基础上，强化了版面区域检测、表格识别、公式识别的能力，增加了图表理解能力和多栏阅读顺序的恢复能力、结果转换 Markdown 文件的能力，在多种文档数据中，表现优异，可以处理较复杂的文档数据。</b>本产线同时提供了灵活的服务化部署方式，支持在多种硬件上使用多种编程语言调用。不仅如此，本产线也提供了二次开发的能力，您可以基于本产线在您自己的数据集上训练调优，训练后的模型也可以无缝集成。

 <b>PP-StructureV3 产线中包含以下6个模块。每个模块均可独立进行训练和推理，并包含多个模型。有关详细信息，请点击相应模块以查看文档。</b>

@@ -16,7 +16,6 @@ comments: true
 - [表格识别子产线](./table_recognition_v2.md) （可选）
 - [印章识别子产线](./seal_recognition.md) （可选）
 - [公式识别子产线](./formula_recognition.md) （可选）
-- [图表解析模块]() （可选）

 在本产线中，您可以根据下方的基准测试数据选择使用的模型。
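
The README hunks in this commit make the multimodal step optional: `mllm_predict_info` defaults to `None`, and `pipeline.mllm_pred` is only called when `use_mllm` is set. The sketch below isolates that gating pattern so it can be run without PaddleOCR or a live service. `StubPipeline` and `get_mllm_predict_info` are hypothetical names introduced here for illustration; only the `mllm_pred` method name, its keyword arguments, and the `"mllm_res"` key are taken from the diff, and the stub's return value is made up.

```python
from typing import Any, Dict, List, Optional


class StubPipeline:
    """Hypothetical stand-in for PPChatOCRv4Doc so the sketch runs offline."""

    def mllm_pred(
        self,
        input: str,
        key_list: List[str],
        mllm_chat_bot_config: Dict[str, Any],
    ) -> Dict[str, Any]:
        # The real method queries a multimodal LLM service; this stub
        # returns a canned answer for every requested key.
        return {"mllm_res": {key: "stub answer" for key in key_list}}


def get_mllm_predict_info(
    pipeline: Any,
    image: str,
    key_list: List[str],
    use_mllm: bool = False,
    mllm_chat_bot_config: Optional[Dict[str, Any]] = None,
) -> Optional[Dict[str, Any]]:
    """Return the MLLM result when enabled, otherwise None (the README default)."""
    if not use_mllm:
        return None
    if mllm_chat_bot_config is None:
        # Fail early instead of sending a request with no service config.
        raise ValueError("use_mllm=True requires mllm_chat_bot_config")
    res = pipeline.mllm_pred(
        input=image,
        key_list=key_list,
        mllm_chat_bot_config=mllm_chat_bot_config,
    )
    return res["mllm_res"]


if __name__ == "__main__":
    pipeline = StubPipeline()
    # Default path: no MLLM service is contacted.
    print(get_mllm_predict_info(pipeline, "vehicle_certificate-1.png", ["驾驶室准乘人数"]))
    # Opt-in path: the config is passed through to mllm_pred.
    config = {"module_name": "chat_bot", "model_name": "PP-DocBee"}
    print(
        get_mllm_predict_info(
            pipeline,
            "vehicle_certificate-1.png",
            ["驾驶室准乘人数"],
            use_mllm=True,
            mllm_chat_bot_config=config,
        )
    )
```

Wrapping the gate in a function keeps the `None` default in one place, so later steps that consume `mllm_predict_info` behave the same whether or not a local mllm service is running.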