Mirror of https://github.com/infiniflow/ragflow.git, synced 2025-06-26 22:19:57 +00:00 (parent 7fd1eca582, commit 17d751d2d1).

English | [简体中文](./README_zh.md)
# *Deep*Doc
- [1. Introduction](#1)
- [2. Vision](#2)

<a name="1"></a>
## 1. Introduction
With a large number of documents from various domains, in various formats, and with diverse retrieval requirements,
accurate analysis becomes a very challenging task. *Deep*Doc was born for that purpose.
There are 2 parts in *Deep*Doc so far: vision and parser.

<a name="2"></a>
## 2. Vision
We use visual information to solve problems the way human beings do.

- OCR. Since many documents are presented as images, or can at least be converted into images,
  OCR is an essential, fundamental, even universal solution for text extraction.

<a name="3"></a>
## 3. Parser
Four kinds of document formats, namely PDF, DOCX, EXCEL, and PPT, each have a corresponding parser.
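As a minimal sketch of how a file might be routed to the parser for its format, the snippet below dispatches on the lower-cased file extension. The stub functions (`parse_pdf`, `parse_docx`, `parse_excel`, `parse_ppt`), the `PARSERS` table, the `parse` entry point, and the exact extension list are hypothetical stand-ins rather than *Deep*Doc's API, and the stubs return a label instead of doing any parsing.

```python
from pathlib import Path
from typing import Callable, Dict


# Hypothetical parser stubs: each returns a label instead of parsing the file.
def parse_pdf(path: Path) -> str:
    return f"pdf:{path.name}"


def parse_docx(path: Path) -> str:
    return f"docx:{path.name}"


def parse_excel(path: Path) -> str:
    return f"excel:{path.name}"


def parse_ppt(path: Path) -> str:
    return f"ppt:{path.name}"


# Hypothetical registry: lower-cased file extension -> parser for that format.
PARSERS: Dict[str, Callable[[Path], str]] = {
    ".pdf": parse_pdf,
    ".docx": parse_docx,
    ".xlsx": parse_excel,
    ".xls": parse_excel,
    ".pptx": parse_ppt,
    ".ppt": parse_ppt,
}


def parse(path: str) -> str:
    """Dispatch a file to the parser registered for its extension."""
    p = Path(path)
    parser = PARSERS.get(p.suffix.lower())
    if parser is None:
        raise ValueError(f"unsupported format: {p.suffix or '(none)'}")
    return parser(p)


if __name__ == "__main__":
    print(parse("report.PDF"))   # pdf:report.PDF
    print(parse("slides.pptx"))  # ppt:slides.pptx
```

A dictionary keyed by extension keeps dispatch to a single lookup, and an unsupported extension raises `ValueError` instead of being silently skipped.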
The most complex one is the PDF parser, owing to the flexibility of the PDF format. The output of the PDF parser includes:
- Text chunks with their own positions in the PDF (page number and rectangular positions).
- Tables with the cropped image from the PDF, and their contents, which have already been translated into natural language sentences.
- Figures with their captions and the text inside the figures.
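The three kinds of PDF parser output can be pictured as plain records. The following is a minimal sketch under stated assumptions: the class names (`TextChunk`, `Table`, `Figure`, `PdfParseResult`), the `(x0, top, x1, bottom)` box convention, and the `image_path` field are hypothetical, not *Deep*Doc's actual data model. The helper `table_to_sentences` shows one simple way to turn table rows into sentences (pairing each header with its cell value); *Deep*Doc's own table-to-text method may differ.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical box convention: (x0, top, x1, bottom) in page coordinates.
BBox = Tuple[float, float, float, float]


@dataclass
class TextChunk:
    text: str
    page: int    # page number the chunk was found on
    bbox: BBox   # rectangular position of the chunk on that page


@dataclass
class Table:
    page: int
    bbox: BBox
    image_path: str        # hypothetical: where the cropped table image is saved
    sentences: List[str]   # table contents rendered as natural-language sentences


@dataclass
class Figure:
    page: int
    bbox: BBox
    caption: str
    texts: List[str] = field(default_factory=list)  # text recognized inside the figure


@dataclass
class PdfParseResult:
    chunks: List[TextChunk] = field(default_factory=list)
    tables: List[Table] = field(default_factory=list)
    figures: List[Figure] = field(default_factory=list)


def table_to_sentences(header: List[str], rows: List[List[str]]) -> List[str]:
    """Turn each row into 'column: value; ...' text, skipping empty cells and rows."""
    sentences = []
    for row in rows:
        pairs = [f"{h}: {v}" for h, v in zip(header, row) if v.strip()]
        if pairs:
            sentences.append("; ".join(pairs) + ".")
    return sentences


if __name__ == "__main__":
    header = ["Item", "Qty", "Price"]
    rows = [["Pen", "2", "1.50"], ["Ink", "", "3.00"]]
    table = Table(page=1, bbox=(50.0, 120.0, 540.0, 300.0),
                  image_path="page1_table0.png",
                  sentences=table_to_sentences(header, rows))
    result = PdfParseResult(tables=[table])
    print(result.tables[0].sentences)
    # ['Item: Pen; Qty: 2; Price: 1.50.', 'Item: Ink; Price: 3.00.']
```

Keeping the page number and box on every record means a retrieved chunk, table, or figure can be traced back to where it sits in the PDF.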
### Résumé
The résumé is a very complicated kind of document. A résumé composed of unstructured text
with various layouts can be resolved into structured data consisting of nearly a hundred fields.
We haven't open-sourced the parser yet, but the processing method that follows the parsing procedure is open.