mirror of https://github.com/infiniflow/ragflow.git, synced 2025-10-25 23:09:20 +00:00
72384b191d

### What problem does this PR solve?
Add a parser for `.doc` files, based on Apache Tika (through the `tika` Python package).
```
pip install tika
```
```
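from io import BytesIO

from tika import parser


def extract_text_from_doc_bytes(doc_bytes):
    """Return the plain text that Tika extracts from raw .doc bytes."""
    file_like_object = BytesIO(doc_bytes)
    parsed = parser.from_buffer(file_like_object)
    # "content" can be missing or None when no text is extracted, so fall back to "".
    return parsed.get("content") or ""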
```
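The text Tika returns can contain runs of blank lines and padded whitespace, so a caller will likely want to clean it up before further processing. The following is a minimal sketch of one way to do that, not necessarily what this PR does: `split_into_sections` is a hypothetical helper name, and it assumes its input is the value of `parsed["content"]` (possibly `None`). It does not call Tika itself, so it runs without a Java runtime.

```python
def split_into_sections(content):
    """Split extracted text into stripped, non-empty lines.

    `content` is assumed to be the value of parsed["content"],
    which may be None or empty when nothing was extracted.
    """
    if not content:
        return []
    # Drop blank lines and trim surrounding whitespace on the rest.
    return [line.strip() for line in content.splitlines() if line.strip()]


# Hypothetical sample standing in for Tika output.
sample = "\n\n  Title  \n\nFirst paragraph.\n\n"
print(split_into_sections(sample))  # ['Title', 'First paragraph.']
```

Returning a list of lines rather than one string keeps the helper easy to test and leaves any later chunking strategy up to the caller.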
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: chrysanthemum-boy <fannc@qq.com>