mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-10-02 11:52:25 +00:00

A DOCX document that has no sections can still contain one or more tables. Word never creates such files but opens them without complaint, and other applications do generate them. Use the `Document.iter_inner_content()` method recently added upstream in `python-docx` to capture both paragraphs and tables from a section-less DOCX document. This generalizes the fix for MS Teams chat transcripts (one example of a section-less DOCX) implemented in #1825.
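As a rough illustration of walking paragraphs and tables in document order, here is a minimal sketch. `SupportsInnerContent` and `iter_block_text` are hypothetical names introduced for this example, not part of `python-docx` or this repository. The sketch assumes only that the container exposes `iter_inner_content()`, that table-like blocks have `.rows` with `.cells`, and that paragraphs and cells have `.text`.

```python
from typing import Iterator, Protocol, Tuple


class SupportsInnerContent(Protocol):
    """Hypothetical protocol: anything exposing iter_inner_content()."""

    def iter_inner_content(self) -> Iterator[object]: ...


def iter_block_text(container: SupportsInnerContent) -> Iterator[Tuple[str, str]]:
    """Yield (kind, text) pairs for each block, preserving document order."""
    for block in container.iter_inner_content():
        # Assumption: table-like blocks expose `.rows`; paragraph-like blocks do not.
        if hasattr(block, "rows"):
            for row in block.rows:
                yield "table-row", " | ".join(cell.text for cell in row.cells)
        else:
            yield "paragraph", block.text
```

With `python-docx` installed, the intended call would be something like `iter_block_text(docx.Document("transcript.docx"))`, where the file name is a placeholder. Duck typing on `.rows` keeps the sketch free of imports; real code would more likely dispatch with `isinstance` on `Paragraph` and `Table`.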
14 lines
409 B
Python
from typing import Iterator, Sequence

from docx.oxml.xmlchemy import BaseOxmlElement
from docx.table import Table
from docx.text.paragraph import Paragraph

class BlockItemContainer:
    _element: BaseOxmlElement
    def iter_inner_content(self) -> Iterator[Paragraph | Table]: ...
    @property
    def paragraphs(self) -> Sequence[Paragraph]: ...
    @property
    def tables(self) -> Sequence[Table]: ...
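To see why the stub declares `iter_inner_content()` alongside `paragraphs` and `tables`, the toy model below mimics the three members using stand-in classes. `ToyParagraph`, `ToyTable`, and `ToyContainer` are hypothetical and unrelated to the real `python-docx` implementation. They only show that the two properties group blocks by type, while `iter_inner_content()` keeps them interleaved in their original order.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Sequence, Union


@dataclass
class ToyParagraph:  # stand-in, not docx.text.paragraph.Paragraph
    text: str


@dataclass
class ToyTable:  # stand-in, not docx.table.Table
    cells: List[str]


@dataclass
class ToyContainer:
    """Toy container mirroring the three members declared in the stub."""

    blocks: List[Union[ToyParagraph, ToyTable]] = field(default_factory=list)

    def iter_inner_content(self) -> Iterator[Union[ToyParagraph, ToyTable]]:
        # Blocks come back interleaved, in the order they were stored.
        return iter(self.blocks)

    @property
    def paragraphs(self) -> Sequence[ToyParagraph]:
        return [b for b in self.blocks if isinstance(b, ToyParagraph)]

    @property
    def tables(self) -> Sequence[ToyTable]:
        return [b for b in self.blocks if isinstance(b, ToyTable)]


body = ToyContainer([ToyParagraph("intro"), ToyTable(["a", "b"]), ToyParagraph("outro")])

# Document order: paragraph, table, paragraph.
in_order = [type(b).__name__ for b in body.iter_inner_content()]

# Concatenating the two properties loses the interleaving.
grouped = [type(b).__name__ for b in [*body.paragraphs, *body.tables]]
```

Here `in_order` keeps the table between the two paragraphs, whereas `grouped` moves it to the end. That ordering is what a caller relies on when partitioning a document into elements.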