unstructured/test_unstructured
Steve Canny a66661a7bf
rfctr(html): drop now dead XMLDocument and Document (#3165)
**Summary**
`HTMLDocument` is the class handling the core of HTML parsing. It is
critical code because 8 of the 20 file-type partitioners end up using
it (`partition_html()` + 7 brokering partitioners like EPUB, MD, and
RST).

For historical reasons, `HTMLDocument` subclassed `XMLDocument`, which in
turn subclassed `Document`. Neither is relevant any longer, and both
unnecessarily complicate reasoning about `HTMLDocument` behavior.

Remove that inheritance and dependency, and drop both the `XMLDocument`
and `Document` modules, which become dead code once `HTMLDocument` no
longer uses them.
2024-06-08 07:36:18 +00:00