unstructured
a66661a7 - rfctr(html): drop now dead XMLDocument and Document (#3165)

Commit
1 year ago
**Summary**

`HTMLDocument` is the class at the core of HTML parsing. This code is critical because 8 of the 20 file-type partitioners end up using it: `partition_html()` plus 7 brokering partitioners such as EPUB, MD, and RST.

For historical reasons, `HTMLDocument` subclassed `XMLDocument`, which in turn subclassed `Document`. Neither base class is relevant any longer, and both unnecessarily complicate reasoning about `HTMLDocument` behavior.

Remove that inheritance and dependency, and drop the `XMLDocument` and `Document` modules, which become dead code once `HTMLDocument` no longer uses them.
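To illustrate the kind of flattening described above, here is a minimal sketch. It does not reproduce the actual `unstructured` classes: `LegacyDocument`, `LegacyXMLDocument`, `LegacyHTMLDocument`, `FlatHTMLDocument`, and the toy `_paragraphs()` parser are all hypothetical stand-ins, assumed only for the purpose of showing a three-level hierarchy collapsed into one self-contained class with the same observable behavior.

```python
import re
from functools import cached_property


def _paragraphs(html_text):
    """Toy parser (hypothetical): return the text of each <p>...</p> element."""
    return [m.strip() for m in re.findall(r"<p>(.*?)</p>", html_text, flags=re.S)]


# Before: behavior is spread across three classes, so understanding
# `elements` means reading all of them.
class LegacyDocument:
    def __init__(self):
        self._elements = None

    @property
    def elements(self):
        # Lazily parse on first access and cache the result.
        if self._elements is None:
            self._elements = self._parse_elements()
        return self._elements

    def _parse_elements(self):
        raise NotImplementedError


class LegacyXMLDocument(LegacyDocument):
    def __init__(self, text):
        super().__init__()
        self.text = text


class LegacyHTMLDocument(LegacyXMLDocument):
    def _parse_elements(self):
        return _paragraphs(self.text)


# After: one class with no base classes; lazy caching is kept
# via functools.cached_property.
class FlatHTMLDocument:
    def __init__(self, text):
        self._text = text

    @cached_property
    def elements(self):
        return _paragraphs(self._text)


html = "<p>Hello</p><p>World</p>"
assert LegacyHTMLDocument(html).elements == FlatHTMLDocument(html).elements
```

In this sketch the flat class has a two-entry method resolution order (itself and `object`) instead of four, which is the sense in which removing the inheritance makes behavior easier to reason about.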