unstructured
3240e3d1 - rfctr(pptx): minify HTML and table.text is cct (#3734)

1 year ago
**Summary**
Eliminate historical "idiosyncrasies" of the `table.metadata.text_as_html` HTML introduced by `partition_pptx()`. Produce minified `.text_as_html` consistent with that formed by chunking.

**Additional Context**
- PPTX `.metadata.text_as_html` is minified (no extra whitespace and no `thead`, `tbody`, or `tfoot` elements).
- `table.text` is the clean-concatenated-text (CCT) of the table.
- The last use of the `tabulate` library is removed, and that dependency is removed from `base.in`.
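To illustrate the two output shapes described above, here is a minimal sketch. It is not the code from this commit: the helper names `minified_table_html` and `clean_concatenated_text` are hypothetical, and it assumes CCT means whitespace-normalized cell texts joined by single spaces with empty cells skipped.

```python
from html import escape


def _normalize(cell: str) -> str:
    # Collapse runs of whitespace and trim the ends.
    return " ".join(cell.split())


def minified_table_html(rows: list[list[str]]) -> str:
    """Render rows as HTML with no inter-tag whitespace and no thead/tbody/tfoot.

    Hypothetical helper for illustration only.
    """
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(_normalize(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"


def clean_concatenated_text(rows: list[list[str]]) -> str:
    """Join non-empty, whitespace-normalized cell texts with single spaces.

    Assumed definition of CCT for this sketch.
    """
    cells = (_normalize(cell) for row in rows for cell in row)
    return " ".join(cell for cell in cells if cell)


rows = [["Header 1", "Header 2"], ["a  b", ""]]
html_out = minified_table_html(rows)
text_out = clean_concatenated_text(rows)
```

Under these assumptions, `html_out` contains only `table`, `tr`, and `td` elements with nothing between tags, and `text_out` is a single line of text suitable for `table.text`.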