unstructured
c85f29e6 - fix(xlsx): XLSX emits std minified .text_as_html (#3558)

Commit
1 year ago
**Summary**
Eliminate historical "idiosyncrasies" of `table.metadata.text_as_html` HTML introduced by `partition_xlsx()`. Produce minified `.text_as_html` consistent with that formed by chunking.

**Additional Context**
- XLSX `.text_as_html` is minified (no extra whitespace or thead, tbody, tfoot elements).
- `table.text` is clean-concatenated-text (CCT) of table.

---------

Co-authored-by: scanny <scanny@users.noreply.github.com>
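
A minimal sketch of what "minified" `.text_as_html` and clean-concatenated text might look like for a small table. This is illustrative only and is not the code from this commit: `minified_table_html` and `table_text` are hypothetical helpers written against the standard library, and the space-joined form of CCT is an assumption.

```python
from html import escape


def _clean(value):
    """Collapse runs of whitespace in a cell value to single spaces."""
    return " ".join(str(value).split())


def minified_table_html(rows):
    """Hypothetical helper: render rows as HTML with no inter-tag whitespace
    and no thead, tbody, or tfoot wrappers."""
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(_clean(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"


def table_text(rows):
    """Hypothetical helper: join non-empty cell text with single spaces
    (an assumed reading of "clean-concatenated-text")."""
    cells = (_clean(cell) for row in rows for cell in row)
    return " ".join(cell for cell in cells if cell)


rows = [["Name", "Qty"], ["Widget", 3]]
html = minified_table_html(rows)
text = table_text(rows)
print(html)  # <table><tr><td>Name</td><td>Qty</td></tr><tr><td>Widget</td><td>3</td></tr></table>
print(text)  # Name Qty Widget 3
```

The point of the sketch is the shape of the output: one unbroken string of `table`, `tr`, and `td` elements, so that HTML produced by partitioning and by chunking can be compared directly.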