unstructured
432d209c - fix(file): confirm or correct asserted DOCX, PPTX, and XLSX content types (#3434)

Commit
1 year ago
fix(file): confirm or correct asserted DOCX, PPTX, and XLSX content types (#3434)

**Summary**
The `content_type` argument received by `partition()` from the API is sometimes unreliable for MS-Office 2007+ MIME-types. What we've observed is that it gets the MS-Office bit right but falls down on distinguishing PPTX from DOCX or XLSX.

Confirmation of these types is simple, fast, and reliable. Confirm all MS-Office `content_type` argument values asserted by callers of `detect_filetype()` and correct swapped values.