fix doctype parsing error (#3811)
- Per [ticket](https://unstructured-ai.atlassian.net/browse/ML-551),
there is a bug in the `unstructured` lib under metrics/evaluate.py: for
paths like '*.pdf.txt', it incorrectly retrieves the extension the file
had before its conversion to a cct file. (See the screenshot below.)
- The current behavior is shown in the top example.
- The correct behavior is shown in the bottom example of the
screenshot.

- In addition, the returned doctypes are not consistent: some are
returned as '.*' (with a leading dot) and some without the dot.
- Therefore, I aligned them so they are all output in the same form,
which is '.*'.
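
For illustration, the two fixes described above can be sketched roughly as follows. This is a minimal sketch, not the actual code in metrics/evaluate.py: the helper names `get_doctype` and `normalize_doctype` are hypothetical, and the set of converted-output extensions (`.txt`, `.json`) is an assumption.

```python
from pathlib import Path

# Assumption: extensions added when a document is converted for evaluation.
CONVERTED_SUFFIXES = {".txt", ".json"}


def normalize_doctype(doctype: str) -> str:
    """Return the doctype lowercased and with a single leading dot."""
    doctype = doctype.strip().lower()
    if not doctype:
        return ""
    return doctype if doctype.startswith(".") else f".{doctype}"


def get_doctype(filename: str) -> str:
    """Hypothetical helper: recover the original extension of a file.

    For 'report.pdf.txt' the original document type is '.pdf', not '.txt'.
    """
    suffixes = Path(filename).suffixes
    if not suffixes:
        return ""
    # Skip the trailing converted-output suffix when an earlier one exists.
    if len(suffixes) >= 2 and suffixes[-1].lower() in CONVERTED_SUFFIXES:
        return normalize_doctype(suffixes[-2])
    return normalize_doctype(suffixes[-1])
```

With this sketch, both 'report.pdf.txt' and 'report.pdf' map to the same doctype, and values given with or without the dot are normalized to the dotted form.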