fix: XML processing instruction not escaped (#4034)
`<?xml version="1.0"?>` is not escaped when converting to HTML if it
appears in a code block like this in the markdown file:
````
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
<head></head>
<boolean>true</boolean>
</sparql>
````
This causes the parser to raise an error like:
> AttributeError: 'lxml.etree._ProcessingInstruction' object has no
attribute 'is_phrasing'.
This PR preprocesses the original markdown file and adds indentation to
`<?xml version="1.0"?>`, which forces the XML code to be escaped when it
is converted to HTML.
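
A minimal sketch of the indentation step described above, not the code from this PR. The helper name `indent_xml_declarations` and the four-space default indent are assumptions made for illustration; the real change may detect and indent the line differently.

```python
import re

# Hypothetical helper: the name and the default indent are assumptions,
# not taken from the PR's diff.
# Zero-width match at the start of any line that begins with "<?xml".
_XML_PI_LINE_START = re.compile(r"^(?=<\?xml\b)", flags=re.MULTILINE)


def indent_xml_declarations(md_text: str, indent: str = "    ") -> str:
    """Insert `indent` in front of every line that starts with `<?xml`."""
    return _XML_PI_LINE_START.sub(indent, md_text)


if __name__ == "__main__":
    sample = '<?xml version="1.0"?>\n<boolean>true</boolean>\n'
    print(indent_xml_declarations(sample))
```

Only lines that begin with `<?xml` in column 0 are changed; lines that are already indented, and all other markup, pass through untouched.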
https://github.com/Unstructured-IO/unstructured/issues/3935