unstructured
c578b856 - fix: respect `<pre>` tag order in `partition_html` (#1197)

Commit
2 years ago
### Summary

Closes #1184. Updates `partition_html` to respect the ordering of `<pre>` tags in HTML documents.

### Testing

The elements in the following example should be in the correct order.

```python
from unstructured.partition.html import partition_html

html_text = """
<pre>The Big Brown Bear</pre>
<div>The big brown bear is growling.</div>
<pre>The big brown bear is sleeping.</pre>
<div>The Big Blue Bear</div>
"""

elements = partition_html(text=html_text)
print("\n\n".join([str(el) for el in elements]))
```
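To make "correct order" concrete, here is a minimal sketch of what document order means for the example markup. It does not call `partition_html` and is not how the library is implemented; it relies only on Python's standard `html.parser`, and the `OrderedBlockCollector` class and `collect_blocks` helper are hypothetical names introduced for illustration. It assumes flat, non-nested `<pre>` and `<div>` blocks like those in the example.

```python
from html.parser import HTMLParser


class OrderedBlockCollector(HTMLParser):
    """Hypothetical helper: record <pre> and <div> text in document order."""

    TRACKED_TAGS = {"pre", "div"}

    def __init__(self):
        super().__init__()
        self.blocks = []      # (tag, text) pairs, appended as each block closes
        self._current = None  # tracked tag that is currently open, if any
        self._buffer = []     # text fragments seen inside the open tag

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED_TAGS:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = "".join(self._buffer).strip()
            if text:
                self.blocks.append((tag, text))
            self._current = None


def collect_blocks(html):
    """Return (tag, text) pairs in the order they appear in the markup."""
    parser = OrderedBlockCollector()
    parser.feed(html)
    parser.close()
    return parser.blocks


html_text = """
<pre>The Big Brown Bear</pre>
<div>The big brown bear is growling.</div>
<pre>The big brown bear is sleeping.</pre>
<div>The Big Blue Bear</div>
"""

blocks = collect_blocks(html_text)
for tag, text in blocks:
    print(f"{tag}: {text}")
```

With the fix in place, the elements returned by `partition_html` for the same markup should interleave the `<pre>` and `<div>` text in this same sequence, rather than grouping the `<pre>` content separately.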