unstructured
0506aff7 - add support for `start_index` in `html` links extraction (#2600)

Commit

1 year ago

add support for `start_index` in `html` links extraction (#2600)

add support for start_index in html links extraction (closes #2625)

Testing

```
from unstructured.partition.html import partition_html
from unstructured.staging.base import elements_to_json

html_text = """<html>
<p>Hello there I am a <a href="/link">very important link!</a></p>
<p>Here is a list of my favorite things</p>
<ul>
    <li><a href="https://en.wikipedia.org/wiki/Parrot">Parrots</a></li>
    <li>Dogs</li>
</ul>
<a href="/loner">A lone link!</a>
</html>"""

elements = partition_html(text=html_text)
print(elements_to_json(elements))
```

---------

Co-authored-by: Michael Niestroj <michael.niestroj@unblu.com>
Co-authored-by: christinestraub <christinemstraub@gmail.com>
Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: christinestraub <christinestraub@users.noreply.github.com>
Co-authored-by: Ronny H <138828701+ron-unstructured@users.noreply.github.com>
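
To make the idea of a link `start_index` concrete, here is a minimal standalone sketch. It is not the code from this commit and does not use `unstructured` at all: it relies only on Python's standard `html.parser`, and it assumes that `start_index` means the character offset of the link text within the text of the enclosing element. The class `LinkIndexParser`, the helper `extract_links`, and the dictionary keys are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class LinkIndexParser(HTMLParser):
    """Collect visible text and record where each link's text starts in it.

    Illustrative only: this is an assumption about what a start index
    could mean, not the implementation used by the library.
    """

    def __init__(self):
        super().__init__()
        self.text_parts = []    # pieces of visible text seen so far
        self.links = []         # one dict per completed <a> element
        self._open_link = None  # link currently being read, if any

    def _offset(self):
        # Number of characters of text collected so far.
        return sum(len(part) for part in self.text_parts)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._open_link = {
                "url": dict(attrs).get("href"),
                "text": "",
                "start_index": self._offset(),
            }

    def handle_data(self, data):
        self.text_parts.append(data)
        if self._open_link is not None:
            self._open_link["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._open_link is not None:
            self.links.append(self._open_link)
            self._open_link = None


def extract_links(fragment):
    """Return (text, links) for one HTML fragment."""
    parser = LinkIndexParser()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.text_parts), parser.links


text, links = extract_links(
    'Hello there I am a <a href="/link">very important link!</a>'
)
for link in links:
    start = link["start_index"]
    # Slicing the element text at start_index recovers the link text.
    print(start, repr(text[start:start + len(link["text"])]), link["url"])
```

With an offset recorded this way, a consumer can map a link back to its exact position in the extracted text instead of searching for the link text, which is ambiguous when the same words occur more than once.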

References

#2600 - add support for `start_index` in `html` links extraction

Author

MiXiBo

Parents

3e643c4c
