unstructured
add support for `start_index` in `html` links extraction
#2600
Merged

MiXiBo
michaelniestroj-bit Refactor threshold to annotation_threshold and make it an optional pa…
724cdb52
MiXiBo Merge branch 'main' into main
643e67ea
MiXiBo Merge branch 'main' into main
34c78de8
MiXiBo Merge branch 'Unstructured-IO:main' into main
6c34d9fa
michaelniestroj-bit add support for start_index to html link extraction
f4a18b54
MiXiBo Merge branch 'Unstructured-IO:main' into improved_pdf_html_links_support
ff2f3bfb
MiXiBo changed the title from "Improved pdf html links support" to "add support for `start_index` in `html` links extraction" 1 year ago
michaelniestroj-bit Revert "Refactor threshold to annotation_threshold and make it an opt…
4f99d67b
MiXiBo Merge branch 'main' into improved_pdf_html_links_support
0c1e2c14
MiXiBo Merge branch 'main' into improved_pdf_html_links_support
8c12ca71
christinestraub Merge branch 'main' into mixibo/improved_pdf_html_links_support
99f3545e
christinestraub chore: update changelog & version
e75b43d1
christinestraub feat: fix TypeError: object of type 'NoneType' has no len()
3be94525
christinestraub test: add unit test
9c13c984
christinestraub refactor: rename link_start_indexs -> link_start_indexes
70f2a879
christinestraub requested a review from Klaijan 1 year ago
christinestraub requested a review from MthwRobinson 1 year ago
christinestraub requested a review from cragwolfe 1 year ago
ryannikolaidis [DO NOT MERGE] feat: add support for start_index in html links extrac…
87379817
christinestraub test: add unit test to test partition_html() with links
f7623e1a
christinestraub test: fix lint error
62f15a29
christinestraub Merge branch 'main' into mixibo/improved_pdf_html_links_support
2079213d
MthwRobinson approved these changes on 2024-03-12
christinestraub Merge branch 'main' into mixibo/improved_pdf_html_links_support
83b51181
christinestraub feat: set consolidation-strategy for `link_start_indexes metadata` fi…
d8e8ac1d
ron-unstructured enabled auto-merge 1 year ago
ron-unstructured Merge branch 'main' into improved_pdf_html_links_support
9aa5651b
christinestraub Merge branch 'main' into mixibo/improved_pdf_html_links_support
0359f3af
christinestraub feat: remove leading extra tags when calculating link start index
c22e3f62
christinestraub Merge branch 'main' into mixibo/improved_pdf_html_links_support
a5d245ce
christinestraub Merge branch 'main' into mixibo/improved_pdf_html_links_support
5e052025
christinestraub chore: bump version
9483427a
MiXiBo reviewed start_index handling
fcee08af
Auto-merge was disabled 1 year ago: the head branch was pushed to by a user without write access
MiXiBo add support to corner-case with tags surrounded by href
de90ce3e
cragwolfe approved these changes on 2024-04-05
christinestraub Merge branch 'main' into mixibo/improved_pdf_html_links_support
e8ff218b
christinestraub feat: set link text same as element text if start_index is -1
86a95f15
christinestraub test: refactor unit test
076492e4
christinestraub test: fix lint error
e300a70f
christinestraub refactor: fix missing code
391cac03
christinestraub feat: exclude tail text from link text when start_index is -1
9b065338
christinestraub feat: include links with urls but no text
8ff23af1
christinestraub update ingest test fixtures update ci
a11c968e
christinestraub Merge branch 'main' into feat/2625-html-support-link-start-index
5b9dee3f
christinestraub ci: revert ingest test fixtures update ci
d7b6afff
ryannikolaidis [DO NOT MERGE] Feat: add support for `start_index` in html `links` ex…
8895e5a4
christinestraub Merge branch 'main' into feat/2625-html-support-link-start-index
b4ec6765
christinestraub chore: update version
1765496b
christinestraub force pushed from 3a3e4584 to 1765496b 1 year ago
christinestraub approved these changes on 2024-04-12
christinestraub merged 0506aff7 into main 1 year ago
