unstructured
add support for `start_index` in `html` links extraction
#2600
Merged
christinestraub merged 41 commits into Unstructured-IO:main from MiXiBo:improved_pdf_html_links_support
Refactor threshold to annotation_threshold and make it an optional pa…
724cdb52
Merge branch 'main' into main
643e67ea
Merge branch 'main' into main
34c78de8
Merge branch 'Unstructured-IO:main' into main
6c34d9fa
add support for start_index to html link extraction
f4a18b54
Merge branch 'Unstructured-IO:main' into improved_pdf_html_links_support
ff2f3bfb
MiXiBo changed the title from "Improved pdf html links support" to "add support for `start_index` in `html` links extraction" 1 year ago
Revert "Refactor threshold to annotation_threshold and make it an opt…
4f99d67b
Merge branch 'main' into improved_pdf_html_links_support
0c1e2c14
Merge branch 'main' into improved_pdf_html_links_support
8c12ca71
Merge branch 'main' into mixibo/improved_pdf_html_links_support
99f3545e
chore: update changelog & version
e75b43d1
feat: fix TypeError: object of type 'NoneType' has no len()
3be94525
test: add unit test
9c13c984
refactor: rename link_start_indexs -> link_start_indexes
70f2a879
christinestraub requested a review from Klaijan 1 year ago
christinestraub requested a review from MthwRobinson 1 year ago
christinestraub requested a review from cragwolfe 1 year ago
[DO NOT MERGE] feat: add support for start_index in html links extrac…
87379817
test: add unit test to test partition_html() with links
f7623e1a
test: fix lint error
62f15a29
Merge branch 'main' into mixibo/improved_pdf_html_links_support
2079213d
MthwRobinson approved these changes on 2024-03-12
Merge branch 'main' into mixibo/improved_pdf_html_links_support
83b51181
feat: set consolidation-strategy for `link_start_indexes metadata` fi…
d8e8ac1d
ron-unstructured enabled auto-merge 1 year ago
Merge branch 'main' into improved_pdf_html_links_support
9aa5651b
Merge branch 'main' into mixibo/improved_pdf_html_links_support
0359f3af
feat: remove leading extra tags when calculating link start index
c22e3f62
Merge branch 'main' into mixibo/improved_pdf_html_links_support
a5d245ce
Merge branch 'main' into mixibo/improved_pdf_html_links_support
5e052025
chore: bump version
9483427a
reviewed start_index handling
fcee08af
disabled auto-merge 1 year ago (Head branch was pushed to by a user without write access)
add support to corner-case with tags surrounded by href
de90ce3e
cragwolfe approved these changes on 2024-04-05
Merge branch 'main' into mixibo/improved_pdf_html_links_support
e8ff218b
feat: set link text same as element text if start_index is -1
86a95f15
test: refactor unit test
076492e4
test: fix lint error
e300a70f
refactor: fix missing code
391cac03
feat: exclude tail text from link text when start_index is -1
9b065338
feat: include links with urls but no text
8ff23af1
update ingest test fixtures update ci
a11c968e
Merge branch 'main' into feat/2625-html-support-link-start-index
5b9dee3f
ci: revert ingest test fixtures update ci
d7b6afff
[DO NOT MERGE] Feat: add support for `start_index` in html `links` ex…
8895e5a4
Merge branch 'main' into feat/2625-html-support-link-start-index
b4ec6765
chore: update version
1765496b
christinestraub force pushed from 3a3e4584 to 1765496b 1 year ago
christinestraub approved these changes on 2024-04-12
christinestraub merged 0506aff7 into main 1 year ago
Reviewers
cragwolfe
christinestraub
MthwRobinson
Klaijan
Assignees
No one assigned
Labels
None yet
Milestone
No milestone