unstructured
Add parsing HTML to unstructured elements
#3732
Merged

plutasnyy merged 15 commits into main from parsing-html-to-elements
plutasnyy Add parsing HTML to unstructured elements
e070ee56
plutasnyy assigned plutasnyy 1 year ago
plutasnyy Add page number and category depth
ae08a433
plutasnyy Fix BR tags, wrong span names and additional empty tags
27142a6a
plutasnyy Adjust test for new parser settings
74b0933f
plutasnyy pip compile
04534b4a
cragwolfe commented 9 times on 2024-10-22
plutasnyy Do not add Document element
97269691
plutasnyy Remove dashes from ids
adb99600
plutasnyy Add support for filename
64780526
plutasnyy Add docstring
df2fedd0
plutasnyy Add partition_html
fdf176a3
cragwolfe commented on 2024-10-22
plutasnyy Merge remote-tracking branch 'origin/main' into parsing-html-to-elements
dd68b9cc
plutasnyy Update requirements
016e851a
cragwolfe commented on 2024-10-23
plutasnyy Fix unit tests for 3.9 and rename param
26c8714c
plutasnyy Fix if statement
6168a5f5
plutasnyy Fix paths in tests
4bc246db
mariannaparzych approved these changes on 2024-10-23
plutasnyy merged 03a3ed8d into main 1 year ago
plutasnyy deleted the parsing-html-to-elements branch 1 year ago
