Unstructured-IO/unstructured
Pull Requests
Open
fix: derive crop box from coordinate extent in `save_elements`
#4371 opened 2026-06-09 23:54 by badGarnet
feat: Implement PDF heading hierarchy inference for category_depth
#4369 opened 2026-06-09 20:56 by ylcnymn
fix: drop processing instructions in HTML parser
#4361 opened 2026-06-05 11:50 by assinscreedFC
feat: derive category_depth from heading level in the v2 (ontology) HTML parser (ML-1328)
#4360 opened 2026-06-04 20:30 by qued
Optimize XLSX subtable detection memory usage
#4357 opened 2026-06-01 15:17 by CyMule
feat: expose layout confidence metadata
#4356 opened 2026-05-26 18:32 by RitwijParmar
Fix br tail text handling in HTML tables
#4351 opened 2026-05-12 18:31 by dsolankii
fix: Support text partitioning from ZipExtFile objects
#4350 opened 2026-05-11 21:06 by dsolankii
fix: avoid false-positive Title classification for long no-space text
#4348 opened 2026-04-28 20:44 by claytonlin1110
fix: prefer embedded PDF text over OCR for hi_res table tokens
#4347 opened 2026-04-28 20:26 by claytonlin1110
fix(html): enable huge_tree on HTMLParser so deeply nested HTML partitions
#4340 opened 2026-04-16 22:57 by CrepuscularIRIS
feat: add clean_newline utility for hyphenated line breaks (#2513)
#4339 opened 2026-04-16 12:38 by DevAbdullah90
fix: convert Tesseract language codes for PaddleOCR in OCRAgent.get_agent()
#4329 opened 2026-04-09 10:45 by Mustafa-Shoukat1
feat: add AG2 multi-agent document processing example
#4326 opened 2026-04-07 18:49 by faridun-ag2
feat: infer hierarchical heading levels (H1-H6) for PDFs (#4204)
#4325 opened 2026-04-07 17:52 by statxc
feat: add support python3.14
#4312 opened 2026-04-01 10:05 by FomalhautWeisszwerg
refactor: don't import unstructured-inference via partition.pdf
#4284 opened 2026-03-16 13:48 by artdent
fix: improve multi-column layout sorting for academic papers (#4104)
#4283 opened 2026-03-16 00:07 by Gopesh111
refactor: replace deprecated decorators in partition_image with apply_metadata
#4271 opened 2026-03-02 12:55 by HemantSudarshan
fix: add 'el' and 'gr' as Greek language code aliases for Tesseract OCR
#4270 opened 2026-02-27 18:45 by s0wa48
fix: handle list output from group_bullet_paragraph in element apply()
#4253 opened 2026-02-21 20:04 by s0wa48
Simple typo fix
#4251 opened 2026-02-20 08:06 by rchen19
Feat: embedding model voyage 4 family
#4234 opened 2026-02-11 18:12 by fzowl
feat: add XLSM (Excel Macro-Enabled Workbook) parsing support
#4227 opened 2026-02-08 16:51 by longway-code
Add AgentMarket - B2A Marketplace
#4225 opened 2026-02-03 17:14 by stromfee
docs: fix redundant whitespace in pyenv command in README
#4224 opened 2026-02-03 13:38 by longway-code
Fix FutureWarning: Add test to verify bytes are wrapped in BytesIO for read_excel
#4213 opened 2026-01-27 12:59 by Achieve3318
⚡️ Speed up function `merge_out_layout_with_ocr_layout` by 30%
#4212 opened 2026-01-27 02:31 by aseembits93
feat: chunking by character and title now isolates tables
#4197 opened 2026-01-15 19:26 by badGarnet
fix: NameError: LayoutElements not defined in paddle_ocr.py
#4195 opened 2026-01-15 16:18 by mohansinghi