Unstructured-IO/unstructured
Pull Requests
Open
Fix ARM64 paddlepaddle image builder bug
#4228 opened 2026-02-10 02:35 by
PastelStorm
feat: add XLSM (Excel Macro-Enabled Workbook) parsing support
#4227 opened 2026-02-08 16:51 by
longway-code
Add AgentMarket - B2A Marketplace
#4225 opened 2026-02-03 17:14 by
stromfee
docs: fix redundant whitespace in pyenv command in README
#4224 opened 2026-02-03 13:38 by
longway-code
fix(deps): Update docker.elastic.co/elasticsearch/elasticsearch Docker tag to v8.19.11
dependencies
security
#4223 opened 2026-02-03 12:19 by
utic-renovate[bot]
feat: Infer hierarchical heading levels (H1-H4) for PDFs
#4222 opened 2026-02-02 22:28 by
Angel98518
⚡️ Speed up function `process_data_with_ocr` by 1,726% in PR #4217 (`fix/make-dpi-consistent`)
⚡️ codeflash
🎯 Quality: High
#4219 opened 2026-01-30 02:04 by
codeflash-ai[bot]
fix: remove duplicate characters caused by fake bold rendering in PDFs
#4215 opened 2026-01-28 12:23 by
bittoby
Fix FutureWarning: Add test to verify bytes are wrapped in BytesIO for read_excel
#4213 opened 2026-01-27 12:59 by
Angel98518
⚡️ Speed up function `merge_out_layout_with_ocr_layout` by 30%
#4212 opened 2026-01-27 02:31 by
aseembits93
fix(deps): Update semitechnologies/weaviate Docker tag to v1.35.7
dependencies
security
#4210 opened 2026-01-26 18:12 by
utic-renovate[bot]
⚡️ Speed up function `standardize_quotes` by 144%
#4201 opened 2026-01-21 02:31 by
KRRT7
feat: chunking by character and title now isolates tables
#4197 opened 2026-01-15 19:26 by
badGarnet
fix: NameError: LayoutElements not defined in paddle_ocr.py
#4195 opened 2026-01-15 16:18 by
mohansinghi
Eliminate cleaners/core import time bottleneck
#4167 opened 2026-01-07 03:44 by
aseembits93
update README.md
#4121 opened 2025-11-12 10:57 by
vhsakpal
new file: .idx/mcp.json
#4111 opened 2025-11-05 02:21 by
romethefixer
Bug 4105
#4107 opened 2025-10-13 20:35 by
carminoplata
fix: None text attribute when normalizing Picture to Image element
#4083 opened 2025-08-22 15:25 by
ishahroz
Switch from pdfminer to paves to improve robustness and use multiple CPUs
#4067 opened 2025-07-19 04:10 by
dhdaines
perf: add early page count check to prevent expensive PDFMiner proces…
#4048 opened 2025-07-08 20:09 by
CyMule
Feature/remove unnessary re for table ele in pdf
#3984 opened 2025-04-09 11:24 by
JIAQIA
bugfix/fix missing extensions in file detection
#3926 opened 2025-02-18 17:24 by
rbiseck3
Improve readability of the text by adding new line to the end of row
#3913 opened 2025-02-07 14:56 by
Sheripov
fix: preserve text after line breaks in PowerPoint table cells
#3877 opened 2025-01-18 04:07 by
yamazombie
Add password
#3876 opened 2025-01-18 00:26 by
Coniferish
add post chunking strategy
#3869 opened 2025-01-16 17:45 by
tbs17
feat: Allow deactivating OCR entirely with hi_res strategy
#3839 opened 2024-12-17 19:58 by
dhdaines
fix: Fix issue #3815
#3835 opened 2024-12-17 09:30 by
PhorstenkampFuzzy
fix: when convert doc to docx, UnicodeDecodeError may be raised
#3830 opened 2024-12-14 09:10 by
YooshiJay