Unstructured-IO/unstructured
Commits
build/add-python-3.11-support-back
Add 3.11 to CI test matrix
qued
committed
34 days ago
ccb01925
update lock
qued
committed
34 days ago
47e1eb67
create split deps for pyproject
qued
committed
34 days ago
0ede74bd
Migrate to uv (#4226)
Emily Voss
committed
35 days ago
Verified
69770c65
feat: increase PIL's max image pixel value for pdf partition (#4220)
badGarnet
committed
43 days ago
Verified
3e426fd1
chore: bump dependencies for 0.18.34 (#4221)
luke-kucing
committed
43 days ago
Verified
fef9959a
Fix: make pdf image dpi consistent (#4217)
badGarnet
committed
46 days ago
Verified
2a8f7c6a
Preserve newlines in Table and TableChunk elements during PDF partitioning (#4214)
eureka928
committed
46 days ago
Verified
b1e4b009
feat: add group_elements_by_parent_id utility function (#4207)
MkDev11
committed
48 days ago
Verified
e131439d
feat: put pdfium call behind a threadlock (#4211)
badGarnet
committed
49 days ago
Verified
4bbb1fff
chore: dep bump to resolve open CVEs (#4205)
luke-kucing
committed
49 days ago
Verified
d1f1bdf1
fix: Preserve Line Breaks in Code Blocks During Chunking (#4196)
eureka928
committed
50 days ago
Verified
d4caedf0
fix(deps): Update semitechnologies/weaviate Docker tag to v1.35.3 (#4135)
utic-renovate[bot]
committed
51 days ago
Verified
8f32550d
fix(deps): Update opensearchproject/opensearch Docker tag to v2.19.4 (#4134)
utic-renovate[bot]
committed
51 days ago
Verified
dbe96e22
fix(deps): Update docker.elastic.co/elasticsearch/elasticsearch Docker tag to v8.19.10 (#4133)
utic-renovate[bot]
committed
51 days ago
Verified
7b366c53
fix: filter coordinates kwargs to prevent TypeError in hi_res PDF processing (#4206)
MkDev11
committed
52 days ago
Verified
f0b0e7c9
Token-Based Chunking Support (#4203)
eureka928
committed
53 days ago
Verified
01c3f7c2
fix: remove sandbox=True from pypandoc to fix ODT conversion (#4193)
MkDev11
committed
54 days ago
Verified
c0323a61
fix(deps): switch from pip-compile to uv pip compile (#4202)
lawrence-u10d
committed
54 days ago
Verified
95fea7e5
fix: reduce default dpi to 350 (#4199)
qued
committed
55 days ago
Verified
8cb62786
Luke/update dockerfile (#4192)
luke-kucing
committed
55 days ago
Verified
b70758ae
enhancement: Speed up method `_DocxPartitioner._style_based_element_type` by 593% (#4179)
aseembits93
committed
59 days ago
Verified
a55810de
enhancement: Speed up function `_get_optimal_value_for_bbox` by 2,883% (#4181)
aseembits93
committed
60 days ago
Verified
6abc5dfa
fix: change default for languages parameter from ["auto"] to None (#4194)
eureka928
committed
60 days ago
Verified
78aa476d
fix: address jaraco CVE (#4198)
qued
committed
60 days ago
Verified
42dc933e
feat: consider rotated text as low fidelity (#4190)
badGarnet
committed
61 days ago
Verified
d64c57d3
enhancement: render pdfs with pdfium (#4185)
qued
committed
61 days ago
Verified
138661a7
fix: fix version number (#4189)
badGarnet
committed
63 days ago
Verified
152a41da
fix: add EN DASH to UNICODE_BULLETS for clean_bullets (#4186)
MkDev11
committed
63 days ago
Verified
8f683e5c
Feat: patch pdfminer and use rendermode to detect invisible text (#4158)
badGarnet
committed
64 days ago
Verified
2b44f7df