unstructured
Token-Based Chunking Support
#4203
Merged

eureka928
eureka928 force pushed from 40a820c2 to c0ab0fed 62 days ago
eureka928
badGarnet
claude
badGarnet commented on 2026-01-22
badGarnet commented on 2026-01-22
badGarnet commented on 2026-01-22
badGarnet commented on 2026-01-22
eureka928
badGarnet
socket-security
eureka928 feat: add tiktoken dependency for token-based chunking
725bd1ae
eureka928 feat: add token-based chunking support to ChunkingOptions
1a02540e
eureka928 feat: add token-based parameters to chunk_by_title()
91eb0482
eureka928 feat: add token-based parameters to chunk_elements()
690236ba
eureka928 test: add tests for token-based chunking
e0d0e14c
eureka928 docs: update CHANGELOG for token-based chunking feature
75a1281e
eureka928 fix: remove duplicate TokenCounter imports in tests
333dad79
eureka928 style: apply black formatting
bde6a052
eureka928 fix: use token-based overlap in token chunking mode
6da7ca9c
eureka928 force pushed from 7e161ec6 to 6da7ca9c 61 days ago
eureka928 fix: align regex version in extra-chunking-tokens.txt with base const…
1c89138f
eureka928
eureka928 requested a review from badGarnet 61 days ago
badGarnet commented on 2026-01-22
eureka928 test: use exact text assertions in token-based chunking tests
7ec3505b
badGarnet commented on 2026-01-22
eureka928 chore: bump version to 0.18.31-dev1
91f6a24a
badGarnet approved these changes on 2026-01-22
badGarnet merged 01c3f7c2 into main 61 days ago

