langchain
f0b6baa0 - fix(core): track within-batch deduplication in indexing num_skipped count (#32273)

Commit
269 days ago
fix(core): track within-batch deduplication in indexing num_skipped count (#32273)

**Description:** Fixes incorrect `num_skipped` count in the LangChain indexing API. The current implementation only counts documents that already exist in RecordManager (cross-batch duplicates) but fails to count documents removed during within-batch deduplication via `_deduplicate_in_order()`. This PR adds tracking of the original batch size before deduplication and includes the difference in `num_skipped`, ensuring that `num_added + num_skipped` equals the total number of input documents.

**Issue:** Fixes incorrect document count reporting in indexing statistics

**Dependencies:** None

Fixes #32272

---------

Co-authored-by: Alex Feel <afilippov@spotware.com>
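The counting change described in the commit message can be illustrated with a minimal, self-contained sketch. This is not the LangChain implementation: `deduplicate_in_order` and `index_batch` are hypothetical stand-ins, a plain `set` plays the role of the RecordManager, and documents are reduced to string IDs. It only shows how recording the batch size before deduplication keeps `num_added + num_skipped` equal to the number of input documents.

```python
from typing import Dict, Hashable, Iterable, Iterator, List, Set, TypeVar

T = TypeVar("T", bound=Hashable)


def deduplicate_in_order(items: Iterable[T]) -> Iterator[T]:
    """Yield each item once, preserving first-seen order (hypothetical helper)."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


def index_batch(doc_ids: List[str], already_indexed: Set[str]) -> Dict[str, int]:
    """Simplified stand-in for one indexing batch; a set replaces the RecordManager."""
    # Record the batch size BEFORE deduplication so dropped duplicates can be counted.
    original_size = len(doc_ids)
    unique_ids = list(deduplicate_in_order(doc_ids))

    # Within-batch duplicates count as skipped.
    num_skipped = original_size - len(unique_ids)
    num_added = 0

    for doc_id in unique_ids:
        if doc_id in already_indexed:
            # Cross-batch duplicate: already known from an earlier batch.
            num_skipped += 1
        else:
            already_indexed.add(doc_id)
            num_added += 1

    return {"num_added": num_added, "num_skipped": num_skipped}


# Five inputs: "a" and "b" each appear twice, and "c" was indexed earlier.
stats = index_batch(["a", "b", "a", "c", "b"], already_indexed={"c"})
print(stats)
```

Without the `original_size - len(unique_ids)` term, the two within-batch duplicates in this example would not be reported, and the counts would sum to 3 rather than 5.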