unstructured
94b3ffd0 - fix(chunking): preserve nested table structure in reconstruction (#4301)

Commit
5 days ago
fix(chunking): preserve nested table structure in reconstruction (#4301)

## Summary

- Fix `_merge_table_chunks()` to merge only top-level rows from each chunk HTML table.
- Prevent nested table rows from being hoisted into the reconstructed root table.
- Add regression coverage to verify nested table structure is preserved.

## Finding Reference

- https://github.com/Unstructured-IO/unstructured/pull/4291#discussion_r2978278640

## Validation

- `unset VIRTUAL_ENV && CI=false uv run --no-sync pytest -q test_unstructured/chunking/test_base.py -k "reconstruct_tables_from_a_mixed_element_list or preserves_nested_table_structure" --maxfail=1`
- `unset VIRTUAL_ENV && CI=false uv run --no-sync pytest -q test_unstructured/chunking/test_base.py test_unstructured/chunking/test_dispatch.py --maxfail=1`
- `unset VIRTUAL_ENV && uv run --no-sync python - <<'PY' from unstructured.partition.text import partition_text elements = partition_text(text="Codex initializer smoke test") assert elements, "partition_text returned no elements" print(f"partition_text smoke check passed ({len(elements)} elements)") PY`
- `unset VIRTUAL_ENV && CI=false uv run --no-sync pytest -q test_unstructured/partition/test_text.py --maxfail=1`

authored by codex
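
## Illustration

The idea behind the fix, merging only the rows that belong directly to each chunk's root table, can be sketched as follows. This is not the code from the commit: `top_level_rows` and `merge_table_chunks` are hypothetical names, the sketch uses the standard-library `xml.etree.ElementTree` rather than whatever parser the library uses, and it assumes each chunk's HTML is a well-formed `<table>` fragment.

```python
import xml.etree.ElementTree as ET

# Elements that may wrap rows directly under a <table>.
ROW_CONTAINERS = ("thead", "tbody", "tfoot")


def top_level_rows(table):
    """Return the <tr> elements owned by `table` itself.

    Only direct children (or children of thead/tbody/tfoot) are inspected,
    so rows of tables nested inside cells are never returned on their own.
    """
    rows = []
    for child in table:
        if child.tag == "tr":
            rows.append(child)
        elif child.tag in ROW_CONTAINERS:
            rows.extend(row for row in child if row.tag == "tr")
    return rows


def merge_table_chunks(html_chunks):
    """Hypothetical helper: combine chunk tables into one root table.

    Assumes every chunk is a well-formed <table> fragment.
    """
    merged = ET.Element("table")
    for html in html_chunks:
        chunk_table = ET.fromstring(html)
        # A recursive search such as chunk_table.iter("tr") would also
        # yield nested rows and hoist them into the root table.
        merged.extend(top_level_rows(chunk_table))
    return ET.tostring(merged, encoding="unicode")


chunks = [
    "<table><tr><td>a<table><tr><td>nested</td></tr></table></td></tr></table>",
    "<table><tbody><tr><td>b</td></tr></tbody></table>",
]
merged_html = merge_table_chunks(chunks)
```

With these inputs the merged table has two top-level rows, and the nested table stays inside the first row's cell instead of contributing a third row to the root.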