feat: pass skip_sha256=True to hf_xet for bucket uploads (#3900)
* feat: pass skip_sha256=True to hf_xet for bucket uploads
Bucket uploads don't need SHA-256 in the shard metadata (the sha_index
GSI is only used for LFS pointer resolution, which doesn't apply to
buckets). Pass skip_sha256=True to hf_xet.upload_files() and
upload_bytes() in the bucket upload path. This skips the SHA-256
computation, which is the main CPU bottleneck on machines without
SHA-NI (hardware SHA extensions).
Depends on: huggingface/xet-core#679
Co-authored-by: Lucain <Wauplin@users.noreply.github.com>
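A minimal sketch of the call-site change, assuming a hypothetical helper
`upload_paths` that receives the uploader as a callable (standing in for
hf_xet.upload_files). This is not the actual huggingface_hub code, and
`fake_upload_files` is an invented stub used only to show which kwargs
reach the uploader.

```python
from typing import Any, Callable, Sequence


def upload_paths(
    upload_fn: Callable[..., Any],
    paths: Sequence[str],
    *,
    is_bucket: bool,
) -> Any:
    """Hypothetical wrapper: only bucket uploads opt out of SHA-256."""
    kwargs: dict[str, Any] = {}
    if is_bucket:
        # Buckets never resolve LFS pointers, so the SHA-256 stored in
        # the shard metadata would go unused; skip computing it.
        kwargs["skip_sha256"] = True
    return upload_fn(list(paths), **kwargs)


def fake_upload_files(paths, skip_sha256=False):
    # Invented stub: reports what it was asked to do instead of uploading.
    return {"paths": paths, "skip_sha256": skip_sha256}


print(upload_paths(fake_upload_files, ["a.bin"], is_bucket=True))
# {'paths': ['a.bin'], 'skip_sha256': True}
```

Non-bucket uploads keep the default, so LFS pointer resolution is
unaffected.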
* test: use real bucket upload instead of mocks for skip_sha256 test
Replace the two mock-based tests with a single integration test that:
- Creates a real Bucket on staging Hub
- Uploads files from both filepath and bytes in a single batch
- Wraps (not mocks) hf_xet.upload_files and hf_xet.upload_bytes to
verify skip_sha256=True is passed
- Verifies files are actually uploaded by listing the bucket tree
Co-authored-by: Lucain <Wauplin@users.noreply.github.com>
* test: skip skip_sha256 test when hf_xet doesn't support it yet
The test wraps the real hf_xet functions, so it fails when the
installed hf_xet predates the skip_sha256 parameter (xet-core#679).
Use inspect.signature to detect support and pytest.skip accordingly.
Co-authored-by: Lucain <Wauplin@users.noreply.github.com>
* test: handle built-in functions in skip_sha256 signature check
hf_xet.upload_files is a compiled built-in function, so
inspect.signature() raises ValueError. Catch it and skip the test
when the signature can't be introspected (older hf_xet).
Co-authored-by: Lucain <Wauplin@users.noreply.github.com>
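A sketch of the signature check, assuming a hypothetical helper
`accepts_kwarg`. Catching ValueError covers compiled built-ins that
expose no signature metadata; TypeError is caught as well, as a
precaution. The two stubs model a newer and an older uploader.

```python
import inspect
from typing import Callable, Optional


def accepts_kwarg(fn: Callable, name: str) -> Optional[bool]:
    """Return True/False if fn's signature is readable, None otherwise."""
    try:
        params = inspect.signature(fn).parameters
    except (ValueError, TypeError):
        # Some compiled built-ins cannot be introspected.
        return None
    if name in params:
        return True
    # A **kwargs catch-all would also accept the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def new_upload(paths, skip_sha256=False): ...
def old_upload(paths): ...


print(accepts_kwarg(new_upload, "skip_sha256"))  # True
print(accepts_kwarg(old_upload, "skip_sha256"))  # False
```

In a test, anything other than True (including None, "can't tell")
would lead to pytest.skip, matching the behavior described above.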
* fix: gracefully fall back when hf_xet lacks skip_sha256 support
Use try/except TypeError around upload_files/upload_bytes calls with
skip_sha256=True, falling back to calls without it for older hf_xet
versions. A TypeError for an unknown kwarg on a compiled function is
raised before any I/O happens, so retrying is safe.
Update the test to check call_args_list[0] (the first attempt always
includes skip_sha256=True) instead of requiring the function to
accept it.
Co-authored-by: Lucain <Wauplin@users.noreply.github.com>
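A sketch of the try/except TypeError fallback and of the
call_args_list[0] check, assuming a hypothetical helper
`call_with_skip_sha256` and an invented stub `old_upload_files` that
models an hf_xet release without the parameter.

```python
from typing import Any, Callable
from unittest import mock


def call_with_skip_sha256(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Try the fast path first; retry without skip_sha256 on old versions."""
    try:
        return fn(*args, skip_sha256=True, **kwargs)
    except TypeError:
        # Relies on the unknown-kwarg TypeError being raised before any
        # I/O, so the retry does not repeat partial work.
        return fn(*args, **kwargs)


def old_upload_files(paths):
    # Invented stub for an hf_xet release that predates skip_sha256.
    return len(paths)


spy = mock.Mock(wraps=old_upload_files)
assert call_with_skip_sha256(spy, ["a.bin"]) == 1
# The first attempt always carried the flag; the retry dropped it.
assert spy.call_args_list[0].kwargs == {"skip_sha256": True}
assert spy.call_args_list[1].kwargs == {}
```

One trade-off of this pattern: a TypeError raised for any other reason
inside the call also triggers the retry, so it depends on the kwarg check
happening before real work starts.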
* better like this
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Lucain <Wauplin@users.noreply.github.com>