Turbopack: switch chunk/asset hashes from hex to base40 encoding (#91137)
### What?
Switch Turbopack's hash encoding for chunk and asset output filenames
from hexadecimal (base16) to base40, using the alphabet
`0-9 a-z _ - ~ .`. Version hashes (used for HMR update comparison, not
filenames) use base64 instead.
### Why?
Base40 carries about 5.32 bits per character versus 4 for hex, so the
same number of bits fits in fewer characters and output filenames get
shorter. All 40 characters are RFC 3986 unreserved (URL-safe), and the
alphabet has no uppercase letters, so names stay distinct on
case-insensitive filesystems (macOS HFS+/APFS, Windows NTFS).
Hash truncation lengths are adjusted so that every context keeps at
least its previous collision resistance. Asset filename hashes grow from
8 to 13 chars to match chunk filenames:
| Context | Before (hex) | After (base40) | Entropy (after) |
|---|---|---|---|
| Content hash in chunk filenames | 16 chars | 13 chars | ~69 bits |
| Content hash in asset filenames | 8 chars | 13 chars | ~69 bits |
| Ident disambiguator hash | 8 chars | 7 chars | ~37 bits |
| Long-path prefix hash | 5 chars | 4 chars | ~21 bits |
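
The entropy column follows from `chars × log2(40)`, and the 13/25-char lengths are the smallest widths that can hold a 64-bit or 128-bit value. A minimal sketch of that arithmetic, using hypothetical helper names (`entropy_bits`, `chars_needed`) that are not part of this PR:

```rust
/// Approximate bits of entropy in `chars` characters drawn uniformly from an
/// alphabet of `radix` symbols. (Hypothetical helper, for illustration only.)
fn entropy_bits(radix: u32, chars: u32) -> f64 {
    f64::from(chars) * f64::from(radix).log2()
}

/// Smallest number of `radix`-ary characters that can represent every
/// `bits`-bit value. Uses integer math so exact powers are handled correctly.
fn chars_needed(radix: u128, bits: u32) -> u32 {
    assert!(radix >= 2 && (1..=128).contains(&bits));
    let max_value = if bits == 128 { u128::MAX } else { (1u128 << bits) - 1 };
    let mut n: u32 = 0;
    let mut capacity: u128 = 1; // radix^n, the count of distinct n-char strings
    while capacity <= max_value {
        n += 1;
        capacity = match capacity.checked_mul(radix) {
            Some(c) => c,
            // radix^n no longer fits in u128, so it certainly exceeds max_value.
            None => return n,
        };
    }
    n
}
```

With these definitions, `entropy_bits(40, 13)` is roughly 69.2 and `chars_needed(40, 64)` is 13, which lines up with the table and with `BASE40_LEN_64`.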
### How?
**New encoding module** (`turbo-tasks-hash/src/base40.rs`):
- Defines the base40 alphabet and length constants
(`BASE40_LEN_64 = 13`, `BASE40_LEN_128 = 25`)
- Implements a generic `encode_base40_fixed<N>` helper to avoid
duplication
- Public API: `encode_base40(u64) -> String` and
`encode_base40_128(u128) -> String`
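
For readers unfamiliar with fixed-width radix encoding, the module's shape can be sketched roughly as follows. This is an illustration, not the code in `base40.rs`: the alphabet order, the `u128` parameter of the generic helper, and the most-significant-digit-first, zero-padded layout are all assumptions.

```rust
/// ASSUMED alphabet order: digits, lowercase letters, then `_ - ~ .`.
const BASE40_ALPHABET: &[u8; 40] = b"0123456789abcdefghijklmnopqrstuvwxyz_-~.";
const BASE40_LEN_64: usize = 13;
const BASE40_LEN_128: usize = 25;

/// Encodes `value` as exactly `N` base40 characters, most significant digit
/// first, left-padded with the zero digit. (Signature is an assumption.)
fn encode_base40_fixed<const N: usize>(mut value: u128) -> String {
    let mut buf = [b'0'; N];
    // Fill from the rightmost (least significant) slot towards the left.
    for slot in buf.iter_mut().rev() {
        *slot = BASE40_ALPHABET[(value % 40) as usize];
        value /= 40;
    }
    // Every byte comes from the ASCII alphabet above, so this cannot fail.
    String::from_utf8(buf.to_vec()).expect("base40 alphabet is ASCII")
}

pub fn encode_base40(value: u64) -> String {
    encode_base40_fixed::<BASE40_LEN_64>(u128::from(value))
}

pub fn encode_base40_128(value: u128) -> String {
    encode_base40_fixed::<BASE40_LEN_128>(value)
}
```

A fixed output width keeps truncation simple: taking the first 13, 7, or 4 characters always yields a string of the expected length, whatever the hash value.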
**New base64 encoding** (`turbo-tasks-hash/src/base64.rs`):
- `encode_base64(u64) -> String`: 11-char base64 (no padding) for
version hashes
- Version hashes don't appear in URLs or filenames, so the
case-sensitive base64 alphabet is safe there, and the result is shorter
(11 chars vs 13 in base40)
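
A 64-bit value needs 11 base64 characters because 10 characters hold only 60 bits. The sketch below shows one way to produce that unpadded form; it assumes the standard RFC 4648 alphabet and big-endian bit order, neither of which is confirmed by this description, and the real module may rely on a crate instead.

```rust
/// Standard RFC 4648 alphabet (ASSUMED; the PR does not say which variant is used).
const BASE64_ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encodes a 64-bit value as 11 unpadded base64 characters.
fn encode_base64(value: u64) -> String {
    // 64 bits plus two zero bits on the right gives 66 bits = 11 groups of 6,
    // which matches base64 of the 8 big-endian bytes with the `=` dropped.
    let padded = u128::from(value) << 2;
    (0..11u32)
        .rev()
        .map(|group| {
            let index = ((padded >> (group * 6)) & 0x3f) as usize;
            BASE64_ALPHABET[index] as char
        })
        .collect()
}
```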
**New `HashAlgorithm` variants** (`turbo-tasks-hash/src/lib.rs`):
- `Xxh3Hash64Base40` and `Xxh3Hash128Base40` added alongside
existing hex variants
- Existing hex variants kept for internal manifests and identifiers
**`ContentHashing` moved to `turbopack-core`**:
- Moved from `turbopack-browser` to
`turbopack-core/src/chunk/mod.rs` so both `BrowserChunkingContext`
and `NodeJsChunkingContext` can use it
**Separate chunk vs asset content hashing**:
- `BrowserChunkingContext`: `content_hashing` renamed to
`chunk_content_hashing` (optional), new
`asset_content_hashing: ContentHashing` field (non-optional, defaults
to 13 chars)
- `NodeJsChunkingContext`: new `asset_content_hashing: ContentHashing`
field (non-optional, defaults to 13 chars)
- Builder methods: `use_content_hashing()` renamed to
`chunk_content_hashing()`, new `asset_content_hashing()`
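
The split between an optional chunk setting and an always-present asset setting can be pictured with a simplified mock. `HashingOptions` and `HashingOptionsBuilder` are hypothetical names invented for this sketch, the `u8` length type is an assumption, and the real builders on `BrowserChunkingContext` and `NodeJsChunkingContext` take more state than shown here.

```rust
/// Stand-in for `ContentHashing`; only `Direct { length }` is modeled and the
/// `u8` length type is an ASSUMPTION.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ContentHashing {
    Direct { length: u8 },
}

/// Hypothetical, simplified view of a chunking context's hashing fields.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct HashingOptions {
    /// Optional, as described for `chunk_content_hashing`.
    pub chunk_content_hashing: Option<ContentHashing>,
    /// Non-optional, defaulting to 13 base40 characters.
    pub asset_content_hashing: ContentHashing,
}

impl Default for HashingOptions {
    fn default() -> Self {
        Self {
            chunk_content_hashing: None,
            asset_content_hashing: ContentHashing::Direct { length: 13 },
        }
    }
}

/// Hypothetical builder mirroring the two builder method names in this PR.
#[derive(Default)]
pub struct HashingOptionsBuilder {
    options: HashingOptions,
}

impl HashingOptionsBuilder {
    pub fn chunk_content_hashing(mut self, hashing: ContentHashing) -> Self {
        self.options.chunk_content_hashing = Some(hashing);
        self
    }

    pub fn asset_content_hashing(mut self, hashing: ContentHashing) -> Self {
        self.options.asset_content_hashing = hashing;
        self
    }

    pub fn build(self) -> HashingOptions {
        self.options
    }
}
```

In this mock, a caller that never touches the builder still gets 13-char asset hashes, while chunk hashing stays off until `chunk_content_hashing()` is called.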
**Version hashes switched to base64**:
- `turbopack-nodejs/src/ecmascript/node/version.rs`
- `turbopack-dev-server/src/html.rs`
- `turbopack-browser/src/ecmascript/version.rs`,
`merged/version.rs`, `list/version.rs`
**Other callers updated** (15 files across turbopack and next-core):
- All chunk/asset content hashing switched from `Xxh3Hash128Hex` to
`Xxh3Hash128Base40`
- `ContentHashing::Direct { length }` reduced from 16 to 13
- Asset path truncations use the full 13-char base40 hash (matching
chunk filenames)
**Exception: `wasm_edge_var_name`** (`turbopack-wasm/src/lib.rs`):
- Kept as `Xxh3Hash128Hex` because the hash becomes part of a
JavaScript variable name (`wasm_{hash}`), and the base40 characters
`-`, `~`, and `.` are not valid in JS identifiers.
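
The constraint is easy to check mechanically. The sketch below uses an ASCII-only approximation of the ECMAScript identifier grammar (it ignores Unicode identifier characters and reserved words), and the function name is hypothetical:

```rust
/// ASCII-only approximation of a valid JavaScript identifier: the first
/// character is a letter, `_`, or `$`; later characters may also be digits.
/// Does not cover Unicode identifiers or reserved words.
fn is_ascii_js_identifier(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(first) if first.is_ascii_alphabetic() || first == '_' || first == '$' => {
            chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '$')
        }
        _ => false,
    }
}
```

Any hex hash passes after the `wasm_` prefix, whereas a base40 hash fails as soon as it contains `-`, `~`, or `.`.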
**Scope: NOT changed**
- Webpack configuration
- Internal manifests (`routes_hashes_manifest`,
`project_asset_hashes_manifest`)
- Internal identifiers (font naming, external module hashing, data URI
sources, debug IDs)
- SRI hashes (SHA-based Base64, different purpose)
---------
Co-authored-by: Vercel <vercel[bot]@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>