llvm-project
71d78b22 - [Hashing] Replace CityHash mixers with xxh3 (#194567)

Commit
26 days ago
Replace the CityHash-style mixer in hash_combine and, transitively, in hash_value(std::basic_string) and hash_value(StringRef), and therefore in DenseMap<StringRef, X> lookups, with a flatten-and-call into xxh3_64bits, a modern hash superior to CityHash.

hash_value(int) and hash_value(ptr) keep the existing Murmur-style hash_16_bytes mixer. Those are the dominant DenseMap key paths, and a fully inlined 16-byte mix beats inlining xxh3's larger 0..16-byte short path.

To break a dependency cycle, the ArrayRef/StringRef overloads of xxHash64, xxh3_64bits, and xxh3_128bits move from llvm/Support/xxhash.h to inline overloads in llvm/ADT/ArrayRef.h and llvm/ADT/StringRef.h, so xxhash.h has no ADT dependencies.

A variant that inlined xxh3's 0..16-byte fast path at every combine_bytes call site (instead of always calling the out-of-line xxh3_64bits) showed no measurable compile-time improvement on the tracker, so combine_bytes is a one-liner over the out-of-line entry point.

llvm-compile-time-tracker.com (CTMark, instructions:u):

```
stage1-O0-g            -1.76%  (sqlite3 -3.78%)
stage1-aarch64-O0-g    -1.40%  (sqlite3 -2.86%)
stage1-ReleaseLTO-g    -1.13%
stage1-ReleaseThinLTO  -0.45%
stage1-O3              -0.43%
stage1-aarch64-O3      -0.42%
stage2-O0-g            -0.42%
stage2-O3              -0.15%
clang build            -0.71%  (wall -0.42%)
```

DenseMap-of-pointer paths (dominant at -O3) are untouched, so higher-optimization configs see smaller wins, as expected. opt's .text shrinks by ~92 KB.

Subsumes the StringRef-only carve-out proposed in #191115.

Notes on properties not introduced by this patch:

- Endianness: hash_combine over native integers was already not stable across hosts. memcpy of a native integer into the buffer is host-encoded; fetch32 normalized the read but not the underlying bytes, so the value fed to the mixer already differed between LE and BE hosts. xxh3 inherits the same property: same byte stream, different mixer.
- Process seed: combine_bytes XORs get_execution_seed into the result, which cancels under hash_combine(x) ^ hash_combine(y). The pre-patch short/state paths fed the seed non-linearly through hash_16_bytes / shift_mix, so this is a regression in seed effectiveness under that pattern. The default seed is constant, so this only matters under LLVM_ENABLE_ABI_BREAKING_CHECKS. Follow-up: add a seeded xxh3 entry point in libSupport.

Aided by Claude opus 4.7
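The flatten-and-call shape and the seed note can be illustrated with a small C++17 sketch. It rests on explicit assumptions: `combine_sketch`, `stand_in_hash`, `g_seed`, and `seed_cancels_demo` are hypothetical names, not LLVM APIs. 64-bit FNV-1a is swapped in for xxh3_64bits purely to keep the example self-contained; it is not xxh3. `g_seed` plays the role of get_execution_seed. The sketch copies each argument's native bytes into one buffer, which is why the byte stream is host-endian, hashes that buffer once, and XORs the seed in at the end. That final linear step is what cancels under `hash_combine(x) ^ hash_combine(y)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for llvm::xxh3_64bits: 64-bit FNV-1a.
// It is NOT xxh3; it only keeps this sketch self-contained.
inline std::uint64_t stand_in_hash(const unsigned char *data, std::size_t len) {
  std::uint64_t h = 0xcbf29ce484222325ULL; // FNV-1a 64-bit offset basis
  for (std::size_t i = 0; i < len; ++i) {
    h ^= data[i];
    h *= 0x100000001b3ULL; // FNV-1a 64-bit prime
  }
  return h;
}

// Hypothetical per-process seed, standing in for get_execution_seed().
inline std::uint64_t g_seed = 0;

// Flatten-and-call: copy each argument's native bytes into one buffer,
// hash the buffer once, then XOR in the seed (a purely linear step).
template <typename... Ts>
std::uint64_t combine_sketch(const Ts &...args) {
  static_assert((std::is_trivially_copyable_v<Ts> && ...),
                "sketch only handles trivially copyable arguments");
  constexpr std::size_t total = (sizeof(Ts) + ... + 0);
  static_assert(total <= 64, "sketch uses a fixed 64-byte buffer");
  unsigned char buf[64];
  std::size_t len = 0;
  // memcpy keeps host byte order, so LE and BE hosts fill buf differently.
  ((std::memcpy(buf + len, &args, sizeof(args)), len += sizeof(args)), ...);
  return stand_in_hash(buf, len) ^ g_seed;
}

// Shows the cancellation: (f(x) ^ s) ^ (f(y) ^ s) == f(x) ^ f(y).
inline bool seed_cancels_demo() {
  std::uint32_t x = 1, y = 2;
  g_seed = 0x1234;
  std::uint64_t hx1 = combine_sketch(x);
  std::uint64_t d1 = hx1 ^ combine_sketch(y);
  g_seed = 0xdeadbeef;
  std::uint64_t hx2 = combine_sketch(x);
  std::uint64_t d2 = hx2 ^ combine_sketch(y);
  assert(hx1 != hx2); // individual hashes do depend on the seed...
  return d1 == d2;    // ...but their XOR does not
}
```

`seed_cancels_demo()` returns true for any pair of seeds, because XORing the same `s` twice is the identity. A seed mixed in non-linearly, as the pre-patch paths did through hash_16_bytes / shift_mix, would not in general cancel this way. That is the motivation for the seeded xxh3 entry point listed as a follow-up.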