llama.cpp
95930da3 - convert-hf : get bit-exact same output as ./quantize

convert-hf : get bit-exact same output as ./quantize

The quantization version was missing.

* convert-hf : don't round bf16 NaNs
* convert-hf : save some memory with np.int16 intermediate bf16 weights
* convert-hf : more closely match llama.cpp with which weights to keep in f32
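The commit message does not include the code, but the bf16 handling it describes can be sketched roughly as below: convert fp32 weight bits to bf16 with round-to-nearest-even, truncate NaNs instead of rounding them (rounding could carry into the exponent and corrupt the NaN), and keep the result as a 16-bit integer array to save memory. The function name `fp32_to_bf16` and the exact bit manipulations are illustrative assumptions, not the actual llama.cpp implementation.

```python
import numpy as np

def fp32_to_bf16(weights: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: fp32 -> bf16 bit patterns.

    Round-to-nearest-even for normal values; NaNs are truncated rather
    than rounded so they stay NaN and keep their payload bits.
    """
    bits = weights.astype(np.float32, copy=False).view(np.uint32)
    # NaN iff exponent is all ones and mantissa is nonzero
    is_nan = (bits & 0x7FFFFFFF) > 0x7F800000
    # round-to-nearest-even; widen to uint64 so the bias add cannot wrap
    rounded = (bits.astype(np.uint64) + 0x7FFF + ((bits >> 16) & 1)) >> 16
    # for NaNs, keep the top 16 bits as-is (still a NaN in bf16)
    out = np.where(is_nan, bits >> 16, rounded)
    # a 16-bit intermediate halves the memory footprint vs. fp32
    return out.astype(np.uint16)
```

Storing the intermediate as 16-bit integers rather than fp32 is what the "save some memory" bullet refers to: the bf16 payload only needs the top half of each fp32 word.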