DeepSpeed
dc0fd295 - Rename dequantization template parameters (#7976)

Commit
26 days ago
Rename dequantization template parameters (#7976)

Followup to https://github.com/deepspeedai/DeepSpeed/pull/7973 for #7971

The naming of q_mantisa_bits and mantisa_bits was swapped. The invocation set:
```
q_mantisa_bits = mantisa
_mantisa_bits = CONST_Q_MANTISA_BITS
_exponent_bits = CONST_Q_EXPONENT_BITS
```
so correct them by swapping the names back.

I noticed that the code needs a thorough review because multiple places look suspicious:
```
// Why the default args? They do not even seem to match (16 != 3+4+1)
int total_q_bits = 16, int q_mantisa_bits = 3, int q_exponent_bits = 4>

// Why recompute if there is a total_q_bits template parameter?
constexpr int quantized_bits = q_mantisa_bits + q_exponent_bits + 1;

// Likely wrong: total_q_bits < mantisa_bits --> negative bits? Likely caused by wrong naming
constexpr int q_exponent_bits = total_q_bits - mantisa_bits - 1;

// Should likely use a `q_` prefix, not `_`
constexpr uint16_t _mantisa_mask = (1 << q_mantisa_bits) - 1;
constexpr uint16_t _exponent_mask = ((1 << q_exponent_bits) - 1) << q_mantisa_bits;
constexpr uint16_t _sign_mask = 1U << (q_mantisa_bits + q_exponent_bits);
```

cc @Cursx

Signed-off-by: Alexander Grund <alexander.grund@tu-dresden.de>
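The mismatch flagged above (16 != 3+4+1) is the kind of inconsistency a compile-time check can catch. The following is a minimal sketch of that idea, not the DeepSpeed code: `QuantLayout` and `Fp8E4M3` are hypothetical names introduced for illustration, and the layout (one sign bit, then exponent bits, then mantissa bits in the low positions) is assumed from the mask expressions quoted in the message.

```cpp
#include <cstdint>

// Hypothetical helper (not part of DeepSpeed): describes a packed
// sign | exponent | mantissa layout and checks it at compile time.
template <int total_q_bits, int q_mantisa_bits, int q_exponent_bits>
struct QuantLayout {
    static_assert(q_mantisa_bits > 0 && q_exponent_bits > 0,
                  "mantissa and exponent widths must be positive");
    // Rejects combinations such as total = 16 with 3 + 4 + 1 = 8.
    static_assert(total_q_bits == q_mantisa_bits + q_exponent_bits + 1,
                  "total bits must equal mantissa + exponent + sign bit");
    static_assert(total_q_bits <= 16, "masks are stored in uint16_t");

    // Same mask expressions as in the commit message, with explicit casts.
    static constexpr uint16_t mantisa_mask =
        static_cast<uint16_t>((1u << q_mantisa_bits) - 1u);
    static constexpr uint16_t exponent_mask =
        static_cast<uint16_t>(((1u << q_exponent_bits) - 1u) << q_mantisa_bits);
    static constexpr uint16_t sign_mask =
        static_cast<uint16_t>(1u << (q_mantisa_bits + q_exponent_bits));
};

// Assumed example: an 8-bit format with 4 exponent and 3 mantissa bits.
using Fp8E4M3 = QuantLayout<8, 3, 4>;

static_assert(Fp8E4M3::mantisa_mask == 0x07, "low 3 bits");
static_assert(Fp8E4M3::exponent_mask == 0x78, "bits 3..6");
static_assert(Fp8E4M3::sign_mask == 0x80, "bit 7");

// QuantLayout<16, 3, 4> would fail to compile because 16 != 3 + 4 + 1.
```

With a check like this, the total width cannot drift out of sync with the field widths, which would also make recomputing `quantized_bits` from the parts unnecessary.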