Fix FP6 kernel compilation on non-Ampere GPUs (#5333)
Refine the compilation guards for the FP6 kernels. This fixes the
`undefined symbol` error raised by the FP6 kernels on non-Ampere
architectures.
Related issue: https://github.com/microsoft/DeepSpeed-MII/issues/443.
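
For illustration only, here is a minimal sketch of one common guarding pattern that avoids this class of `undefined symbol` error. It is not the actual change in this commit. The macro `SKETCH_ARCH_SUPPORTED` and the function `sketch_fp6_entry` are hypothetical names, not DeepSpeed identifiers. The idea is to define the entry point on every architecture and guard only its body, so the symbol always exists at link time.

```cpp
#include <stdexcept>

// Hypothetical build flag (an assumption, not a DeepSpeed macro): set it to 1
// when the build targets an architecture that supports the kernel.
#ifndef SKETCH_ARCH_SUPPORTED
#define SKETCH_ARCH_SUPPORTED 0
#endif

// The entry point is defined unconditionally, so anything that references it
// can always link. Only the architecture-specific body is guarded.
int sketch_fp6_entry(int x)
{
#if SKETCH_ARCH_SUPPORTED
    return x * 2;  // stand-in for launching the real kernel
#else
    (void)x;  // unused on unsupported architectures
    throw std::runtime_error("FP6 sketch kernel is not supported on this architecture");
#endif
}
```

With the flag left at its default of 0, calling `sketch_fp6_entry` fails at run time with a clear message instead of failing at load time with a missing symbol.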
---------
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>