CSR+FP32 fix (#206)
1) CSR parameter names should end with .weight.
2) When the basic optimizer is used directly, DeepSpeed should handle zero_grad itself. Letting the basic optimizer call zero_grad left residual gradients in the embedding layer; the cause is unknown.
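
A minimal sketch of the naming rule in point 1, assuming CSR parameters are tracked by name and looked up against fully qualified parameter names of the form "<module name>.weight". The helper csr_param_names and the module names below are hypothetical and are not taken from the DeepSpeed source:

```python
def csr_param_names(embedding_module_names):
    """Return the parameter names to treat as CSR.

    Assumption: each input is a module name such as "encoder.embed",
    and its weight parameter is named "<module name>.weight".
    """
    return {name + ".weight" for name in embedding_module_names}


# Hypothetical module and parameter names, for illustration only.
module_names = ["encoder.embed", "decoder.embed"]
param_names = [
    "encoder.embed.weight",
    "encoder.linear.weight",
    "decoder.embed.weight",
]

csr_names = csr_param_names(module_names)

# Only names carrying the ".weight" suffix match real parameter names;
# a bare module name such as "encoder.embed" would never match.
matched = [p for p in param_names if p in csr_names]
```

Under this assumption, a lookup keyed on the bare module name would silently miss every embedding parameter, which is why the suffix matters.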