auto-round
Fix critical bug for gradient_accumulate_steps != 1 and reduce CPU memory of lm-head tuning
#97
Merged
