refine gguf rtn algorithm and fix bugs (#630)
* change to float32 as it's quite important
* fix bug
* add support for custom bit widths for the embedding and lm-head layers
* Update auto_round/utils.py
* fix line-too-long issue
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>