Fix a bug in the implementation of dequantization for inference (#3433)
* bugfix in `launch_dequantize()`
Remove `hid_cnt` and set the number of blocks to output size / number of groups
* add a unit test for dequantization
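
For illustration only, the block-count rule described above can be sketched as plain arithmetic. This is not the DeepSpeed source: `dequantize_num_blocks` is a hypothetical helper, and the requirement that the output size divide evenly by the number of groups is an assumption made for the sketch.

```python
def dequantize_num_blocks(output_size: int, groups: int) -> int:
    """Hypothetical helper: number of blocks = output size / number of groups."""
    if groups <= 0:
        raise ValueError("groups must be positive")
    if output_size % groups != 0:
        # Assumption for this sketch: the output splits evenly across groups.
        raise ValueError("output_size must be a multiple of groups")
    return output_size // groups
```

For example, under these assumptions `dequantize_num_blocks(4096, 8)` evaluates to 512.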
---------
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>