Fix PackedGemmMatrixFP16 repacking (#43320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43320
The previous implementation seems to be buggy, although I don't know why. The new implementation is copied from https://fburl.com/diffusion/cing6mxv
Reviewed By: jianyuh
Differential Revision: D23235964
fbshipit-source-id: 780b6e388ef895232e3ba34b125c2492b1cee60c