[Pytorch] Fix embedding bag bug accessing unaligned memory (#53300)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53300
The float scale and bias are packed as per-row parameters at the end of each row,
taking 8 bytes. However, if the number of elements in a row is such that the
end-of-row address is not aligned for float (i.e. not a multiple of 4 bytes),
reading the scale and bias results in unaligned memory access.
The current solution is inefficient, so this should really be fixed at
weight-packing time.
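
For illustration, a minimal sketch of one alignment-safe way to read the
trailing scale and bias via memcpy; the identifiers below are hypothetical and
not the actual names used in the kernel:

```cpp
#include <cstdint>
#include <cstring>

// Illustrative sketch: the per-row float scale and bias are stored inline
// right after the quantized row data. Casting that tail pointer to float* is
// undefined behavior when the address is not 4-byte aligned, so the values
// are copied out with std::memcpy instead.
// `row_ptr` and `row_data_bytes` are hypothetical names for this example.
inline void read_row_scale_bias(
    const std::uint8_t* row_ptr,
    std::size_t row_data_bytes,
    float& scale,
    float& bias) {
  const std::uint8_t* tail = row_ptr + row_data_bytes;
  // memcpy is alignment-agnostic; compilers typically lower it to an
  // unaligned load on architectures that support one.
  std::memcpy(&scale, tail, sizeof(float));
  std::memcpy(&bias, tail + sizeof(float), sizeof(float));
}
```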
It seems that, longer term, there will be a prepack function that packs the
weights. This fallback path should eventually match that and not store the
scale and bias inline.
Test Plan: python test/test_quantization.py
Reviewed By: pengtxiafb
Differential Revision: D26828077
fbshipit-source-id: 8512cd95f3ac3ca53e1048139a9f6e19aa8af298