[fbgemm] remove assumption that the number of rows fits in 32 bits (#69066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69066
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/781
Also remove the unnecessary looping inside parallel_for, which was redundant because the fbgemm routines already support batching multiple rows.
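A minimal sketch of the pattern described above, not the actual diff: each chunk handed out by the parallel loop makes one batched call over its row range, and row indices and offsets use 64-bit integers. The names `scale_rows_batched`, `chunked_for`, and `scale_all` are hypothetical stand-ins for illustration only; they are not FBGEMM or ATen APIs, and `chunked_for` runs serially.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a batched row-wise routine: processes
// `num_rows` consecutive rows of `cols` floats in a single call.
// Row counts and offsets are int64_t, so they are not limited to 32 bits.
void scale_rows_batched(const float* in, std::int64_t num_rows,
                        std::int64_t cols, float* out) {
  assert(num_rows >= 0 && cols >= 0);
  for (std::int64_t i = 0; i < num_rows * cols; ++i) {
    out[i] = 2.0f * in[i];
  }
}

// Hypothetical serial stand-in for a parallel_for: splits [begin, end)
// into chunks of at most `grain` rows and calls fn(chunk_begin, chunk_end).
template <typename F>
void chunked_for(std::int64_t begin, std::int64_t end, std::int64_t grain,
                 F fn) {
  assert(grain > 0);
  for (std::int64_t b = begin; b < end; b += grain) {
    fn(b, b + grain < end ? b + grain : end);
  }
}

// One batched call per chunk, instead of looping row by row inside the chunk.
std::vector<float> scale_all(const std::vector<float>& in, std::int64_t rows,
                             std::int64_t cols) {
  std::vector<float> out(in.size());
  chunked_for(0, rows, 4, [&](std::int64_t start, std::int64_t stop) {
    scale_rows_batched(in.data() + start * cols, stop - start, cols,
                       out.data() + start * cols);
  });
  return out;
}
```

Computing `start * cols` in `int64_t` is what keeps the offset from overflowing when the row count exceeds the 32-bit range.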
Test Plan: CI
Reviewed By: dskhudia, jianyuh
Differential Revision: D32715453
fbshipit-source-id: 33c3e72f51c8ff5d02dafab4a8947d1230c2d551