caffe2: use at::mt19937 instead of std::mt19937 (10x speedup) (#43987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43987
This replaces the caffe2 CPU random number (std::mt19937) with at::mt19937 which is the one currently used in pytorch. The ATen RNG is 10x faster than the std one and appears to be more robust given bugs in the std (https://fburl.com/diffusion/uhro7lqb)
For large embedding tables (10GB+) we see UniformFillOp taking upwards of 10 minutes as we're bottlenecked on the single threaded RNG. Swapping to at::mt19937 cuts that time to 10% of the current.
Test Plan: Ran all relevant tests + CI. This doesn't introduce new features (+ is a core change) so existing tests+CI should be sufficient to catch regressions.
Reviewed By: dzhulgakov
Differential Revision: D23219710
fbshipit-source-id: bd16ed6415b2933e047bcb283a013d47fb395814