[profiler] use swap in allocBlock to reduce time the lock is held. (#34499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34499
RangeEventList::allocBlock currently iterates through `blocks` and accumulates
them into `result` while holding the lock that serializes access to `blocks`.
Instead, we can swap `blocks` with an empty `forward_list` in constant time,
release the lock, and then populate `result` from this local list.
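A minimal sketch of the swap-then-unlock pattern described above. The names
`EventBlockList`, `Block`, `Event`, `addBlock`, and `collect` are hypothetical
stand-ins, not the profiler's actual types or methods; only the idea of
swapping a `std::forward_list` out under the lock mirrors this change.

```cpp
#include <cassert>
#include <forward_list>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the profiler's event and block types.
struct Event {
  int id;
};
using Block = std::vector<Event>;

class EventBlockList {
 public:
  // Producer side: prepend a new block while holding the lock.
  void addBlock(Block block) {
    std::lock_guard<std::mutex> guard(mutex_);
    blocks_.push_front(std::move(block));
  }

  // Consumer side: hold the lock only long enough to take ownership
  // of every block, then do the copying with the lock released.
  std::vector<Event> collect() {
    std::forward_list<Block> local;
    {
      std::lock_guard<std::mutex> guard(mutex_);
      blocks_.swap(local);  // constant time: exchanges internal pointers
    }                       // lock released here; blocks_ is now empty
    std::vector<Event> result;
    for (const Block& block : local) {
      result.insert(result.end(), block.begin(), block.end());
    }
    return result;
  }

 private:
  std::mutex mutex_;
  std::forward_list<Block> blocks_;
};
```

Because the loop over `local` runs after the lock is released, other threads
calling `addBlock` are blocked only for the duration of the swap, not for the
whole copy into `result`.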
ghstack-source-id: 100426115
Test Plan: existing profiler tests pass
Differential Revision: D20346423
fbshipit-source-id: 0e567b56049daa371051ccec6c5d1630a92db15f