Allow instruction counting to use shared memory as a staging ground. (And a couple other tweaks.) (#56711)
Summary:
This is something I discovered a while ago with the wall of serotonin: large-scale runs can easily get bottlenecked on disk access. I have a hack in the working files of that machine to use `/dev/shm`, but I figured I should formalize it and make a respectable utility.
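For illustration, the staging idea can be sketched roughly as follows. This is a minimal sketch and not the code in this PR: `make_staging_dir`, `SHM_DIR`, and the `prefix` default are hypothetical names, and it assumes a Linux host where `/dev/shm` is a memory-backed mount. If shared memory is unavailable, it falls back to the regular temp directory.

```python
import os
import shutil
import tempfile

# Assumption: on Linux, /dev/shm is a tmpfs mount backed by RAM.
SHM_DIR = "/dev/shm"


def make_staging_dir(prefer_shm=True, prefix="instruction_count_"):
    """Hypothetical helper: create a scratch directory for intermediate files.

    Uses shared memory when requested and writable; otherwise falls back
    to the platform's default temp directory (dir=None).
    """
    base = None
    if prefer_shm and os.path.isdir(SHM_DIR) and os.access(SHM_DIR, os.W_OK):
        base = SHM_DIR
    return tempfile.mkdtemp(prefix=prefix, dir=base)


if __name__ == "__main__":
    staging = make_staging_dir()
    try:
        print(f"Staging intermediate files in: {staging}")
    finally:
        # Shared memory is a limited resource, so always clean up.
        shutil.rmtree(staging, ignore_errors=True)
```

The explicit cleanup matters more here than with disk-backed temp files, since anything left in `/dev/shm` keeps consuming RAM.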
I also added a param to tweak the run cadence and a printout when a CorePool is created; these are just to make the CI logs a bit nicer. (A printout each second on a 40-minute CI job is a bit much...)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56711
Reviewed By: agolynski
Differential Revision: D28392248
Pulled By: robieta
fbshipit-source-id: b6aa7445c488d8e4ab9d4b31ab18df4e12783d8f