[allocator] register oom observers on every device (#107399)
This change is to match the behavior of _record_memory_history which was
recently changed to enable history recording on all devices rather than
the current one. It prevents confusing situations where the observer
was registered before the device was set for the training run.
It also ensures the allocators have been initialized in the python binding just in case this is the first call to the CUDA API.
Fixes #107330
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107399
Approved by: https://github.com/eellison
ghstack dependencies: #107171