Make CUDA serde support for TP agent pluggable (#59376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59376
This is an experiment. The end goal is to separate the CUDA-specific aspects of the TensorPipe agent so that they can be plugged "on top" of the CPU-only parts. This would then allow moving the TP agent to libtorch (libtorch is split into a CPU part and a CUDA part, whereas the agent currently lives in libtorch_python), although unfortunately other conditions also need to be met for that to happen.
The only place where we had CPU and CUDA logic within the same code, guarded by `#ifdef USE_CUDA`, was the serialization/deserialization code. I'm thus introducing a sort of registry in order to decentralize it. It's not a c10::Registry, because that would be overkill (it uses an unordered_map with strings as keys): here we can just use an array with integers as keys.
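The array-based registry described above could be sketched roughly as follows. This is a minimal illustration of the idea, not the actual PyTorch implementation: the names (`registerSerializer`, `serializerRegistry`, the device-type indices) are hypothetical, and the real code registers serde functions for tensor payloads rather than strings.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Fixed number of backends; integer device-type indices serve as the "keys"
// (e.g. 0 = CPU, 1 = CUDA), so a plain array suffices instead of a hash map.
constexpr std::size_t kNumDeviceTypes = 2;

using SerializerFn = std::function<std::string(const std::string&)>;

// Meyers-singleton registry: an array indexed by device type.
std::array<SerializerFn, kNumDeviceTypes>& serializerRegistry() {
  static std::array<SerializerFn, kNumDeviceTypes> registry;
  return registry;
}

// Each backend registers its own serializer, e.g. the CUDA one from the
// CUDA-specific library when it is loaded, so the CPU-only core never
// needs an #ifdef USE_CUDA.
void registerSerializer(std::size_t deviceType, SerializerFn fn) {
  serializerRegistry()[deviceType] = std::move(fn);
}

std::string serialize(std::size_t deviceType, const std::string& payload) {
  return serializerRegistry()[deviceType](payload);
}

// Register a dummy CPU serializer at static-init time, the way a backend
// library would on load.
const bool kCpuRegistered = (registerSerializer(0, [](const std::string& s) {
                               return "cpu:" + s;
                             }),
                             true);
```

A backend that is compiled out simply never registers, leaving its slot empty, which is how the CPU-only build avoids referencing any CUDA symbol.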
ghstack-source-id: 131326167
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28796428
fbshipit-source-id: b52df832e0c0abf489a9e418353103496382ea41