[Memory Snapshot] Add CUDAAllocatorConfig details into snapshot metadata (#119404)
Summary:
Include the `CUDAAllocatorConfig` at the time of the snapshot in the snapshot file. This adds the following fields to the snapshot metadata:
```
double garbage_collection_threshold;
size_t max_split_size;
size_t pinned_num_register_threads;
bool expandable_segments;
bool release_lock_on_cudamalloc;
bool pinned_use_cuda_host_register;
std::string last_allocator_settings;
std::vector<size_t> roundup_power2_divisions;
```
Test Plan:
`PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` produces
```
{'PYTORCH_CUDA_ALLOC_CONF': 'expandable_segments:True',
'max_split_size': -1,
'garbage_collection_threshold': 0.0,
'expandable_segments': True,
'pinned_num_register_threads': 1,
'release_lock_on_cudamalloc': False,
'pinned_use_cuda_host_register': False,
'roundup_power2_divisions': {'1': 0,
'2': 0,
'4': 0,
'8': 0,
'16': 0,
'32': 0,
'64': 0,
'128': 0,
'256': 0,
'512': 0,
'1024': 0,
'2048': 0,
'4096': 0,
'8192': 0,
'16384': 0,
'32768': 0}}
```
`PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:2000,roundup_power2_divisions:[256:1,512:2,1024:4,>:8]"` produces
```
{'PYTORCH_CUDA_ALLOC_CONF': 'max_split_size_mb:2000,roundup_power2_divisions:[256:1,512:2,1024:4,>:8]',
'max_split_size': 2097152000,
'garbage_collection_threshold': 0.0,
'expandable_segments': False,
'pinned_num_register_threads': 1,
'release_lock_on_cudamalloc': False,
'pinned_use_cuda_host_register': False,
'roundup_power2_divisions': {'1': 1, '2': 1, '4': 1, '8': 1, '16': 1, '32': 1, '64': 1, '128': 1, '256': 1, '512': 2, '1024': 8, '2048': 8, '4096': 8, '8192': 8, '16384': 8, '32768': 8}
}
```
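For reference, the `max_split_size` value in the metadata is `max_split_size_mb` converted to bytes (2000 MB × 1024 × 1024 = 2097152000). A rough sketch of how the config string maps onto the snapshot fields — `parse_alloc_conf` is a hypothetical stand-in, not PyTorch's actual parser:

```python
# Hypothetical helper (not part of PyTorch) that mimics how the
# PYTORCH_CUDA_ALLOC_CONF string could map onto the snapshot metadata above.
def parse_alloc_conf(conf: str) -> dict:
    settings = {}
    # Split on commas that are not inside the [...] interval list used by
    # roundup_power2_divisions.
    entries, depth, start = [], 0, 0
    for i, ch in enumerate(conf):
        if ch == '[':
            depth += 1
        elif ch == ']':
            depth -= 1
        elif ch == ',' and depth == 0:
            entries.append(conf[start:i])
            start = i + 1
    entries.append(conf[start:])

    for entry in entries:
        key, _, value = entry.partition(':')
        if key == 'max_split_size_mb':
            # The snapshot stores the split size in bytes:
            # 2000 MB -> 2000 * 1024 * 1024 = 2097152000.
            settings['max_split_size'] = int(value) * 1024 * 1024
        elif key == 'expandable_segments':
            settings['expandable_segments'] = value == 'True'
        elif key == 'roundup_power2_divisions':
            # Kept as the raw interval string in this sketch; the allocator
            # expands it into per-bucket division counts as shown above.
            settings[key] = value
    return settings
```

This illustrates only the unit conversion and the comma/bracket structure of the config string; the real parsing lives in `CUDAAllocatorConfig` on the C++ side.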
Differential Revision: D53536199
Pulled By: aaronenyeshi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119404
Approved by: https://github.com/zdevito