pytorch
6f396e18 - Add per-device allocator object in CUDACachingAllocator (#37567)

4 years ago

Add per-device allocator object in CUDACachingAllocator (#37567)

Summary:
Reduces lock contention and BlockPool management costs by tracking applicable state in per-device structures.

`THCCachingAllocator` now maintains a set of `DeviceCachingAllocator` objects (one per device) each of which maintains its own allocator state and operations.

Only global state remains in the top-level THCCachingAllocator object -- namely, `allocated_blocks`, the mapping between the raw storage pointers and the allocator's underlying Block structure. Global operations deal mostly with this translation and then pass the bulk of the work on to the device-specific allocator.

Conversely, device-specific state and operations are comprised mostly of managing the device's underlying blocks.

This has the following benefits:

- Performance: Access to the global pointer map is serialized independently of the per-device state -- reducing lock contention between operations on different devices.

- Simplicity: Managing the block pools in separate device-specific objects is conceptually more intuitive, simplifies the code and makes certain operations more efficient -- even in the absence of contention (e.g. free_cached_blocks, synchronize_and_free_events, emptyCache, get_all_blocks, etc.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/37567

Differential Revision: D21458556

Pulled By: colesbury

fbshipit-source-id: ef56cb373797b180df72f0998ebc35972c892288
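To make the locking split described in the commit message concrete, here is a minimal, host-only sketch of the idea. It is not the PyTorch implementation: `DeviceAllocatorSketch`, `CachingAllocatorSketch`, and all of their members are hypothetical names, plain host memory stands in for `cudaMalloc`, and streams, block splitting, and event handling are left out. The only points it tries to illustrate are that the global pointer map has its own mutex and that each device's block cache is guarded by a separate, per-device mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the allocator's Block structure.
struct Block {
  int device;
  std::size_t size;
  void* ptr;
};

// Per-device state and operations, guarded by a per-device mutex.
class DeviceAllocatorSketch {
 public:
  explicit DeviceAllocatorSketch(int device) : device_(device) {}

  Block* allocate(std::size_t size) {
    std::lock_guard<std::mutex> lock(mutex_);
    // Reuse the first cached block that is large enough.
    for (auto it = cached_.begin(); it != cached_.end(); ++it) {
      if ((*it)->size >= size) {
        Block* block = *it;
        cached_.erase(it);
        return block;
      }
    }
    // Assumption: host memory stands in for a real device allocation.
    storage_.push_back(std::make_unique<char[]>(size));
    blocks_.push_back(
        std::make_unique<Block>(Block{device_, size, storage_.back().get()}));
    return blocks_.back().get();
  }

  // Return a block to this device's cache instead of releasing its memory.
  void release(Block* block) {
    std::lock_guard<std::mutex> lock(mutex_);
    cached_.push_back(block);
  }

  std::size_t cached_count() {
    std::lock_guard<std::mutex> lock(mutex_);
    return cached_.size();
  }

 private:
  int device_;
  std::mutex mutex_;
  std::vector<Block*> cached_;
  std::vector<std::unique_ptr<Block>> blocks_;
  std::vector<std::unique_ptr<char[]>> storage_;
};

// Top-level object: holds only the pointer-to-Block map and delegates the rest.
class CachingAllocatorSketch {
 public:
  explicit CachingAllocatorSketch(int device_count) {
    for (int d = 0; d < device_count; ++d) {
      devices_.push_back(std::make_unique<DeviceAllocatorSketch>(d));
    }
  }

  void* allocate(int device, std::size_t size) {
    assert(device >= 0 && static_cast<std::size_t>(device) < devices_.size());
    // The device-specific work runs under the device lock only.
    Block* block = devices_[device]->allocate(size);
    // The global lock is held just long enough to record the translation.
    std::lock_guard<std::mutex> lock(mutex_);
    allocated_blocks_[block->ptr] = block;
    return block->ptr;
  }

  void release(void* ptr) {
    Block* block = nullptr;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = allocated_blocks_.find(ptr);
      if (it == allocated_blocks_.end()) return;  // Unknown pointer: ignore.
      block = it->second;
      allocated_blocks_.erase(it);
    }
    // The global lock is dropped before touching device-specific state.
    devices_[block->device]->release(block);
  }

  std::size_t live_count() {
    std::lock_guard<std::mutex> lock(mutex_);
    return allocated_blocks_.size();
  }

  std::size_t cached_count(int device) {
    return devices_[device]->cached_count();
  }

 private:
  std::mutex mutex_;
  std::unordered_map<void*, Block*> allocated_blocks_;
  std::vector<std::unique_ptr<DeviceAllocatorSketch>> devices_;
};
```

In this sketch, two threads allocating on different devices contend only for the brief map update under the global mutex; searching and updating each device's cache happens under that device's own lock.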

Author

mtbrandy

Committer

facebook-github-bot

Parents

324dc162
