[PyTorch] Optimize ~intrusive_ptr for the case of zero weak references (#47834)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47834
When the last strong reference goes away, we can determine whether
(as is likely) there are no outstanding weak references by loading
the weak count, without bothering to decrement it.
libc++'s `std::shared_ptr` does the same optimization:
https://github.com/llvm/llvm-project/blob/229db3647491ed2b2706a9b9ce13a97e38be6fa0/libcxx/src/memory.cpp#L69-L107
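
A minimal sketch of the idea, not the actual intrusive_ptr code. The
names `Target`, `release_strong` and `release_weak` are hypothetical, and
the sketch assumes the usual invariant that the weak count is one larger
than the number of live weak references while any strong reference
exists. Calling `release_resources()` on both paths is also an assumption
of this sketch:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an intrusively reference-counted object.
// Assumed invariant: while refcount > 0, weakcount is one larger than
// the number of live weak references.
struct Target {
  std::atomic<std::size_t> refcount{1};
  std::atomic<std::size_t> weakcount{1};
  bool resources_released = false;

  void release_resources() { resources_released = true; }
};

// Drops one strong reference. Returns true if the object was deleted.
inline bool release_strong(Target* target) {
  assert(target != nullptr);
  // fetch_sub returns the previous value; anything other than 1 means
  // other strong references are still alive.
  if (target->refcount.fetch_sub(1, std::memory_order_acq_rel) != 1) {
    return false;
  }
  target->release_resources();

  // Fast path: a weak count of exactly 1 means only the implicit
  // reference held on behalf of the strong references is left. Nobody
  // else can reach the object to create a new weak reference, so a
  // plain load is enough and the atomic decrement is skipped.
  bool should_delete =
      target->weakcount.load(std::memory_order_acquire) == 1;
  if (!should_delete) {
    // Slow path: weak references exist (or existed a moment ago), so
    // give up the implicit reference with a real decrement.
    should_delete =
        target->weakcount.fetch_sub(1, std::memory_order_acq_rel) == 1;
  }
  if (should_delete) {
    delete target;
  }
  return should_delete;
}

// Drops one weak reference. Returns true if the object was deleted.
inline bool release_weak(Target* target) {
  assert(target != nullptr);
  if (target->weakcount.fetch_sub(1, std::memory_order_acq_rel) == 1) {
    delete target;
    return true;
  }
  return false;
}
```

With no weak references, `release_strong` deletes the object after a
single atomic read-modify-write (on the strong count) plus one load.
With a weak reference outstanding, it falls back to the decrement and
the object lives until `release_weak` drops the last weak reference.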
ghstack-source-id: 116576326
Test Plan:
Saw time spent in TensorImpl::release_resources drop in
local profiling of the empty benchmark.
Ran framework overhead benchmarks: 9-10% savings on OutOfPlace, small single-digit savings on empty, essentially none on InPlace.
Reviewed By: bhosmer
Differential Revision: D24914763
fbshipit-source-id: 19b03f960e32123bc72f7edce63fa1d18c3c143f