[caffe2] Micro-optimizations in BlobGetMutableTensor (#98103)
Avoid calling Tensor::GetDevice() twice by querying the device once and reusing the result. Also remove the redundant branch for the case where tensor->dtype() == options.dtype(): in that case we end up calling raw_mutable_data(options.dtype()) anyway, so both branches collapse into a single call.
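For illustration only, here is a minimal sketch of the shape of the two changes. It does not reproduce the actual caffe2 code: `MockTensor`, `Device`, `Options`, `ReuseBefore`, `ReuseAfter`, and `Demo` are hypothetical stand-ins, and the device-matching condition is an assumed example of a check that queries the device more than once.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; these are NOT the real caffe2 types.
enum class DType { Float, Int };

struct Device {
  int type;
  int index;  // -1 means "no index set" in this sketch
  bool has_index() const { return index >= 0; }
  bool operator==(const Device& o) const {
    return type == o.type && index == o.index;
  }
};

struct Options {
  DType dtype;
  Device device;
};

struct MockTensor {
  MockTensor(DType dt, Device dev) : dtype_(dt), device_(dev) {}

  // Counts calls so the saving is observable.
  Device GetDevice() const { ++get_device_calls; return device_; }
  DType dtype() const { return dtype_; }
  void* raw_mutable_data(DType dt) {
    ++raw_mutable_data_calls;
    dtype_ = dt;
    return this;  // dummy non-null pointer
  }

  DType dtype_;
  Device device_;
  mutable int get_device_calls = 0;
  int raw_mutable_data_calls = 0;
};

// Before (assumed shape): the device is queried twice, and the dtype check
// selects between two calls that do the same thing.
inline bool ReuseBefore(MockTensor& t, const Options& opt) {
  if (t.GetDevice() == opt.device ||
      (!t.GetDevice().has_index() && t.device_.type == opt.device.type)) {
    if (t.dtype() == opt.dtype) {
      t.raw_mutable_data(t.dtype());  // same dtype as opt.dtype here
    } else {
      t.raw_mutable_data(opt.dtype);
    }
    return true;
  }
  return false;
}

// After: query the device once and make a single raw_mutable_data call.
inline bool ReuseAfter(MockTensor& t, const Options& opt) {
  const Device dev = t.GetDevice();
  if (dev == opt.device ||
      (!dev.has_index() && dev.type == opt.device.type)) {
    t.raw_mutable_data(opt.dtype);
    return true;
  }
  return false;
}

// Usage: both versions agree on the result; only the call count differs.
inline void Demo() {
  MockTensor a(DType::Float, Device{0, -1});
  MockTensor b(DType::Float, Device{0, -1});
  const Options opt{DType::Int, Device{0, 0}};
  assert(ReuseBefore(a, opt) == ReuseAfter(b, opt));
  assert(a.dtype() == b.dtype());
  assert(a.get_device_calls == 2 && b.get_device_calls == 1);
}
```

Calling `Demo()` from a `main` exercises both versions. In this sketch the observable behavior is identical, and the only difference is one fewer `GetDevice()` call and one fewer branch.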
Differential Revision: [D44596695](https://our.internmc.facebook.com/intern/diff/D44596695/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98103
Approved by: https://github.com/jerryzh168