Improve perf by avoiding implicit string creation in c10_cuda_check_implementation (#88350)
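
A minimal sketch of the kind of overhead the title refers to, not the actual diff. It assumes the check function previously took its file and function names as `const std::string&`, so every call site that passed a literal built temporary strings even when no error occurred. `TracedString`, `check_by_string`, `check_by_cstr`, and `g_conversions` are hypothetical names used only for illustration; `TracedString` stands in for `std::string` so the implicit conversions can be counted.

```cpp
#include <string>

// Hypothetical counter: number of implicit const char* -> string conversions.
static int g_conversions = 0;

// Hypothetical stand-in for std::string that records each conversion.
struct TracedString {
    std::string value;
    // Intentionally non-explicit, mirroring std::string's converting constructor.
    TracedString(const char* s) : value(s) { ++g_conversions; }
};

// Assumed "before" shape: literal arguments are converted to temporary
// string objects on every call, including the common err == 0 path.
inline bool check_by_string(int err,
                            const TracedString& file,
                            const TracedString& func,
                            int line) {
    (void)file; (void)func; (void)line;
    return err == 0;
}

// Assumed "after" shape: raw pointers are passed through unchanged, so the
// success path builds no string objects at all.
inline bool check_by_cstr(int err,
                          const char* file,
                          const char* func,
                          int line) {
    (void)file; (void)func; (void)line;
    return err == 0;
}
```

With a real `std::string` parameter each conversion may also allocate when the text exceeds the small-string buffer, which is why moving to `const char*` can matter on a hot error-checking path.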
Test Plan: Sandcastle
Differential Revision: D40949947
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88350
Approved by: https://github.com/Skylion007, https://github.com/soumith