Remove unnecessary allocations in `processErrorMsg`
`processErrorMsg` uses a table of string constants to be replaced in
the error message. However, this table is non-static so gets
re-constructed from scratch every time. So, I've made it `constexpr`
by using `std::array` instead of `std::vector` and `c10::string_view`
instead of `std::string`.
To support `c10::string_view` I've also updated `c10::ReplaceAll` to
accept string_view arguments, and added a fast-path that also avoids
repeated string searches when no translation is needed.
Using `torch.floor_divide` to benchmark a python warning, I see the
callgrind instruction count fall from 3,806,446 to 703,168 and a 6.5
us time improvement using `%timeit`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76976
Approved by: https://github.com/swolchok