Teach Triton codegen to generate sqrt (#103084)
Fixes https://github.com/pytorch/pytorch/issues/100972
I know ngimel doesn't like this sort of fix because we shouldn't
actually be computing sqrt at runtime. I'm open to some sort of
perf warning saying that we're spending FLOPs weirdly.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103084
Approved by: https://github.com/albanD, https://github.com/Skylion007, https://github.com/ngimel