[Inductor] Convert 0d CPU tensor to scalar during triton codegen (#87329)
This is a follow-up to address [this review](https://github.com/pytorch/torchdynamo/pull/1284#pullrequestreview-1130319129). We now handle 0d CPU tensors with the codegen approach, which no longer supports cudagraphs.
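
As a rough illustration of the idea only (this is not the Inductor code from this PR), the sketch below shows how a wrapper code generator could pass a 0d CPU tensor to a kernel as a scalar. `ArgInfo`, `codegen_call_arg`, and `codegen_kernel_call` are hypothetical names invented for this example, and emitting `.item()` is an assumption about how the conversion might be spelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ArgInfo:
    """Hypothetical description of a kernel argument (not an Inductor class)."""
    name: str
    device: str  # "cpu" or "cuda"
    ndim: int


def codegen_call_arg(arg: ArgInfo) -> str:
    """Return the source text used to pass `arg` in a generated kernel call."""
    # Assumption: a 0d CPU tensor is read on the host and passed as a plain
    # scalar, since a GPU kernel cannot dereference host memory.
    if arg.device == "cpu" and arg.ndim == 0:
        return f"{arg.name}.item()"
    # Every other argument is passed through unchanged.
    return arg.name


def codegen_kernel_call(kernel_name: str, args: List[ArgInfo]) -> str:
    """Build the source line that launches `kernel_name` with `args`."""
    return f"{kernel_name}({', '.join(codegen_call_arg(a) for a in args)})"


call = codegen_kernel_call(
    "kernel0",
    [ArgInfo("buf0", "cuda", 1), ArgInfo("scale", "cpu", 0)],
)
print(call)
```

Under this sketch the scalar value is read when the wrapper runs, so a captured graph replay would presumably not see later updates to the CPU tensor. That may be one reason the approach is incompatible with cudagraphs.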
cc @jansel @lezcano @fdrocha
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87329
Approved by: https://github.com/ngimel