[inductor] Disable cudagraphs if index_put_ fallback is encountered (#105439)
**TL;DR**: if lowerings.py encounters an aten.index_put_ fallback, it sets V.graph.cudagraphs_okay = False, which disables cudagraphs. Cudagraphs need to be disabled in this case because the fallback kernel fails during cuda graph capture.
index_put_ fallbacks fail with cuda graphs when `accumulate=True` - likely for the same reason that it fails with deterministic_algorithms_enabled:
https://github.com/pytorch/pytorch/blob/fcb7d4b35821d97142f1c0d8843dae0e98f8e965/aten/src/ATen/native/TensorAdvancedIndexing.cpp#L730
A first attempt was just to expand the scenarios where `index_put_` is one of the disallowed kernels in utils.py: https://github.com/pytorch/pytorch/blob/2fa7d11b64e55b4935dbf60b7e5810cec992bf67/torch/_inductor/utils.py#L436-L438
However, this disables cuda graphs in too many scenarios: index_put doesn't cause issues if it gets fused, only if the aten kernel gets called. So in the updated version of this PR, we check for fallbacks in lowerings.py and disable cudagraphs only if a fallback is encountered there.
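The idea above can be sketched in plain Python. This is only a toy illustration, not the code in this PR: `ToyGraph`, `lower_index_put`, and the `can_fuse` parameter are hypothetical names, and only the `cudagraphs_okay` flag mirrors the attribute mentioned in the TL;DR.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGraph:
    """Hypothetical stand-in for the graph being lowered (V.graph in the PR)."""
    cudagraphs_okay: bool = True
    disable_reasons: list = field(default_factory=list)


def lower_index_put(graph: ToyGraph, *, can_fuse: bool) -> str:
    """Toy lowering: only the fallback path disables cudagraphs.

    `can_fuse` is an assumption standing in for whatever conditions
    decide between a fused lowering and the aten fallback.
    """
    if can_fuse:
        # Fused path: no aten kernel is called, so cudagraphs stay enabled.
        return "fused"
    # Fallback path: the aten kernel will be called, so record that
    # cudagraphs must not be used for this graph.
    graph.cudagraphs_okay = False
    graph.disable_reasons.append("index_put_ fallback")
    return "fallback"


graph = ToyGraph()
lower_index_put(graph, can_fuse=True)
print(graph.cudagraphs_okay)  # still True after a fused lowering
lower_index_put(graph, can_fuse=False)
print(graph.cudagraphs_okay, graph.disable_reasons)
```

Setting the flag at lowering time, rather than listing `index_put_` as a disallowed kernel up front, keeps cudagraphs enabled for graphs where every index_put gets fused.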
Example of failure outside of PT2:
```python
import torch


def fn(x, y, z):
    x = torch.zeros_like(x)
    return x.index_put_([y], z, True)  # accumulate=True
    # return x + 1


x = torch.zeros((512, 512), dtype=torch.bool, device='cuda')
y = torch.arange(512, dtype=torch.int64, device='cuda')
z = torch.ones((512, 512), dtype=torch.bool, device='cuda')

# Warm up on a side stream before capturing.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for i in range(3):
        fn(x, y, z)
torch.cuda.current_stream().wait_stream(s)

# Capture the call in a CUDA graph; index_put_ fails during capture.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    fn(x, y, z)
```
fails with
```
Traceback (most recent call last):
  File "/data/users/dberard/scripts/graphed_index_put.py", line 24, in <module>
    fn(x, y, z)
  File "/data/users/dberard/scripts/graphed_index_put.py", line 8, in fn
    return x.index_put_([y], z, True)
RuntimeError: CUDA error: operation not permitted when stream is capturing
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/users/dberard/scripts/graphed_index_put.py", line 24, in <module>
    fn(x, y, z)
  File "/data/users/dberard/pytorch/torch/cuda/graphs.py", line 173, in __exit__
    self.cuda_graph.capture_end()
  File "/data/users/dberard/pytorch/torch/cuda/graphs.py", line 79, in capture_end
    super().capture_end()
RuntimeError: CUDA error: operation failed due to a previous error during capture
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
Differential Revision: [D47538548](https://our.internmc.facebook.com/intern/diff/D47538548)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105439
Approved by: https://github.com/eellison