Fix Inductor CSE Across Separate Reductions (#119410)
We were CSE'ing a load across two separate reduction loop bodies. This happened because the load we were examining used indirect indexing, so its index had no explicit rindex. I've commented with more details and other potential approaches to the fix.
Tried using the minifier without success. I hand-minified the repro somewhat, but it could be reduced further.
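
For illustration only, here is a toy model of the failure mode: a CSE cache that outlives a single reduction loop body. This is not Inductor's code. `ToyCSE`, `emit_two_reductions`, and the `load(in_ptr0 + idx_tmp)` expression are hypothetical names, and clearing the cache between loops is just one possible remedy, not necessarily the one taken in this PR.

```python
# Toy sketch of CSE reuse across two reduction loop bodies.
# All names here are hypothetical; this does not mirror Inductor internals.


class ToyCSE:
    """Maps expression text to the temp variable that already holds it."""

    def __init__(self):
        self.cache = {}
        self.counter = 0

    def generate(self, lines, expr):
        """Return a temp for `expr`, emitting an assignment only on a cache miss."""
        if expr not in self.cache:
            name = f"tmp{self.counter}"
            self.counter += 1
            lines.append(f"{name} = {expr}")
            self.cache[expr] = name
        return self.cache[expr]

    def invalidate(self):
        """Forget all cached temps, e.g. when a new loop body starts."""
        self.cache.clear()


def emit_two_reductions(clear_between_loops):
    """Emit two loop bodies that both need the same indirectly indexed load."""
    cse = ToyCSE()
    # The index goes through another value (indirect indexing), so the
    # expression text contains no explicit reduction index.
    load = "load(in_ptr0 + idx_tmp)"
    bodies = []
    for _ in range(2):
        body = []
        value = cse.generate(body, load)
        body.append(f"acc += {value}")
        bodies.append(body)
        if clear_between_loops:
            cse.invalidate()
    return bodies


buggy = emit_two_reductions(clear_between_loops=False)
fixed = emit_two_reductions(clear_between_loops=True)
```

In the `buggy` output, the second loop body reads `tmp0` without ever defining it, because the cached temp was emitted inside the first loop body. In the `fixed` output, each loop body emits its own load.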
Fix for https://github.com/pytorch/pytorch/issues/119327
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119410
Approved by: https://github.com/shunting314, https://github.com/jansel