Small fixes for better channels last performance (#89616)
1) Don't codegen maxpool backward; the generated code is exceedingly slow.
2) Determine reduction variables more reliably so that reduction hints are more accurate.
3) Iterate over reduction arguments in a deterministic order, take all full-size reduction arguments into account, and break ties between hints in favor of outer reduction.
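
A rough illustration of the idea in item 3, not the code in this PR: each full-size reduction argument votes for a hint, arguments are visited in sorted order, and a tie goes to outer reduction. `ReductionHint`, `vote_reduction_hint`, and the name-to-hint input mapping are hypothetical names chosen for this sketch.

```python
from collections import Counter
from enum import Enum


class ReductionHint(Enum):
    # Hypothetical two-valued hint used only for this sketch.
    INNER = 0
    OUTER = 1


def vote_reduction_hint(arg_hints):
    """Pick one hint from per-argument votes.

    arg_hints maps an argument name to the hint that argument prefers.
    It is assumed to contain only full-size reduction arguments.
    Returns (chosen_hint, names_in_visiting_order).
    """
    # Sorting fixes the visiting order. The vote count itself does not
    # depend on order, but anything derived from the loop (such as the
    # returned list) is reproducible across runs.
    visited = sorted(arg_hints)
    votes = Counter(arg_hints[name] for name in visited)
    if votes[ReductionHint.INNER] > votes[ReductionHint.OUTER]:
        return ReductionHint.INNER, visited
    # Ties, including the case of no votes at all, resolve to OUTER.
    return ReductionHint.OUTER, visited
```

With one vote for each hint the sketch returns `ReductionHint.OUTER`; it returns `ReductionHint.INNER` only when inner votes are a strict majority.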
Fixes #1653
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89616
Approved by: https://github.com/jansel, https://github.com/Chillee