Populate the eviction_policy field for load/store properly
This helps kernels that rely on caching, such as mid-range softmax,
which reads its input three times.
Selecting `eviction_policy=evict_first` in the last loop of the softmax
operation seems to give a 7-10% speed-up over `evict_last`, which was
the previous option. I'll put up some benchmarks soon™.
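
For illustration only, here is a plain-Python sketch (not the generated Triton kernel) of why the last loop is a natural place for `evict_first`: a three-pass softmax reads each block of the row once per pass, and after the third read the data is never needed again. The names `three_pass_softmax` and `load` are hypothetical, and using `evict_last` in the first two passes is an assumption of this sketch; only the policy names and the `evict_first` choice in the last loop come from the description above.

```python
import math


def three_pass_softmax(row, block=4):
    """Softmax over one row, reading the input in three blocked loops.

    Returns the result and the list of eviction hints attached to each
    block read, so the access pattern can be inspected.
    """
    hints = []

    def load(start, hint):
        # Hypothetical stand-in for a blocked load; `hint` only mimics an
        # eviction_policy argument and has no effect in pure Python.
        hints.append(hint)
        return row[start:start + block]

    # Pass 1: running maximum. The row is read again later, so the sketch
    # asks to keep it cached (assumption: evict_last).
    row_max = -math.inf
    for start in range(0, len(row), block):
        row_max = max(row_max, max(load(start, "evict_last")))

    # Pass 2: sum of shifted exponentials. One more read is still to come.
    total = 0.0
    for start in range(0, len(row), block):
        total += sum(math.exp(x - row_max) for x in load(start, "evict_last"))

    # Pass 3: normalize. This is the final read of each block, so it can be
    # marked as the first candidate for eviction.
    out = []
    for start in range(0, len(row), block):
        out.extend(math.exp(x - row_max) / total
                   for x in load(start, "evict_first"))

    return out, hints


if __name__ == "__main__":
    result, hints = three_pass_softmax([1.0, 2.0, 3.0, 4.0, 5.0], block=2)
    print(result)
    print(hints)
```

In this model, every block is touched three times, and only the third touch carries `evict_first`; whether that ordering of hints is what yields the measured speed-up is left to the benchmarks.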
ghstack-source-id: 9ca4f962b09967ad972cbae9ab3f610662808c23
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91316