[inductor] Fix shape padding (#99917)
Summary:
We were using the "percentiles" form of triton.testing.do_bench, which
returns a list of percentile timings (something like the 20th, 50th, and
80th) rather than a single number. I don't think we need that much detail
here, so let's just use the mean. I also took the opportunity to remove the
redundant settings of rep, warmup, and fast_flush.
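For illustration only, here is a minimal sketch of the difference between the two result shapes. It does not call Triton and is not the inductor code; the helper names, the nearest-rank percentile rule, and the sample timings are all made up for the example.

```python
import statistics

# Hypothetical helpers for illustration; these are not Triton or inductor APIs.

def summarize_percentiles(times_ms, percentiles=(0.2, 0.5, 0.8)):
    """Return one timing per requested percentile (nearest rank on sorted samples)."""
    ordered = sorted(times_ms)
    last = len(ordered) - 1
    return [ordered[round(p * last)] for p in percentiles]

def summarize_mean(times_ms):
    """Return a single number: the arithmetic mean of all samples."""
    return statistics.mean(times_ms)

# Made-up per-run timings in milliseconds.
samples = [10, 12, 11, 13, 14]

percentile_result = summarize_percentiles(samples)  # a list, one entry per percentile
mean_result = summarize_mean(samples)               # a single scalar

print(percentile_result, mean_result)
```

A caller that only needs to compare two candidates (for example padded vs. unpadded) can compare the scalar directly instead of picking an element out of the list.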
Test Plan:
```
TORCHBENCH_ATOL=1e-3 TORCHBENCH_RTOL=1e-3 TORCHINDUCTOR_PERMUTE_FUSION=1 TORCHINDUCTOR_SHAPE_PADDING=1 buck2 run mode/opt mode/inplace pytorch/benchmark:run -- ads_dhen_5x --part over --bs 1024 -d cuda -t train --torchdynamo inductor
```
Reviewed By: jiawenliu64
Differential Revision: D45241751
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99917
Approved by: https://github.com/jiawenliu64