Support fwd_no_grad mode
Summary:
Some Triton operators perform differently in no_grad mode, because they
can skip saving the intermediate results that a backward pass would need.
We want to be able to benchmark them both ways.
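The idea behind benchmarking the forward pass both ways can be sketched generically in PyTorch. This is an illustrative sketch, not this change's implementation: the helpers `run_fwd` and `bench` are hypothetical names, and only `torch.no_grad()` and `torch.cuda.synchronize()` are real PyTorch APIs. It shows that under `torch.no_grad()` the output carries no autograd graph, which is why operators can skip saving intermediates.

```python
import time

import torch


def run_fwd(fn, *args, no_grad: bool = False):
    """Run one forward call (hypothetical helper), optionally with autograd disabled."""
    if no_grad:
        # With autograd disabled, no graph is recorded, so ops do not
        # need to keep intermediate tensors around for a backward pass.
        with torch.no_grad():
            return fn(*args)
    return fn(*args)


def bench(fn, *args, no_grad: bool = False, iters: int = 10) -> float:
    """Return mean wall-clock seconds per forward call (hypothetical helper)."""
    run_fwd(fn, *args, no_grad=no_grad)  # warm-up call, not timed
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # drain pending GPU work before timing
    start = time.perf_counter()
    for _ in range(iters):
        run_fwd(fn, *args, no_grad=no_grad)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for queued kernels to finish
    return (time.perf_counter() - start) / iters
```

For example, with `x = torch.randn(256, 256, requires_grad=True)` and `f = lambda t: torch.softmax(t @ t, dim=-1)`, comparing `bench(f, x)` against `bench(f, x, no_grad=True)` contrasts the two modes. A real harness would more likely use a dedicated timer such as `triton.testing.do_bench`.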
Reviewed By: xuzhao9
Differential Revision: D62304098
fbshipit-source-id: 4cc3e6163596fa16570ebccdea38acc519fb5a91