Use proper grad mode during model validation to avoid OOM (#168176)
Summary:
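The Flux image generation pipeline takes about 32G of memory. Our benchmark
suite often does a deepcopy, which adds another 32G. If we then run
validation with `requires_grad=True` (the default for HF models), autograd
keeps intermediate activations alive for a backward pass that never happens,
and we OOM on an A100 80G.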
Note that we already use `pick_grad` in most other places, so this patch
simply improves that coverage.
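
As a rough illustration of the idea (not the actual patch), the sketch below picks a grad mode before running the validation forward passes. `pick_grad_sketch` and `validate` are hypothetical stand-ins; the real `pick_grad` in the benchmark suite may take different arguments and handle special cases.

```python
import copy

import torch


def pick_grad_sketch(is_training: bool):
    """Hypothetical stand-in for `pick_grad`: choose the grad-mode context."""
    # Training needs autograd; inference-only validation does not.
    return torch.enable_grad() if is_training else torch.no_grad()


def validate(model: torch.nn.Module, example: torch.Tensor, is_training: bool = False):
    # The deepcopy doubles the memory held by weights, as described above.
    reference = copy.deepcopy(model)
    # Run both forward passes under the selected grad mode so that, for
    # inference, no activations are saved for a backward pass.
    with pick_grad_sketch(is_training):
        expected = reference(example)
        actual = model(example)
    return expected, actual


model = torch.nn.Linear(4, 2)
example = torch.randn(3, 4)
expected, actual = validate(model, example)
```

With `is_training=False` the outputs are detached from any autograd graph, even though the model parameters still have `requires_grad=True`.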
X-link: https://github.com/pytorch/pytorch/pull/168176
Approved by: https://github.com/Lucaskabela
Reviewed By: georgehong
Differential Revision: D98008600
fbshipit-source-id: e47804cf591f839877a0086bc4f9d8dcf6c2d044
Author
generatedunixname499836121