Add dynamo bench arg --per_process_memory_fraction (#95260)
Simply pipes the arg to the existing torch.cuda API by the same name.
Useful for locally debugging OOMs that happened on a smaller GPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95260
Approved by: https://github.com/davidberard98