Lower batch size on cait_m36_384 (#106091)
The memory compression ratio for this model is 0.9839, but we OOM with cudagraphs because we interleave the eager runs with the cudagraph runs, so memory is effectively duplicated: the cudagraph memory pool holds its own allocation alongside the eager one.
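A rough back-of-the-envelope sketch of why a ratio near 1.0 still leads to OOM. It assumes the compression ratio is eager peak memory divided by compiled peak memory, and that the cudagraph pool stays reserved while the eager run executes. The function name and the 20 GB figure are hypothetical and only there for illustration; they are not measurements from this model.

```python
def combined_peak_gb(eager_peak_gb: float, compression_ratio: float) -> float:
    """Estimate peak memory when eager and cudagraph runs are interleaved."""
    # Assumption: compression_ratio = eager_peak / compiled_peak.
    compiled_peak_gb = eager_peak_gb / compression_ratio
    # Assumption: the cudagraph pool is not released between runs, so both
    # footprints are resident at the same time.
    return eager_peak_gb + compiled_peak_gb


ratio = 0.9839          # value reported for cait_m36_384
eager_peak_gb = 20.0    # hypothetical eager peak, for illustration only
total = combined_peak_gb(eager_peak_gb, ratio)
print(f"combined peak is about {total / eager_peak_gb:.2f}x the eager peak")
```

Under these assumptions the combined footprint is roughly twice the eager peak, which is why a smaller batch is needed even though the compiled model alone uses almost the same memory as eager.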
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106091
Approved by: https://github.com/anijain2305