[ci] two procs for parallelization (#85985)
Hitting OOMs on Linux CUDA, so use 2 procs instead of 3.
https://github.com/pytorch/pytorch/issues/85939
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85985
Approved by: https://github.com/huydhn