[SR] Fix stack/concat bug (#68777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68777
Fixed cases where negative dimensions were not handled correctly:
* `_stack_cpu` calls `maybe_wrap_dim`, but `_stack_cpu_out` does not. This is only problematic when `_stack_cpu_out` forwards to the serial kernel: [ref](https://www.internalfb.com/code/fbsource/[1b5af978b48f2e5d308d42b588bde3275869a57b]/fbcode/caffe2/aten/src/ATen/native/TensorShape.cpp?lines=1541-1547).
* concat has the same gap and also needs to wrap its dim.
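
For readers unfamiliar with dim wrapping, here is a minimal, self-contained sketch of the idea. It is not the ATen implementation: `wrap_dim_sketch` and `stack_dim_sketch` are hypothetical names used only for illustration. The sketch assumes that stack wraps against the input rank plus one (because the output gains a dimension), while concat wraps against the input rank.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper (not ATen's maybe_wrap_dim): maps a possibly negative
// dimension index into the range [0, ndim), throwing if it is out of range.
inline int64_t wrap_dim_sketch(int64_t dim, int64_t ndim) {
  if (ndim <= 0) {
    throw std::invalid_argument("ndim must be positive");
  }
  if (dim < -ndim || dim >= ndim) {
    throw std::out_of_range("dim is out of range for the given rank");
  }
  // Negative indices count from the end, so -1 refers to the last dimension.
  return dim < 0 ? dim + ndim : dim;
}

// Hypothetical helper: stack inserts a new dimension, so (by assumption) the
// valid range is based on the input rank plus one.
inline int64_t stack_dim_sketch(int64_t dim, int64_t input_ndim) {
  return wrap_dim_sketch(dim, input_ndim + 1);
}
```

With these helpers, `wrap_dim_sketch(-1, 3)` yields `2`, and `stack_dim_sketch(-1, 2)` also yields `2`. A kernel that indexes sizes or strides with the raw `-1` instead would read out of bounds, which is the kind of failure wrapping guards against.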
Test Plan:
`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Added new tests to cover the negative-dimension cases.
Reviewed By: hlu1
Differential Revision: D32604623
fbshipit-source-id: 00aaa42817cd2d3e7606ce75ab5a9744645118cf