Porting legacy reflection_pad2d to ATen
Summary:
Other changes:
1. Avoided `THCDeviceTensor` by recalculating the mapping from the CUDA (blockIdx, threadIdx) pair to input/output tensor indices.
2. Renamed identifiers from camelCase to snake_case.
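
The index mapping in item 1 can be sketched on the host as follows. This is an illustrative reconstruction, not the ATen kernel. It assumes non-negative padding smaller than the input size, a 1-D thread block over the flattened output plane, and the plane/batch index carried in blockIdx.y and blockIdx.z. The names `ThreadCoord`, `IndexPair`, `reflect_index`, and `map_thread_to_indices` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the CUDA built-ins a kernel would read.
struct ThreadCoord {
  int64_t block_x, block_y, block_z;  // blockIdx.x / .y / .z
  int64_t thread_x;                   // threadIdx.x
  int64_t block_dim_x;                // blockDim.x
  int64_t grid_dim_y;                 // gridDim.y
};

// Flat offsets into the contiguous input and output tensors.
struct IndexPair {
  int64_t input;
  int64_t output;
};

// Reflect a padded coordinate back into [0, size).
// Assumption: 0 <= pad < size, which reflection padding requires.
inline int64_t reflect_index(int64_t out, int64_t pad, int64_t size) {
  assert(pad >= 0 && pad < size);
  int64_t i = out - pad;                    // position relative to the input
  if (i < 0) i = -i;                        // mirror across the first element
  if (i >= size) i = 2 * (size - 1) - i;    // mirror across the last element
  return i;
}

// Map one simulated thread to its (input, output) flat offsets.
// The caller is expected to skip threads whose flattened position
// is >= out_w * out_h, as a kernel would with a bounds check.
inline IndexPair map_thread_to_indices(const ThreadCoord& t,
                                       int64_t in_w, int64_t in_h,
                                       int64_t pad_l, int64_t pad_r,
                                       int64_t pad_t, int64_t pad_b) {
  const int64_t out_w = in_w + pad_l + pad_r;
  const int64_t out_h = in_h + pad_t + pad_b;

  // Position inside one output plane, flattened row-major.
  const int64_t output_xy = t.thread_x + t.block_x * t.block_dim_x;
  // Which (batch, channel) plane this thread works on.
  const int64_t plane = t.block_y + t.block_z * t.grid_dim_y;

  const int64_t out_x = output_xy % out_w;
  const int64_t out_y = output_xy / out_w;
  const int64_t in_x = reflect_index(out_x, pad_l, in_w);
  const int64_t in_y = reflect_index(out_y, pad_t, in_h);

  return IndexPair{plane * in_w * in_h + in_y * in_w + in_x,
                   plane * out_w * out_h + out_y * out_w + out_x};
}
```

For a 3x3 input padded by 1 on every side, the thread at flattened position 0 writes output offset 0 and reads input offset 4 (the centre element), since the top-left corner reflects to row 1, column 1.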
Differential Revision: D13546803
fbshipit-source-id: 1df54f13e64934da3d803d9b6586bd5208d42d6d