Fix with emit_nvtx, also allow shape information to appear in nvtx ranges. (#21691)
Summary:
This PR is intended as a fix for https://github.com/pytorch/pytorch/issues/21644.
It allows the `with emit_nvtx` context manager to take an additional `record_shapes` argument. `record_shapes` is False by default, but if True, the nvtx ranges generated for each autograd op will append additional information about the sizes of Tensors received by that op.
The format of the shape information matches what the CPU-side profiler reports. For example,
```
import torch

M = torch.randn(2, 3)
mat1 = torch.randn(2, 3)
mat2 = torch.randn(3, 3)
# Run under nvprof (or another NVTX-aware tool) to capture the ranges.
with torch.cuda.profiler.profile():
    with torch.autograd.profiler.emit_nvtx(record_shapes=True):
        torch.addmm(M, mat1, mat2)
```
produces the following nvtx range label for addmm:
![Screenshot from 2019-06-12 10-48-01](https://user-images.githubusercontent.com/7799218/59374008-b7d13100-8cff-11e9-9245-58410073d965.png)
(cf. the "Input Shapes" shown in https://github.com/pytorch/pytorch/commit/864cfbc2162a874fd67b50414205483cb9ac6b5d#diff-115b6d48fa8c0ff33fa94b8fce8877b6)
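For comparison, the CPU-side shape output can be inspected without a GPU. The snippet below is a minimal sketch, not part of this PR: it assumes a PyTorch build where `torch.autograd.profiler.profile` accepts `record_shapes` and `key_averages` accepts `group_by_input_shape`, and it reuses the same `addmm` inputs as above. The exact op name and column layout in the table may differ between versions.

```python
import torch
from torch.autograd import profiler

M = torch.randn(2, 3)
mat1 = torch.randn(2, 3)
mat2 = torch.randn(3, 3)

# Record input shapes on the CPU-side profiler (no CUDA needed).
with profiler.profile(record_shapes=True) as prof:
    torch.addmm(M, mat1, mat2)

# Group events by input shape so the shapes column is populated.
table = prof.key_averages(group_by_input_shape=True).table()
print(table)
```

The shapes column in this table is the counterpart of the sizes appended to each nvtx range label when `record_shapes=True` is passed to `emit_nvtx`.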
I also took the opportunity to do some minor docstring cleanup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21691
Differential Revision: D15816226
Pulled By: gchanan
fbshipit-source-id: b2b01ea10fea61a6409a32b41e85b6c8b4851bed