Add check for no grad in transformer encoder nestedtensor conversion (#78832)
Previously, inputs that required grad could be converted to NestedTensors. Autograd then tries to query the size of the NestedTensor, but NestedTensor's size function throws an exception, so every call to nn.TransformerEncoder with grad enabled failed.
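For context, a minimal sketch of the failing call pattern described above; the hyperparameters, and the use of eval mode plus a key padding mask to reach the fast path, are assumptions for illustration:

```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=16, nhead=2, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=1).eval()

src = torch.rand(2, 5, 16, requires_grad=True)       # input that carries grad
padding_mask = torch.zeros(2, 5, dtype=torch.bool)   # key padding mask

# Before this change, the fast path converted `src` to a NestedTensor even
# though grad was enabled, and autograd's size() query on it raised.
out = encoder(src, src_key_padding_mask=padding_mask)
```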
Fix: add a check for grad in the transformer encoder so that tensors with grad enabled are not converted to NestedTensors.
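A minimal sketch of the kind of guard this adds (the helper name and its placement are illustrative, not the exact code in torch/nn/modules/transformer.py): before taking the NestedTensor fast path, bail out whenever grad mode is active and any participating tensor requires grad.

```python
import torch

def _can_use_nested_tensor_fast_path(src, *tensor_args):
    # Hypothetical helper: skip the NestedTensor conversion whenever autograd
    # is active and any participating tensor requires grad, since NestedTensor
    # cannot answer the size() query that autograd performs.
    if torch.is_grad_enabled() and any(
        t is not None and t.requires_grad for t in (src, *tensor_args)
    ):
        return False
    return True
```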
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78832
Approved by: https://github.com/cpuhrsch, https://github.com/jbschlosser