Don't use subclass when tracing and call wait_tensor immediately. (#98001)
This change relies on the wait_tensor call being scheduled properly afterwards, by passes over the traced graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98001
Approved by: https://github.com/wconstab, https://github.com/wanchaol