Add a .with_cache() method to distributions.Transform objects (#36882)
Summary:
This resolves an issue observed by stefanwebb where the composition of multiple transforms is cached only if all components are cached.
This PR adds a new method `.with_cache()`. For example, you can now compose a normalizing flow (which needs to be cached) with a `SigmoidTransform` (which is not cached by default) by calling `.with_cache()` on the latter. The same issue comes up when composing the non-cached constraint transforms returned by `transform_to()` and `biject_to()`: after this PR you can call `transform_to(constraints.positive).with_cache()` to get a cached `ExpTransform`.
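A minimal usage sketch of the behavior described above, not taken from the PR's test suite. It assumes that a transform with `cache_size=1` returns the original input object when inverting its most recent output, and it uses a cached `AffineTransform` as a stand-in for a normalizing flow (an assumption made only for illustration):

```python
import torch
from torch.distributions import constraints, transform_to
from torch.distributions.transforms import (
    AffineTransform,
    ComposeTransform,
    SigmoidTransform,
)

torch.manual_seed(0)
x = torch.randn(3)

# Without caching, the inverse is recomputed and yields a new tensor.
uncached = SigmoidTransform()
recomputed = uncached.inv(uncached(x))

# With caching, inverting the most recent output returns the input object.
cached = SigmoidTransform().with_cache()
recovered = cached.inv(cached(x))

# Stand-in for a cached flow (illustrative assumption, not part of the PR).
flow = AffineTransform(loc=0.0, scale=2.0, cache_size=1)

# Every component is cached, so the round trip through the composition
# should hand back the original tensor.
composed = ComposeTransform([flow, SigmoidTransform().with_cache()])
roundtrip = composed.inv(composed(x))

# Constraint transforms from the registry can be cached the same way.
positive = transform_to(constraints.positive).with_cache()
pos_roundtrip = positive.inv(positive(x))

print(recomputed is x, recovered is x, roundtrip is x, pos_roundtrip is x)
```

Identity checks (`is`) are used rather than numerical comparison because the point of caching is to skip the inverse computation entirely, which also avoids round-off error in the recovered value.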
## Tested
- [x] added a unit test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36882
Differential Revision: D21155914
Pulled By: ezyang
fbshipit-source-id: 3c06e63785ca2503e08a5cd7532aff81882835e9