[Reformer] - Cache hidden states and buckets to speed up inference (#5578)
* fix merge rebase
* add intermediate reformer code
* save intermediate caching results
* save intermediate
* save intermediate results
* save intermediate
* upload next step
* fix generate tests
* make tests work
* add named tuple output
* Apply suggestions from code review
* fix use_cache for False case
* fix tensor to gpu
* fix tensor to gpu
* refactor
* refactor and make style