Fixed use of memories in XLNet (caching for language generation + warning when loading improper memoryless model) (#5632)
* Pytorch gpu => cpu proper device
* Memoryless XLNet warning + fixed memories during generation
* Revert "Pytorch gpu => cpu proper device"
  This reverts commit 93489b36
* made black happy
* TF generation with memories
* dim => axis
* added padding_text to TF XL models
* Added comment, added TF