Leverage mmap for offloading tensors to GPU (#1597)
* Rebase to latest
* Show progress
* Add assert to make sure we only allocate temp buffer for non-CPU backend tensor
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>