[storage][perf] Reduce _get_device_from_module overhead. (#119144)
Using `rsplit` with `maxsplit=1` is more efficient since it 1) stops scanning as soon as the first `.` from the right is encountered and 2) creates a list of at most two elements.
This change also reuses `last_part` to avoid unnecessary repetition of a split.
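A minimal sketch of the idea, for illustration only. The function names, the `known` tuple, and the `"cpu"` fallback are assumptions made for this example and are not copied from the PyTorch source; only the `rsplit(".", 1)` call and the reuse of `last_part` come from the description above.

```python
def last_part_full_split(module: str) -> str:
    # Splits on every "." and builds one list element per component,
    # even though only the final component is needed.
    return module.split(".")[-1]


def last_part_rsplit(module: str) -> str:
    # Scans from the right, stops after the first "." it finds,
    # and builds a list of at most two elements.
    return module.rsplit(".", 1)[-1]


def device_from_module(module: str, known=("cuda", "hpu")) -> str:
    # Hypothetical helper: compute last_part once and reuse it for both
    # the membership check and the return value. The contents of `known`
    # and the "cpu" default are assumptions for this sketch.
    last_part = module.rsplit(".", 1)[-1]
    return last_part if last_part in known else "cpu"
```

Both helpers return the same last component; the difference is only in how much of the string is scanned and how many list elements are allocated. A module name without any `.` yields a one-element list, so indexing with `[-1]` remains safe.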
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119144
Approved by: https://github.com/Skylion007, https://github.com/mikaylagawarecki