Fix memory leak for pipelined optimizer swapper (#5700)
We identified a memory leak when training with NVMe-offloaded optimizer
states. The issue occurs when `pipeline_write=true`: tensors that have
been swapped out and written to NVMe are not deallocated afterwards.
This PR resolves the issue by deallocating the tensors that are no
longer needed once they have been swapped out to NVMe.
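
The leak pattern can be illustrated with a toy model. This is only a
sketch in plain Python, not the DeepSpeed implementation: `Buffer`,
`ToyPipelinedSwapper`, `release_after_write`, and `buffer_survives` are
hypothetical names, and a dict stands in for NVMe. It shows a buffer
staying alive because the swapper still references it after the write
has completed, and being reclaimed once that reference is dropped.

```python
import gc
import weakref


class Buffer:
    """Hypothetical stand-in for a tensor holding optimizer state."""

    def __init__(self, nbytes):
        self.data = bytearray(nbytes)


class ToyPipelinedSwapper:
    """Hypothetical swapper: queues writes, then flushes them later."""

    def __init__(self, release_after_write):
        self.release_after_write = release_after_write
        self.pending = []   # buffers queued for writing
        self.written = []   # buffers whose write has completed
        self.storage = {}   # stands in for NVMe (assumption)

    def swap_out(self, key, buf):
        # Pipelined write: only enqueue here, the write happens in flush().
        self.pending.append((key, buf))

    def flush(self):
        for key, buf in self.pending:
            self.storage[key] = bytes(buf.data)  # simulate the write
            self.written.append(buf)
        self.pending.clear()
        if self.release_after_write:
            # Drop the last references so the memory can be reclaimed.
            self.written.clear()


def buffer_survives(release_after_write):
    """Return True if the buffer is still alive after it was written out."""
    swapper = ToyPipelinedSwapper(release_after_write)
    buf = Buffer(1024)
    ref = weakref.ref(buf)
    swapper.swap_out("state_0", buf)
    del buf  # the caller no longer needs the buffer
    swapper.flush()
    gc.collect()
    return ref() is not None
```

In this model, `buffer_survives(False)` corresponds to the leaking
behaviour and `buffer_survives(True)` to the behaviour after releasing
the written buffers.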
Co-authored-by: amaurya <am6429@cs.rit.edu>