DeepSpeed
3c3c68f7 - fix: close file descriptor in deepspeed_io_handle_t::wait() to prevent fd leak (#8075)

Commit
14 hours ago
## Overview

This PR fixes a file descriptor leak in `deepspeed_io_handle_t::wait()` by ensuring the file descriptor is closed after the async I/O operation completes.

## Changes

* Added a `close()` call on the file descriptor at the end of `deepspeed_io_handle_t::wait()` to prevent fd accumulation during repeated async I/O operations.
* This prevents potential resource exhaustion in long-running training jobs that perform frequent checkpoint reads/writes through DeepSpeed's async I/O interface with ZeRO-3 NVMe offload.

Signed-off-by: markcl_chang <markcl_chang@adata.com>
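The close-on-wait pattern described in this commit can be illustrated with a minimal sketch. This is not the DeepSpeed source: `toy_io_handle`, `submit()`, and `fd_is_open()` are hypothetical names invented for illustration, and the completion step is only a comment. The sketch assumes a POSIX system and shows only the idea that `wait()` releases the descriptor it was waiting on and guards against a double close.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical stand-in for an async I/O handle; NOT DeepSpeed's actual class.
struct toy_io_handle {
    int fd = -1;  // descriptor owned by the in-flight operation, -1 if none

    // Begin an "operation" by opening the target file read-only.
    bool submit(const char* path) {
        fd = ::open(path, O_RDONLY);
        return fd >= 0;
    }

    // Wait for completion, then release the descriptor.
    // Returns 0 on success, -1 if nothing is pending or close() failed.
    int wait() {
        if (fd < 0) return -1;
        // ... a real implementation would block on completion events here ...
        const int rc = ::close(fd);
        fd = -1;  // mark as released so a second wait() cannot double-close
        return rc;
    }
};

// Returns false only when fcntl() reports EBADF, i.e. `fd` is not open.
inline bool fd_is_open(int fd) {
    return ::fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}
```

Without the `close()` in `wait()`, every `submit()` would leave one more descriptor open, and a long-running process would eventually hit its per-process descriptor limit. Resetting `fd` to `-1` is a design choice in this sketch so that calling `wait()` twice is harmless.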