llama.cpp
6a2f0b34 - Implement non-mapped async IO for CUDA on Windows. (#7896)

Commit

1 year ago

Implement non-mapped async IO for CUDA on Windows. (#7896)

* Implement non-mapped async IO for CUDA on Windows. On a fast Gen5 NVMe drive this change improves model load time by >3x while it should be the same (or slightly faster) on any other drive.

* Free resources except for backend.

* Change assertions to exceptions in llama_file, find correct cuda backend to create CUDA resources and respect the use_mmap flag again for CUDA.

* Apply suggestions from code review

Co-authored-by: slaren <slarengh@gmail.com>

* Fix editorconfig and unused variable

* Fix issues with Windows build

---------

Co-authored-by: slaren <slarengh@gmail.com>

References

#7896 - Implement non-mapped async IO for CUDA on Windows.

Author

mtavenrath

Parents

21be9cab
