llama.cpp
Implement non-mapped async IO for CUDA on Windows.
#7896
Merged

slaren merged 7 commits into ggml-org:master from mtavenrath:direct_io
Commits and review timeline:

32dd2ef1  mtavenrath: Implement non-mapped async IO for CUDA on Windows. On a fast Gen5 NVM…
mofosyne added the "Review Complexity : Low" label
1ebe2078  mtavenrath: Free resources except for backend.
slaren commented on 2024-06-12
86869fbd  mtavenrath: Change assertions to exceptions in llama_file, find correct cuda back…
slaren commented on 2024-06-13
c39d5ecd  mtavenrath: Apply suggestions from code review
d3131ce5  slaren: Fix editorconfig and unused variable
45c483ce  slaren: Merge remote-tracking branch 'origin/master' into direct_io
f4d33f87  slaren: Fix issues with Windows build
slaren approved these changes on 2024-06-13
slaren merged 6a2f0b34 into master 1 year ago
