llama : add `use_direct_io` flag for model loading (#18166)
* Adding --direct-io flag for model loading
* Fixing read_raw() calls
* Fixing Windows read_raw_at
* Changing type off_t to size_t for Windows and renaming functions
* Disable direct I/O when mmap is explicitly enabled
* Use read_raw_unsafe when upload_backend is available (not functional on some devices with Vulkan and SYCL)
* Fall back to std::fread if O_DIRECT fails with a bad address error (EFAULT)
* Windows: remove const keywords and unused functions
* Update src/llama-mmap.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
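The direct I/O read with its std::fread fallback can be sketched roughly as follows. This is a minimal, Linux-oriented sketch under stated assumptions, not the code in src/llama-mmap.cpp: the helper names read_range and read_buffered, the fixed 4096-byte alignment, and the aligned bounce buffer are all hypothetical. O_DIRECT generally requires the buffer address, file offset and length to be aligned to the device's logical block size, so the sketch widens the requested range to aligned boundaries and copies out only the requested bytes. If opening with O_DIRECT or the read fails for any reason (for example EINVAL on a filesystem without direct I/O support, or EFAULT), it falls back to a buffered std::fread.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: plain buffered read of `size` bytes at `offset`.
static bool read_buffered(const std::string & path, size_t offset, void * dst, size_t size) {
    std::FILE * f = std::fopen(path.c_str(), "rb");
    if (!f) {
        return false;
    }
    const bool ok = std::fseek(f, static_cast<long>(offset), SEEK_SET) == 0 &&
                    std::fread(dst, 1, size, f) == size;
    std::fclose(f);
    return ok;
}

// Hypothetical helper: try an O_DIRECT read first, fall back to std::fread on any failure.
static bool read_range(const std::string & path, size_t offset, void * dst, size_t size) {
#ifdef O_DIRECT
    const size_t align = 4096;              // assumed block alignment, not queried from the device
    assert((align & (align - 1)) == 0);     // the masking below requires a power of two
    const int fd = open(path.c_str(), O_RDONLY | O_DIRECT);
    if (fd >= 0) {
        // widen [offset, offset + size) to aligned boundaries
        const size_t start = offset & ~(align - 1);
        const size_t end   = (offset + size + align - 1) & ~(align - 1);
        void * buf = nullptr;
        bool ok = false;
        if (posix_memalign(&buf, align, end - start) == 0) {
            const ssize_t n = pread(fd, buf, end - start, static_cast<off_t>(start));
            // a short read near EOF is fine as long as the requested bytes were covered
            if (n >= 0 && static_cast<size_t>(n) >= offset - start + size) {
                std::memcpy(dst, static_cast<char *>(buf) + (offset - start), size);
                ok = true;
            }
            std::free(buf);
        }
        close(fd);
        if (ok) {
            return true;
        }
    }
#endif
    // O_DIRECT unavailable or failed (EINVAL, EFAULT, short read, ...): use buffered I/O
    return read_buffered(path, offset, dst, size);
}
```

Going through a bounce buffer costs one extra copy; a real loader would more likely read straight into suitably aligned destination memory when it can.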
---------
Co-authored-by: jtischbein <jtischbein@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>