onnxruntime
a0c42367 - [NV TensorRT RTX EP] enable weight stripped engines with EP Context (#24869)

Commit

311 days ago

[NV TensorRT RTX EP] enable weight stripped engines with EP Context (#24869)

Enable NV TRT RTX EP engines to always be weight stripped when using EP Context.

We want to always use weight-stripped engines for EP Context to reduce the disk footprint on end-user systems. With this, there are two ways to load weights:

1. provide weights via bytestream (recommended)
2. original `model.onnx` present in the same folder as the `model_ctx.onnx`

```cpp
std::vector<char> model_bytes = ReadFileFromDisk("model.onnx");

// weight refitting using bytestream
std::unordered_map<std::string, std::string> rtx_ep_options;
rtx_ep_options[onnxruntime::nv::provider_option_names::kONNXBytestream] =
    std::to_string(reinterpret_cast<size_t>(model_bytes.data()));
rtx_ep_options[onnxruntime::nv::provider_option_names::kONNXBytestreamSize] =
    std::to_string(model_bytes.size());
```
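The snippet in the commit message calls `ReadFileFromDisk` without defining it. A minimal sketch of such a helper follows, assuming it only needs to read the whole file into a `std::vector<char>`; the name is taken from the snippet, but this body is hypothetical and is not code from the commit.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of the commit): reads an entire file into memory.
// Throws std::runtime_error if the file cannot be opened or fully read.
std::vector<char> ReadFileFromDisk(const std::string& path) {
  // Open in binary mode, positioned at the end so tellg() reports the file size.
  std::ifstream file(path, std::ios::binary | std::ios::ate);
  if (!file) {
    throw std::runtime_error("cannot open file: " + path);
  }

  const std::streamsize size = static_cast<std::streamsize>(file.tellg());
  if (size < 0) {
    throw std::runtime_error("cannot determine size of file: " + path);
  }

  // Rewind and read the whole file in one call.
  file.seekg(0, std::ios::beg);
  std::vector<char> bytes(static_cast<std::size_t>(size));
  if (size > 0 && !file.read(bytes.data(), size)) {
    throw std::runtime_error("cannot read file: " + path);
  }
  return bytes;
}
```

Because the provider option carries only the buffer's address encoded as a decimal string, the `std::vector<char>` returned here presumably has to stay alive, and must not be resized, for as long as the EP may read from that address.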

References

#24869 - [NV TensorRT RTX EP] enable weight stripped engines with EP Context

Author

thevishalagarwal

Parents

b7b1af43
