onnxruntime
c5e6bd8d - Fix string tensor deserialization in ORT format models (#28133)

Commit
2 days ago
Fix string tensor deserialization in ORT format models (#28133)

### Description

`ConvertInitializersIntoOrtValues()` replaces initializer TensorProtos with ones pointing to in-memory raw buffers via `TensorToTensorProto(..., use_tensor_buffer=true)`. For string tensors exceeding 127 bytes, this stores a pointer to `std::string` C++ objects as "external data", but those objects contain heap pointers, not serializable content. The `string_data` field ends up empty, so ORT format save loses all string data. On reload: shape says N elements, `string_data_size()` is 0 → deserialization fails.

Changes:

- **`tensorprotoutils.cc`**: Add `!tensor.IsDataTypeString()` guard in `TensorToTensorProto` so string tensors always populate `string_data` rather than taking the external-data-in-memory path
- **`graph.cc`**: Skip string tensors in `ConvertInitializersIntoOrtValues()` since the raw-buffer optimization is fundamentally incompatible with string data
- **`graph_test.cc`**: Add regression test creating a 20-element string initializer, calling `ConvertInitializersIntoOrtValues()`, and verifying string data survives

### Motivation and Context

Since onnxruntime 1.23.0, loading ORT format models with string tensor initializers fails with:

```
INVALID_ARGUMENT: Deserialize tensor failed. UnpackTensor: the pre-allocate size does not match the size in proto
```

Reproduction: any model with a string initializer (e.g. Gather over a string array) saved via `optimized_model_filepath` with `.ort` extension, then reloaded.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
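The guard described under "Changes" can be illustrated with a small, self-contained sketch. This is not the onnxruntime code: `ToyTensor`, `ToyTensorProto`, `ToProto`, and `kInMemoryThresholdBytes` are hypothetical stand-ins invented for illustration, and the 127-byte cutoff is only taken from the description above. The sketch assumes the in-memory path is chosen for large tensors when `use_tensor_buffer` is set, and shows how excluding string tensors keeps `string_data` populated.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not the real onnxruntime types.
enum class ElemType { kFloat, kString };

struct ToyTensor {
  ElemType type;
  std::vector<float> floats;         // payload when type == kFloat
  std::vector<std::string> strings;  // payload when type == kString

  bool IsDataTypeString() const { return type == ElemType::kString; }

  // Size of the raw element buffer. For strings this counts the std::string
  // objects themselves, which hold heap pointers rather than the characters.
  std::size_t SizeInBytes() const {
    return IsDataTypeString() ? strings.size() * sizeof(std::string)
                              : floats.size() * sizeof(float);
  }
};

struct ToyTensorProto {
  std::vector<std::string> string_data;    // serializable copy of string elements
  std::vector<float> float_data;           // serializable copy of float elements
  const void* in_memory_buffer = nullptr;  // "external data" pointing at live memory
};

// Assumed cutoff, taken from the "exceeding 127 bytes" wording in the description.
constexpr std::size_t kInMemoryThresholdBytes = 127;

ToyTensorProto ToProto(const ToyTensor& tensor, bool use_tensor_buffer) {
  ToyTensorProto proto;
  // The string check is the guard: without it, a large string tensor would
  // take this branch and leave string_data empty.
  if (use_tensor_buffer && !tensor.IsDataTypeString() &&
      tensor.SizeInBytes() > kInMemoryThresholdBytes) {
    proto.in_memory_buffer = tensor.floats.data();
    return proto;
  }
  if (tensor.IsDataTypeString()) {
    proto.string_data = tensor.strings;  // copy the actual string contents
  } else {
    proto.float_data = tensor.floats;
  }
  return proto;
}
```

With this guard, a 20-element string tensor converted with `use_tensor_buffer = true` still ends up with 20 entries in `string_data`, while a large float tensor continues to use the in-memory buffer.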