onnxruntime
8da5e910 - Use abseil for readable POSIX stack traces in debug builds (#28405)

Committed 2 days ago
## Description

Replace glibc `backtrace()`/`backtrace_symbols()` with abseil's `absl::GetStackTrace()`/`absl::Symbolize()` for POSIX/Linux debug builds, and add automatic `addr2line` resolution for file paths and line numbers.

The previous implementation produced raw addresses that had to be translated manually with `addr2line`. The new implementation puts demangled function names with source locations directly in exception messages, with no new dependencies.

## Summary of Changes

### Stack Trace Implementation

| File | Change |
|------|--------|
| `onnxruntime/core/platform/posix/stacktrace.cc` | Replace glibc `backtrace()`/`backtrace_symbols()` with `absl::GetStackTrace()`/`absl::Symbolize()`. Use `dladdr()` + `addr2line` to resolve the source file and line number for each frame. |
| `onnxruntime/core/session/environment.cc` | Add a one-time `absl::InitializeSymbolizer(nullptr)` call via `std::call_once` in `Environment::Initialize()`. On Linux, `nullptr` works because abseil reads `/proc/self/exe`. |

### Before vs After

**Before** (raw addresses requiring manual `addr2line`):

```
Stacktrace:
/home/me/build/Debug/onnxruntime_test_all(+0x3f46cc) [0x559543faf6cc]
/home/me/build/Debug/onnxruntime_test_all(+0x2bef04d) [0x559543faf6cc]
...
```

**After** (demangled function names + file:line):

```
Stacktrace:
onnxruntime::OpKernelContext::Output() at .../core/framework/op_kernel.cc:45
onnxruntime::Add<>::Compute() at .../core/providers/cpu/math/element_wise_ops.cc:596
...
```

The environment variable `ORT_ADDR2LINE` controls how many frames are passed to `addr2line` for file and line resolution. The default is 0, which avoids timeouts in CI pipelines. For local debugging on Linux or WSL, set it to a larger value.

## Motivation and Context

Follow-up on #26257, which was closed because abseil's backtrace/symbolize is already available as a dependency.
This PR implements that suggestion and adds file:line resolution:

- **No new dependency**: `absl::stacktrace` and `absl::symbolize` are already in `ABSEIL_LIBS` and linked to `onnxruntime_common`. `dladdr()` and `addr2line` are standard POSIX/Linux utilities.
- **No CMake changes needed**: Everything is already wired up.
- **Debug-only**: Guarded by `#ifndef NDEBUG`, so there is no performance impact in release builds.
- **Best-effort file:line**: Uses `dladdr()` to compute file offsets, then calls `addr2line` in batch (once per binary). Falls back gracefully to function-name-only output if `addr2line` is unavailable.
- **Windows unchanged**: Windows already has superior stack traces via C++23 `<stacktrace>`.
- **Platform exclusions preserved**: Android, WebAssembly, AIX, and `_OPSCHEMA_LIB_` builds continue to return empty stack traces.

It may also resolve stack trace build issues on BSD, Alpine Linux, and other musl libc-based distributions (see https://github.com/microsoft/onnxruntime/pull/28249, https://github.com/microsoft/onnxruntime/pull/24755, https://github.com/microsoft/onnxruntime/pull/28161, https://github.com/microsoft/onnxruntime/pull/27437).

Note that C++23 has stack trace support, but compiler support for C++23 is not yet mature, so abseil seems to be the best choice for now.

### How it works

1. `absl::GetStackTrace()` captures raw frame addresses.
2. `absl::Symbolize()` resolves each address to a demangled function name.
3. `dladdr()` determines which binary each address belongs to and computes the file offset.
4. `addr2line` is called in batch (one invocation per binary) to resolve file:line.
5. The results are combined into a single readable string per frame.

## Testing

- Built and verified on Linux with CUDA EP in Debug mode.
- Ran `onnxruntime_test_all --gtest_filter="*BadModelInvalidDimParamUsage*"` and confirmed that the stack trace shows demangled function names with file paths and line numbers through the full call chain.
- Verified graceful fallback when `addr2line` cannot resolve a frame (shows function name + address only).
- No CMake changes, so no risk of build system regressions on other platforms.
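Steps 3 and 4 of **How it works**, together with the `ORT_ADDR2LINE` frame budget, could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the code in `stacktrace.cc`: `FrameInfo`, `Addr2LineFrameLimit()` and `BuildAddr2LineCommands()` are hypothetical names, the frame records are assumed to have already been filled in from `dladdr()` (`dli_fname`, `dli_fbase`), and `ORT_ADDR2LINE` is assumed to be parsed as a plain non-negative integer. Subtracting the load base gives the offset `addr2line` expects for PIE executables and shared libraries.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-frame record. In a real implementation, module_path and
// module_base would come from dladdr() (Dl_info::dli_fname and dli_fbase).
struct FrameInfo {
  std::uintptr_t address;      // absolute address captured by the stack walk
  std::string module_path;     // binary or shared library containing it
  std::uintptr_t module_base;  // address at which that binary is loaded
};

// Hypothetical helper: reads the ORT_ADDR2LINE frame budget. Returns 0 (no
// addr2line calls) when the variable is unset, empty, or not a non-negative
// integer.
int Addr2LineFrameLimit() {
  const char* value = std::getenv("ORT_ADDR2LINE");
  if (value == nullptr || *value == '\0') return 0;
  char* end = nullptr;
  long parsed = std::strtol(value, &end, 10);
  if (*end != '\0' || parsed < 0) return 0;
  return parsed > INT_MAX ? INT_MAX : static_cast<int>(parsed);
}

// Builds one addr2line command per binary for the first `limit` frames, so
// each binary is processed in a single batched invocation. Function names are
// assumed to come from absl::Symbolize(), so only file:line is requested.
std::map<std::string, std::string> BuildAddr2LineCommands(
    const std::vector<FrameInfo>& frames, int limit) {
  std::map<std::string, std::vector<std::uintptr_t>> offsets_by_module;
  for (int i = 0; i < limit && i < static_cast<int>(frames.size()); ++i) {
    const FrameInfo& frame = frames[i];
    if (frame.module_path.empty()) continue;  // dladdr() found no binary
    // Offset relative to the load base (valid for PIE and shared libraries).
    offsets_by_module[frame.module_path].push_back(frame.address -
                                                   frame.module_base);
  }

  std::map<std::string, std::string> commands;
  for (const auto& [path, offsets] : offsets_by_module) {
    std::ostringstream cmd;
    // Naive single-quoting for a shell (e.g. popen()); paths containing a
    // quote character would need proper escaping.
    cmd << "addr2line -e '" << path << "'";
    for (std::uintptr_t offset : offsets) cmd << " 0x" << std::hex << offset;
    commands[path] = cmd.str();
  }
  return commands;
}
```

Each command would then be run (for example via `popen()`), and its output, one `file:line` line per offset in input order with `??` marking addresses it cannot resolve, paired back to the frames. Frames beyond the budget keep the function-name-only form, which matches the fallback described above.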