julia
5a904ac9 - Enable per thread register state cache on libunwind (#55049)

Commit

1 year ago

Enable per thread register state cache on libunwind (#55049) Looking into a profile recently I realized that when recording backtraces the CPU utilization is mostly dominated by lookups/updates to libunwind's register state cache (`get_rs_cache`, `put_rs_cache`): ![Screenshot from 2024-07-05 19-29-45](https://github.com/JuliaLang/julia/assets/5301739/5e65f867-6dc8-4d55-8669-aaf1f756a2ac) It is also worth noting that those functions are taking a lock and using `sigprocmask` which does not scale, so by recording backtraces in parallel we get: ![Screenshot from 2024-07-05 19-30-21](https://github.com/JuliaLang/julia/assets/5301739/ed3124dd-f340-4b52-a7f9-c0a203f935b6) And this translates to these times on a recent laptop (Linux X86_64): ``` julia> @time for i in 1:1000000 Base.backtrace() end 8.286924 seconds (32.00 M allocations: 8.389 GiB, 1.46% gc time) julia> @time Threads.@sync for i in 1:16 Threads.@spawn for j in 1:1000000 Base.backtrace() end end 20.448630 seconds (160.01 M allocations: 123.740 GiB, 8.05% gc time, 0.43% compilation time: 18% of which was recompilation) ``` Good news is that libunwind already has the solution for this in the form of the `--enable-per-thread-cache` build option which uses a thread local cache for register state instead of the default global one ([1](https://libunwind-devel.nongnu.narkive.com/V3gtFUL9/question-about-performance-of-threaded-access-in-libunwind)). But this is not without some hiccups due to how we `dlopen` libunwind so we need a small patch ([2](https://libunwind-devel.nongnu.narkive.com/QG1K3Uke/tls-model-initial-exec-attribute-prevents-dynamic-loading-of-libunwind-via-dlopen)). By applying those changes we get: ``` julia> @time for i in 1:1000000 Base.backtrace() end 2.378070 seconds (32.00 M allocations: 8.389 GiB, 4.72% gc time) julia> @time Threads.@sync for i in 1:16 Threads.@spawn for j in 1:1000000 Base.backtrace() end end 3.657772 seconds (160.01 M allocations: 123.740 GiB, 52.05% gc time, 2.33% compilation time: 19% of which was recompilation) ``` Single-Threaded: ![Screenshot from 2024-07-05 20-25-49](https://github.com/JuliaLang/julia/assets/5301739/ebc87952-e51f-488c-92f4-72aed5abb93a) Multi-Threaded: ![Screenshot from 2024-07-05 20-26-32](https://github.com/JuliaLang/julia/assets/5301739/0ea2160a-60e8-49ea-af62-7d8ffc35c963) As a companion to this PR I have created another one for applying the same change to LibUnwind_jll [on Yggdrasil](https://github.com/JuliaPackaging/Yggdrasil/pull/9030). After that lands we can bump the version here.

References

#55049 - Enable per thread register state cache on libunwind

Author

andrebsguedes

Parents

fdecc597

julia 5a904ac9 - Enable per thread register state cache on libunwind (#55049)

julia
5a904ac9 - Enable per thread register state cache on libunwind (#55049)