[turbopack] optimize the TurboMalloc threadlocals (#80265)
Improve memory tracking in turbo-tasks-malloc
### What?
Made the `thread_local!` initializer `const` compatible. This avoids per-thread lazy initialization of the thread-local data structure, which removes a small amount of bootstrapping logic on each access.
Additionally, this gives us access to the native LLVM implementation of thread-locals, which can improve performance. See https://matklad.github.io/2020/10/03/fast-thread-locals-in-rust.html
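For illustration, a minimal sketch of the `const` initializer form, not the actual TurboMalloc code. The names `ALLOCATED`, `record_allocation`, and `allocated_on_this_thread` are hypothetical and stand in for whatever per-thread counters the allocator keeps.

```rust
use std::cell::Cell;

// The lazy form would be written as:
//     static ALLOCATED: Cell<usize> = Cell::new(0);
// and has to check on access whether this thread has initialized the value.

thread_local! {
    // Const form: the initial value is a compile-time constant, so no
    // per-thread lazy initialization is required.
    // `ALLOCATED` is a hypothetical name used only for this sketch.
    static ALLOCATED: Cell<usize> = const { Cell::new(0) };
}

/// Hypothetical helper: adds `bytes` to this thread's counter and
/// returns the new per-thread total.
fn record_allocation(bytes: usize) -> usize {
    ALLOCATED.with(|counter| {
        let total = counter.get() + bytes;
        counter.set(total);
        total
    })
}

/// Hypothetical helper: reads this thread's counter.
fn allocated_on_this_thread() -> usize {
    ALLOCATED.with(|counter| counter.get())
}
```

The `const { ... }` block requires the initializer expression to be evaluable at compile time, which is why the initializer had to be made `const` compatible first.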
### Performance
Confusingly, the benchmarks do not show an improvement, but vercel-site appears to. Perhaps the 'small' benchmarks are just too small.
