[nnc] Remove cached argv from LLVMCodeGen to fix race condition (#54286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54286
A generated code object was holding not just a function pointer but a
pre-allocated argument buffer. I assume this was a performance optimization to
avoid allocating a vector on each call?
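This cached buffer makes it unsafe to call a generated function from multiple
threads: concurrent calls write their arguments into the same buffer and can
clobber each other, which is too severe a limitation. This diff fixes it by
allocating a SmallVector locally, on each call, to hold the args.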
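A minimal C++ sketch of the pattern, assuming a simplified kernel ABI in which every argument reaches the generated function as a `void*` inside a `void**` array. `KernelFn`, `CallArgSketch`, `CodeGenSketch`, `doubleKernel` and `runConcurrently` are hypothetical names for illustration, not the actual NNC classes, and `std::vector` stands in for `llvm::SmallVector` so the example has no LLVM dependency. The only point it makes is where the argument array lives: as a local in `call()`, each invocation (and so each thread) gets its own copy.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical kernel ABI: every argument is passed as an untyped pointer.
using KernelFn = void (*)(void** args);

// Hypothetical, heavily simplified CallArg: just wraps a pointer.
struct CallArgSketch {
  void* ptr;
  void* data() const { return ptr; }
};

// Example kernel: out[i] = in[i] * 2 for i in [0, *n).
void doubleKernel(void** args) {
  const float* in = static_cast<const float*>(args[0]);
  float* out = static_cast<float*>(args[1]);
  const int n = *static_cast<const int*>(args[2]);
  for (int i = 0; i < n; ++i) {
    out[i] = in[i] * 2.0f;
  }
}

class CodeGenSketch {
 public:
  explicit CodeGenSketch(KernelFn fn) : fn_(fn) { assert(fn_ != nullptr); }

  // Before the fix (schematically): a member such as
  //   std::vector<void*> argv_;
  // was refilled here on every call, so two threads calling at once
  // could overwrite each other's arguments.
  //
  // After the fix: the argument array is a local variable.
  void call(const std::vector<CallArgSketch>& args) const {
    std::vector<void*> argv;  // stand-in for llvm::SmallVector<void*, N>
    argv.reserve(args.size());
    for (const CallArgSketch& a : args) {
      argv.push_back(a.data());
    }
    fn_(argv.data());
  }

 private:
  KernelFn fn_;  // the only state shared between callers, and it is read-only
};

// Calls `codegen` from `nthreads` threads at once, each on its own buffers,
// and returns true if every thread got correct results.
bool runConcurrently(const CodeGenSketch& codegen, int nthreads, int n) {
  std::vector<std::vector<float>> ins(nthreads), outs(nthreads);
  for (int t = 0; t < nthreads; ++t) {
    ins[t].assign(n, static_cast<float>(t));
    outs[t].assign(n, 0.0f);
  }
  std::vector<std::thread> threads;
  for (int t = 0; t < nthreads; ++t) {
    threads.emplace_back([&, t] {
      int len = n;
      codegen.call({{ins[t].data()}, {outs[t].data()}, {&len}});
    });
  }
  for (std::thread& th : threads) {
    th.join();
  }
  for (int t = 0; t < nthreads; ++t) {
    for (float v : outs[t]) {
      if (v != 2.0f * t) return false;
    }
  }
  return true;
}
```

The actual fix uses `SmallVector` rather than `std::vector` because its inline storage holds a small number of elements without a heap allocation, which recovers most of the per-call cost the cached buffer was presumably there to avoid.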
A better fix would be to avoid creating CallArgs altogether, so the function can
be called directly without this packing-and-unpacking nonsense, but that's a
slightly more involved fix, possibly involving changes to the kernel codegen,
and this bug needs fixing now.
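The alternative described above can be sketched as calling a kernel that has a concrete, typed signature, so no argument array is built at all. This is a hypothetical illustration: `TypedKernelFn`, `doubleKernelTyped` and `TypedCodeGenSketch` are made-up names, and the kernel is an ordinary C++ function. A JIT-compiled kernel's signature is only known at runtime, which is presumably part of why this route would require changes to the kernel codegen.

```cpp
#include <cassert>

// Hypothetical typed kernel ABI: arguments are passed directly,
// so there is no void** array to build or unpack.
using TypedKernelFn = void (*)(const float* in, float* out, int n);

// Example kernel with a concrete signature: out[i] = in[i] * 2.
void doubleKernelTyped(const float* in, float* out, int n) {
  for (int i = 0; i < n; ++i) {
    out[i] = in[i] * 2.0f;
  }
}

// Hypothetical wrapper: call() forwards its arguments straight to the
// function pointer. The arguments live in registers or on the caller's
// stack, so concurrent callers share nothing but the read-only pointer.
class TypedCodeGenSketch {
 public:
  explicit TypedCodeGenSketch(TypedKernelFn fn) : fn_(fn) {
    assert(fn_ != nullptr);
  }

  void call(const float* in, float* out, int n) const { fn_(in, out, n); }

 private:
  TypedKernelFn fn_;
};
```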
ghstack-source-id: 124333028
Test Plan: `threads=64 scripts/bwasti/static_runtime/run.sh`
Reviewed By: asuhan
Differential Revision: D27175715
fbshipit-source-id: 44dafe77b95ede69c63ae6d64f39f0aa4877712f