codegen: emit load+store for small scalar memcpy

For small scalar copies (1, 2, 4, or 8 bytes), emit a load+store pair
instead of a memcpy. The load then keeps the source TBAA tag and the
store keeps the destination tag. A memcpy carries a single tag, so the
two are merged to their common ancestor via getMostGenericTBAA. That
merge can produce an overly generic tag such as jtbaa_data (the result
of merging jtbaa_Float64 under jtbaa_value with jtbaa_arraybuf_Float64
under jtbaa_arraybuf), which prevents LLVM from disambiguating array
element stores from array metadata loads.
Multi-field structs still use memcpy with !tbaa.struct metadata.
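
To illustrate why the merged tag is too generic, here is a minimal
sketch of a toy TBAA tree in plain C++. It does not use LLVM. The tree
shape, the root name jtbaa, the jtbaa_array and jtbaa_arraysize tags,
and the helper names (pathToRoot, mostGeneric, mayAlias) are
assumptions made for illustration. mostGeneric is only a stand-in for
the common-ancestor behaviour described above, not the real
getMostGenericTBAA.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy TBAA tree: child tag -> parent tag; the root has an empty parent.
// Assumed shape for illustration only, not Julia's full tag hierarchy.
static const std::map<std::string, std::string> kParent = {
    {"jtbaa", ""},                                 // assumed root name
    {"jtbaa_data", "jtbaa"},
    {"jtbaa_value", "jtbaa_data"},
    {"jtbaa_Float64", "jtbaa_value"},
    {"jtbaa_arraybuf", "jtbaa_data"},
    {"jtbaa_arraybuf_Float64", "jtbaa_arraybuf"},
    {"jtbaa_array", "jtbaa_data"},                 // hypothetical metadata subtree
    {"jtbaa_arraysize", "jtbaa_array"},            // hypothetical metadata tag
};

// Tags from `tag` up to the root, starting with `tag` itself.
std::vector<std::string> pathToRoot(std::string tag) {
    std::vector<std::string> path;
    while (!tag.empty()) {
        assert(kParent.count(tag) && "unknown tag");
        path.push_back(tag);
        tag = kParent.at(tag);
    }
    return path;
}

// True if `anc` is `tag` or one of its ancestors.
bool isAncestorOrSelf(const std::string &anc, const std::string &tag) {
    for (const auto &t : pathToRoot(tag))
        if (t == anc)
            return true;
    return false;
}

// Nearest common ancestor of two tags (stand-in for the memcpy merge).
std::string mostGeneric(const std::string &a, const std::string &b) {
    for (const auto &t : pathToRoot(a))
        if (isAncestorOrSelf(t, b))
            return t;
    return "";
}

// Simplified rule: two accesses may alias only if one tag is an
// ancestor of (or equal to) the other.
bool mayAlias(const std::string &a, const std::string &b) {
    return isAncestorOrSelf(a, b) || isAncestorOrSelf(b, a);
}
```

In this model, mostGeneric("jtbaa_Float64", "jtbaa_arraybuf_Float64")
yields jtbaa_data, and mayAlias("jtbaa_data", "jtbaa_arraysize") is
true, so a copy tagged with the merged tag cannot be separated from the
metadata load. A store that keeps jtbaa_arraybuf_Float64 does not alias
jtbaa_arraysize under the same rule.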

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>