[mypyc] Enable SIMD for librt.base64 on x86-64 (#20244)
Also generally enable SSE4.2 instructions when targeting x86-64. These
have been supported by hardware since ~2010, so it seems fine to require
them now.
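
For illustration only, here is a minimal sketch of how a build script could add the flag only when targeting x86-64. The helper name `simd_cflags` is hypothetical and not part of mypyc, and the sketch assumes a GCC/Clang-style compiler that accepts `-msse4.2`; other toolchains are simply left untouched.

```python
import platform
import sys
from typing import List, Optional


def simd_cflags(machine: Optional[str] = None, plat: Optional[str] = None) -> List[str]:
    """Hypothetical helper: extra C flags that enable SSE4.2 on x86-64."""
    machine = (machine or platform.machine()).lower()
    plat = plat or sys.platform
    # Only x86-64 targets get the flag; platform.machine() reports either
    # "x86_64" or "AMD64" depending on the operating system.
    if machine not in ("x86_64", "amd64"):
        return []
    # Assumption: on Windows the compiler does not take GCC-style flags,
    # so this sketch adds nothing there.
    if plat == "win32":
        return []
    return ["-msse4.2"]
```

Passing `machine` and `plat` explicitly makes the selection easy to check without depending on the host, e.g. `simd_cflags("x86_64", "linux")`.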
This speeds up `b64encode` by up to 100% on Linux running on a recent
AMD CPU.
Some fairly recent hardware doesn't support AVX2, so it's not enabled.
We'd probably need to rely on hardware capability checking for AVX2
support, and we'd probably also need to compile different files with
different architecture flags, and I didn't want to go there (at least
not yet).
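
As a rough idea of what hardware capability checking involves, the sketch below reads CPU feature flags from Linux `/proc/cpuinfo`-style text and picks a code path. It is not what this change does: the function names and the variant labels (`"avx2"`, `"sse42"`, `"generic"`) are hypothetical, and a real implementation would more likely perform the check in C.

```python
from typing import Set


def parse_cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect feature flags from Linux /proc/cpuinfo-style text."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux lists features on lines of the form "flags : fpu sse4_2 ...".
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def pick_base64_variant(flags: Set[str]) -> str:
    """Choose a (hypothetical) implementation variant from the CPU flags."""
    if "avx2" in flags:
        return "avx2"
    if "sse4_2" in flags:
        return "sse42"
    return "generic"
```

On a Linux machine this could be driven with `pick_base64_variant(parse_cpu_flags(open("/proc/cpuinfo").read()))`. Each variant would still have to be compiled separately with its own architecture flags, which is the extra build complexity mentioned above.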