[Clang] [Lexer] Detect SSE4.2 availability at runtime in fastParseASCIIIdentifier (#171914)
This change attempts to maximize usage of the SSE fast path in
`fastParseASCIIIdentifier`.
When compiling for x86, we build both the SSE fast path and the scalar
loop. At runtime, we check whether SSE4.2 is available and dispatch to
the right function by using the `target` attribute. If it _is_
available, this gives a net performance improvement. Otherwise, there's
a very slight, negligible regression, which I believe is perfectly
reasonable for a processor that does not support SSE4.2.
If we are not compiling for x86, the behavior is exactly the same as
before, so there are no regressions. If the binary is compiled for x86
with SSE4.2 already enabled, we still do a runtime check, but this has
negligible impact; furthermore, the point of the PR is that such builds
are rare.
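
For readers unfamiliar with this kind of dispatch, here is a minimal standalone sketch of the idea. It is not the code from this patch. The names `scanIdentifier`, `scanSSE42`, `scanScalar`, `isIdentContinue` and the `SKETCH_X86_DISPATCH` macro are hypothetical. It assumes GCC or Clang, and it uses an explicit `__builtin_cpu_supports` check, which may differ from how the patch selects the function.

```cpp
#include <cassert>

// Assumption: GCC or Clang on x86; other targets only build the scalar loop.
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_X86_DISPATCH 1
#endif

// Hypothetical helper: true for [A-Za-z0-9_].
static bool isIdentContinue(unsigned char C) {
  return C == '_' || (C >= 'A' && C <= 'Z') || (C >= 'a' && C <= 'z') ||
         (C >= '0' && C <= '9');
}

// Portable fallback: scan one byte at a time.
static const char *scanScalar(const char *Cur, const char *End) {
  while (Cur != End && isIdentContinue(static_cast<unsigned char>(*Cur)))
    ++Cur;
  return Cur;
}

#ifdef SKETCH_X86_DISPATCH
// The target attribute lets this one function use SSE4.2 instructions even
// when the rest of the translation unit is built without -msse4.2.
__attribute__((target("sse4.2")))
static const char *scanSSE42(const char *Cur, const char *End) {
  // Four inclusive byte ranges; the trailing zero bytes end the range list.
  alignas(16) static constexpr char Ranges[16] = {'_', '_', 'A', 'Z',
                                                  'a', 'z', '0', '9'};
  const __m128i RangesV =
      _mm_load_si128(reinterpret_cast<const __m128i *>(Ranges));
  while (End - Cur >= 16) {
    const __m128i Chunk =
        _mm_loadu_si128(reinterpret_cast<const __m128i *>(Cur));
    // Index of the first byte outside every range, or 16 if all bytes match.
    const int Consumed =
        _mm_cmpistri(RangesV, Chunk,
                     _SIDD_LEAST_SIGNIFICANT | _SIDD_CMP_RANGES |
                         _SIDD_UBYTE_OPS | _SIDD_NEGATIVE_POLARITY);
    Cur += Consumed;
    if (Consumed != 16)
      return Cur;
  }
  // Fewer than 16 bytes remain: finish with the scalar loop.
  return scanScalar(Cur, End);
}
#endif

// Entry point: choose the implementation based on the CPU we are running on.
const char *scanIdentifier(const char *Cur, const char *End) {
  assert(Cur <= End && "invalid range");
#ifdef SKETCH_X86_DISPATCH
  // The CPU feature query runs once; later calls only test a cached flag.
  static const bool HasSSE42 = __builtin_cpu_supports("sse4.2");
  if (HasSSE42)
    return scanSSE42(Cur, End);
#endif
  return scanScalar(Cur, End);
}
```

Calling `scanIdentifier(Buf, BufEnd)` returns a pointer to the first byte that cannot continue an identifier, or `BufEnd` if every byte qualifies. On a non-x86 build the SSE function is never compiled, so the only path is the scalar loop.
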
The benchmark results are available at
[llvm-compile-time-tracker](https://llvm-compile-time-tracker.com/compare.php?from=f88d060c4176d17df56587a083944637ca865cb3&to=d5485438edd460892bf210916827e0d92fc24065&stat=instructions%3Au).