[clang][CodeGen] Fix sub-optimal clang CodeGen for __atomic_test_and_set (#160098)
Clang CodeGen for `__atomic_test_and_set` would emit a `store`
instruction that stores an `i1` value:
```cpp
bool f(void *ptr) {
  return __atomic_test_and_set(ptr, __ATOMIC_RELAXED);
}
```
```llvm
%1 = atomicrmw xchg ptr %0, i8 1 monotonic, align 1
%tobool = icmp ne i8 %1, 0
store i1 %tobool, ptr %atomic-temp, align 1
```
which could lead to suboptimal binary code, for example on x86_64:
```asm
f:
  mov al, 1
  xchg byte ptr [rdi], al
  test al, al
  setne al
  setne byte ptr [rsp - 1]
  ret
```
The last `setne` instruction is redundant: it only writes the same
result to the stack slot for `%atomic-temp`. This patch fixes the issue
by zero-extending `%tobool` to `i8` before the store, which eliminates
the last `setne` instruction from the binary code sequence. The `test`
and `setne` on `al` are still kept, though.
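
Since the patch only changes how the result is stored, the observable
behavior of the builtin should stay the same. A minimal sketch of a
sanity check for that, assuming a GCC- or Clang-compatible compiler;
the helper names `test_and_set_relaxed`, `clear_relaxed` and
`check_test_and_set_semantics` are hypothetical and not part of the
patch:

```cpp
// Hypothetical helpers for illustration; only the __atomic_* builtins
// are provided by the compiler.

// Sets the flag and returns whether it was already set, like f() above.
bool test_and_set_relaxed(bool *flag) {
  return __atomic_test_and_set(flag, __ATOMIC_RELAXED);
}

// Resets the flag to the clear state.
void clear_relaxed(bool *flag) {
  __atomic_clear(flag, __ATOMIC_RELAXED);
}

// Walks the flag through clear -> set -> already set -> cleared again.
// Returns true when every step reports the expected previous state.
bool check_test_and_set_semantics() {
  bool flag = false;
  bool first = test_and_set_relaxed(&flag);   // flag was clear
  bool second = test_and_set_relaxed(&flag);  // flag was already set
  clear_relaxed(&flag);
  bool third = test_and_set_relaxed(&flag);   // clear again after reset
  return !first && second && !third;
}
```

`check_test_and_set_semantics()` is expected to return `true` both
before and after this patch.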
-----
I'm quite conservative about the codegen in this patch. Vanilla gcc
actually emits simpler code for `__atomic_test_and_set`:
```cpp
bool f(void *ptr) {
  return __atomic_test_and_set(ptr, __ATOMIC_RELAXED);
}
```
```asm
f:
  mov eax, 1
  xchg al, BYTE PTR [rdi]
  ret
```
It seems like gcc assumes `ptr` would always point to a valid `bool`
value as required by the ABI. I'm not sure if we should also make this
assumption.
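
To make the difference concrete, here is a small model of the two
instruction sequences in plain C++. It is only a sketch of what the
assembly listings above do with the byte returned by `xchg`, not code
from either compiler; `normalized_result` and `raw_result` are
hypothetical names:

```cpp
#include <cstdint>

// Models the clang sequence: `test al, al; setne al` maps any nonzero
// previous byte to 1.
std::uint8_t normalized_result(std::uint8_t previous_byte) {
  return previous_byte != 0 ? 1 : 0;
}

// Models the gcc sequence shown above: the previous byte is returned
// in `al` unchanged.
std::uint8_t raw_result(std::uint8_t previous_byte) {
  return previous_byte;
}
```

The two models agree when the previous byte is 0 or 1, that is, when
`ptr` points to a valid `bool`. They differ for any other byte value,
which is the case the extra `test` and `setne` guard against.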
Related to #121943.