llvm-project
d2239fbf - [clang][CodeGen] Fix sub-optimal clang CodeGen for __atomic_test_and_set (#160098)

Commit
2 days ago
[clang][CodeGen] Fix sub-optimal clang CodeGen for __atomic_test_and_set (#160098)

Clang CodeGen for `__atomic_test_and_set` would emit a `store` instruction that stores an `i1` value:

```cpp
bool f(void *ptr) {
  return __atomic_test_and_set(ptr, __ATOMIC_RELAXED);
}
```

```llvm
%1 = atomicrmw xchg ptr %0, i8 1 monotonic, align 1
%tobool = icmp ne i8 %1, 0
store i1 %tobool, ptr %atomic-temp, align 1
```

which could lead to suboptimal binary code, for example on x86_64:

```asm
f:
  mov al, 1
  xchg byte ptr [rdi], al
  test al, al
  setne al
  setne byte ptr [rsp - 1]
  ret
```

The last `setne` instruction is obviously redundant. This patch fixes the issue by zero-extending `%tobool` to an `i8` before the store, which eliminates the last `setne` instruction from the binary code sequence. The `test` and `setne` on `al` are still kept, though.

-----

I'm quite conservative about the codegen in this patch. Vanilla gcc actually emits simpler code for `__atomic_test_and_set`:

```cpp
bool f(void *ptr) {
  return __atomic_test_and_set(ptr, __ATOMIC_RELAXED);
}
```

```asm
f:
  mov eax, 1
  xchg al, BYTE PTR [rdi]
  ret
```

It seems that gcc assumes `ptr` always points to a valid `bool` value, as required by the ABI. I'm not sure whether we should make this assumption as well.

Related to #121943.
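For illustration only, the IR shown above (an `atomicrmw xchg` of `i8 1` followed by `icmp ne`) corresponds roughly to the hand-written C++ below. This is a sketch and not part of the patch: `f_expanded` is a hypothetical name, and it assumes a GCC- or Clang-compatible compiler that provides the `__atomic_exchange_n` builtin.

```cpp
// The function from the commit message, unchanged.
bool f(void *ptr) {
  return __atomic_test_and_set(ptr, __ATOMIC_RELAXED);
}

// Hypothetical hand-written equivalent of the IR Clang emits:
// exchange the byte with 1, then compare the previous value against 0.
bool f_expanded(unsigned char *ptr) {
  unsigned char old =
      __atomic_exchange_n(ptr, static_cast<unsigned char>(1), __ATOMIC_RELAXED);
  return old != 0;  // mirrors `icmp ne i8 %1, 0`
}
```

Because of the explicit comparison, `f_expanded` returns `true` for any non-zero previous byte (for example 2), not only for 1. That is the case the `test`/`setne` pair on `al` covers, and the case where the shorter gcc sequence appears to rely on `ptr` pointing at a valid `bool`.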