Fix race in the implementation of __tsan_acquire() (#84923)
`__tsan::Acquire()`, which is called by `__tsan_acquire()`, has a
performance optimization which attempts to avoid acquiring the atomic
variable's mutex if the variable has no associated memory model state.
However, if the atomic variable was recently written to by a
`compare_exchange_weak/strong` on another thread, the memory model state
may be created *after* the atomic variable is updated. This is a data
race, and can cause the thread calling `Acquire()` to not realize that
the atomic variable was previously written to by another thread.
Specifically, if you have code that writes to an atomic variable using
`compare_exchange_weak/strong`, and then in another thread you read the
value using a relaxed load, followed by an
`atomic_thread_fence(memory_order_acquire)`, followed by a call to
`__tsan_acquire()`, TSAN may not realize that the store happened before
the fence, and so it will complain about any other variables you access
from both threads if the thread-safety of those accesses depended on the
happens-before relationship between the store and the fence.
This change eliminates the unsafe optimization in `Acquire()`. Now,
`Acquire()` acquires the mutex before checking for the existence of the
memory model state.