scheduler: use explicit memory fences

The required ordering was previously only implied (on TSO platforms,
such as x86) by the atomic_store to sleeping and by the sleep_locks
acquire before the wake_check loop. Add explicit fences so that the
ordering no longer depends on those side effects. In the future, we
might want to consider whether it would be better (faster) to acquire
each possible lock on the sleeping path instead, so that every operation
is done with seq_cst, rather than using a fence to order only the
operations we care about directly.
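
For illustration only, a minimal sketch of the store/fence/load pattern
this relies on, written with C11 atomics. The names work_ready,
try_sleep, and enqueue_needs_wake are hypothetical and are not the
scheduler's real fields or functions; sleeping here is a simplified
stand-in for the per-thread state.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the scheduler state; not the real fields. */
static atomic_int sleeping;   /* 1 while the worker intends to sleep */
static atomic_int work_ready; /* 1 once a task has been queued */

/* Sleeping path: announce the intent to sleep, then re-check for work. */
static bool try_sleep(void)
{
    atomic_store_explicit(&sleeping, 1, memory_order_relaxed);
    /* Explicit fence: orders the store above before the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&work_ready, memory_order_relaxed)) {
        atomic_store_explicit(&sleeping, 0, memory_order_relaxed);
        return false; /* work arrived in the meantime; do not sleep */
    }
    return true; /* no work seen; safe to block until woken */
}

/* Waking path: publish the work, then check whether a wakeup is needed. */
static bool enqueue_needs_wake(void)
{
    atomic_store_explicit(&work_ready, 1, memory_order_relaxed);
    /* Matching fence: orders the store above before the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&sleeping, memory_order_relaxed) != 0;
}
```

With a fence on both paths, at least one side observes the other's
store, so a sleeper cannot miss new work while the waker also misses
the sleeping flag.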