[Runtime] Improve performance and memory footprint of compatibility overrides
rdar://143401725
Replacing the (non-inlined) call to `swift_once` with a relaxed atomic significantly improves the generated code and reduces the memory footprint. The mechanism itself no longer causes a stack frame to be generated, and the expected case (no override) should be perfectly predicted and executed as straight-line code. The override case should also be well predicted, with only two branches on the same value.
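
For illustration only, the general shape of the technique can be sketched as below. This is not the runtime's actual code: `OverrideFn`, `overrideSlot`, `defaultImpl`, `exampleOverride`, and `callWithOverride` are hypothetical names, and the sketch assumes a null slot means "no override". It leaves out how the slot gets populated and shows only a single branch on the loaded value.

```cpp
#include <atomic>

// Hypothetical signature for an overridable entry point.
using OverrideFn = int (*)(int);

// Hypothetical slot. Assumption: nullptr means "no override installed",
// which is the expected case.
static std::atomic<OverrideFn> overrideSlot{nullptr};

// Stand-in for the regular implementation.
static int defaultImpl(int x) { return x + 1; }

// Stand-in for an override that a compatibility library might install.
static int exampleOverride(int x) { return x * 2; }

inline int callWithOverride(int x) {
  // A relaxed load only needs atomicity of the pointer value itself.
  // Unlike an out-of-line once-style call, it can be inlined as an
  // ordinary load followed by a branch.
  OverrideFn fn = overrideSlot.load(std::memory_order_relaxed);
  if (fn != nullptr)
    return fn(x);        // override case
  return defaultImpl(x); // expected case: fall through to the default
}
```

With no override installed, `callWithOverride(1)` takes the default path; after storing `exampleOverride` into `overrideSlot`, the same call is routed to the override.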