Build a separate ARM wheel for macOS (#11149)
## Summary
Since we already build an x86 wheel for macOS, we can build a separate
ARM wheel rather than cross-compiling a universal one.
The ARM build takes ~3 minutes vs. > 20 minutes for the universal build,
and the resulting artifact is much smaller, which is also a win for users.
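
For reviewers less familiar with how per-architecture wheels are told apart, here is a minimal sketch of matching a macOS wheel to a machine by the architecture suffix of its platform tag. The filenames, package name, version, and deployment targets are hypothetical, and `wheel_arch` and `pick_wheel` are illustrative helpers, not part of this change; real installers apply fuller compatibility-tag rules than this.

```python
import platform

# Hypothetical wheel filenames: the package name, version, and macOS
# deployment targets are illustrative assumptions, not taken from this PR.
WHEELS = [
    "example-0.1.0-py3-none-macosx_10_12_x86_64.whl",
    "example-0.1.0-py3-none-macosx_11_0_arm64.whl",
]


def wheel_arch(filename: str) -> str:
    """Return the architecture suffix of a macOS wheel's platform tag."""
    # The platform tag is the last dash-separated field before ".whl".
    platform_tag = filename.rsplit(".", 1)[0].split("-")[-1]
    for arch in ("x86_64", "arm64", "universal2"):
        if platform_tag.endswith("_" + arch):
            return arch
    raise ValueError(f"unrecognized macOS platform tag: {platform_tag}")


def pick_wheel(machine: str, wheels=WHEELS) -> str:
    """Pick the first wheel whose architecture matches `machine`.

    Simplified on purpose: only an exact architecture match is considered.
    """
    for name in wheels:
        if wheel_arch(name) == machine:
            return name
    raise LookupError(f"no wheel for architecture: {machine}")


if __name__ == "__main__":
    # platform.machine() reports "arm64" on Apple Silicon and "x86_64" on Intel Macs.
    print(pick_wheel(platform.machine()))
```

With one wheel per architecture, each user downloads only the binary for their own machine, which is where the size win comes from.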