[CoreML EP] Add HardSigmoid support (#28182)
### Description
Adds `HardSigmoid` to the CoreML Execution Provider's activation op
builder. Both MLProgram (`sigmoid_hard`) and NeuralNetwork
(`ActivationSigmoidHard`) code paths are implemented; the op's ONNX
definition matches CoreML MIL's `sigmoid_hard` exactly, so no
decomposition is required.
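For reference, here is a minimal sketch of the function both definitions compute, `y = max(0, min(1, alpha * x + beta))`, with the ONNX defaults `alpha = 0.2` and `beta = 0.5`. The name `hard_sigmoid` and the non-default values below are illustrative assumptions. This is not the EP's implementation or the values used in the test.

```python
def hard_sigmoid(x: float, alpha: float = 0.2, beta: float = 0.5) -> float:
    """Scalar reference: clamp the affine map alpha * x + beta into [0, 1]."""
    return max(0.0, min(1.0, alpha * x + beta))


samples = [-10.0, 0.0, 10.0]

# ONNX default attributes (alpha=0.2, beta=0.5).
defaults = [hard_sigmoid(x) for x in samples]

# Hypothetical non-default attributes, chosen only for illustration.
custom = [hard_sigmoid(x, alpha=0.1, beta=0.6) for x in samples]
```

Because the op is a single clamp of an affine function, `alpha` and `beta` can be passed straight through to the CoreML op without rewriting the graph.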
Adds a dedicated CoreML EP test
(`CoreMLExecutionProviderTest.HardSigmoidTest`) that builds a
single-node HardSigmoid model with non-default `alpha`/`beta` and uses
`RunAndVerifyOutputsWithEP` with `ExpectedEPNodeAssignment::All` to
confirm that (a) the entire graph is claimed by the CoreML EP in both NN
and MLProgram formats, and (b) the output matches the CPU reference. To
check that the test does not pass trivially, I temporarily unregistered
HardSigmoid from the activation builder: the test then fails with
`VerifyEPNodeAssignment` emitting a fatal failure, which confirms that
it exercises the CoreML path. (The existing multi-EP test in
`activation_op_test.cc` silently falls back to CPU when an EP rejects
the node, so it does not give CoreML coverage on its own.)
Also updates `coreml_supported_mlprogram_ops.md`.
### Motivation and Context
Fixes #28181.
On a DWPose pose-estimation model (`dw-ll_ucoco_384.onnx`), four
HardSigmoid ops each forced a CoreML → CPU → CoreML round-trip and
also caused downstream ops to be rejected with "unsupported inputs"
because their producers had been sent to CPU. Adding HardSigmoid
collapses the graph from 5 CoreML subgraphs to 1 and reduces inference
latency from 9.22 ms to 6.92 ms (−25%) on Apple Silicon with MLProgram +
ComputeUnits=ALL.
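The subgraph count and the percentage can be sanity-checked with a toy model. The sketch below assumes a simplified linear chain of ops with hypothetical neighbours (`Conv`, `Mul`). The real DWPose graph is not a chain, and this is not how the EP partitions graphs. It only shows why four unsupported ops can split a graph into five supported pieces.

```python
from itertools import groupby


def count_supported_runs(ops, supported):
    """Count maximal runs of consecutive supported ops in a linear chain."""
    runs = groupby(ops, key=lambda op: op in supported)
    return sum(1 for is_supported, _ in runs if is_supported)


# Hypothetical chain: four HardSigmoid ops separated by supported ops.
chain = [
    "Conv", "HardSigmoid", "Mul", "HardSigmoid", "Conv",
    "HardSigmoid", "Mul", "HardSigmoid", "Conv",
]

before = count_supported_runs(chain, {"Conv", "Mul"})
after = count_supported_runs(chain, {"Conv", "Mul", "HardSigmoid"})

# Relative latency reduction from the timings reported above.
latency_drop_pct = round((9.22 - 6.92) / 9.22 * 100)
```

In this toy chain, `before` is 5 and `after` is 1, matching the counts reported above, and the timings work out to a reduction of about 25%.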
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>