ci : reduce self-hosted server workflow jobs (#24012)
Reduce the number of parallel jobs in server-self-hosted.yml by stacking
test configurations as sequential steps within a single job, following the
pattern from #23927.
- server-metal: 4 matrix jobs -> 1 job with 4 sequential test steps
- server-cuda: 2 matrix jobs -> 1 job with 2 sequential test steps
- server-kleidiai: removed unnecessary single-entry matrix
- removed unused Setup Node.js step from server-metal
Total: 7 parallel jobs -> 3 parallel jobs
Assisted-by: llama.cpp:local pi