[webgpu] Use components for VxAttentionScore (#23726)
For phi3.5-gqa-static sum_long (>1000 tokens) on Meteor Lake:
Before:
300 tokens in 27.0sec, e2e:11.1 tps, prompt: 212.4 tps, gen: 14.2 tps,
ttft: 5.85 sec
After:
300 tokens in 23.0sec, e2e:13.0 tps, prompt: 248.9 tps, gen: 16.6 tps,
ttft: 4.99 sec
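
For reviewers who want the relative change rather than raw figures, the numbers above can be compared with a small script. This is only a sketch: the dictionary keys and the `relative_change` helper are illustrative names, not part of the benchmark tool, and the values are copied by hand from the Before/After lines.

```python
# Figures copied from the Before/After lines in this description.
# Key names are made up for this sketch; the benchmark tool does not emit them.
before = {
    "total_sec": 27.0,
    "e2e_tps": 11.1,
    "prompt_tps": 212.4,
    "gen_tps": 14.2,
    "ttft_sec": 5.85,
}
after = {
    "total_sec": 23.0,
    "e2e_tps": 13.0,
    "prompt_tps": 248.9,
    "gen_tps": 16.6,
    "ttft_sec": 4.99,
}


def relative_change(old: float, new: float) -> float:
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


# Positive is better for the *_tps metrics, negative is better for *_sec.
changes = {name: relative_change(before[name], after[name]) for name in before}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

On this single run, the throughput metrics improve by roughly 17% and ttft and total time drop by roughly 15%.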