Remove `sequentially_access_by_threads` for Conv (#24938)
### Description
<!-- Describe your changes. -->
In Transforms.js, the `sequentially_access_by_threads` flag should be
set to `true` **only** when the GPU vendor is Intel, as experiments have
shown that Intel GPUs perform better with this setting enabled.
Currently, ORT sets `sequentially_access_by_threads` to `true`
regardless of the GPU vendor.
However, based on my local testing, setting
`sequentially_access_by_threads` to `false` consistently results in
better performance across all platforms.
In ONNX Runtime (ORT), this flag is only applied to Conv operators that
do not use `vec4` packing (i.e., the `MakeMatMulPackedSource` path). For
GEMM/MatMul operators without `vec4`, the flag is already `false`.
Therefore, this change only affects Conv cases without `vec4`.
This PR leads to performance improvements in certain convolution cases.
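To illustrate what the flag controls, here is a minimal sketch (hypothetical code, not ORT's actual `MakeMatMulPackedSource` generator): with `sequentially_access_by_threads` enabled, each thread walks a contiguous chunk of elements; with it disabled, consecutive threads touch consecutive elements (an interleaved, coalesced-friendly pattern), which is the layout this PR's measurements favor on non-Intel GPUs as well.

```typescript
// Hypothetical illustration of the two access patterns the flag selects
// between. `indicesForThread` is an invented helper, not an ORT API.
function indicesForThread(
  thread: number,
  numThreads: number,
  workPerThread: number,
  sequential: boolean,
): number[] {
  const out: number[] = [];
  for (let i = 0; i < workPerThread; i++) {
    out.push(
      sequential
        ? thread * workPerThread + i // contiguous chunk per thread
        : i * numThreads + thread,   // interleaved across threads
    );
  }
  return out;
}

// With 4 threads and 2 elements each:
// sequential=true  → thread 0 reads [0, 1], thread 1 reads [2, 3]
// sequential=false → thread 0 reads [0, 4], thread 1 reads [1, 5]
```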
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
I tested with a local conv model (input x[1,256,224,224], weight[63, 256, 3, 3],
which does not use vec4). The results:

| (ms) | M3 Max | NVIDIA P620 | NVIDIA 5080 | Intel |
|----------------|-------|------------|------------|-------|
| `sequentially_access_by_threads == true` | 11.2 | 112 | 2.88 | 85.9 |
| `sequentially_access_by_threads == false` | **7** | **66** | **1.90** | **53.4** |