[webgpu] revise implementation of buffer split support (#26429)
### Description
This PR addresses a few concerns:
- revert the change from `const ProgramBase&` to `ProgramBase&`: the
non-const reference is not wrong in itself, but it puts much more burden
on readers of the code to work out whether and where the program object
is modified. It also makes further unexpected modifications to the
program object possible (for example, in the indirect dispatch code)
- replace the boolean option
`"ep.webgpuexecutionprovider.smallStorageBufferBindingSizeForTesting"`
with `"ep.webgpuexecutionprovider.maxStorageBufferBindingSize"`, so any
value can now be set through the option (a value below 128MB triggers
an assertion failure)
- make segments optional in the cache key: they are included only when
the segment count is not 1 (1 being the common case)
- avoid some unnecessary API calls, which are harmless on native but may
hurt performance on the web
- clean up the code a little bit and add a few comments
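The optional-segments rule for the cache key in the list above can be pictured with a small sketch. `BuildCacheKey` and the `";segments="` key format are hypothetical and are not the actual ONNX Runtime code. The sketch only shows the idea: the segment count is appended only when it differs from 1.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: builds a program cache key from a base key and a
// segment count. The ";segments=" format is illustrative only.
std::string BuildCacheKey(const std::string& base_key, uint32_t segments) {
  // A buffer is always bound as at least one segment.
  assert(segments >= 1);
  std::string key = base_key;
  // Append the segment count only when the buffer is actually split;
  // the common unsplit case (segments == 1) adds nothing to the key.
  if (segments != 1) {
    key += ";segments=" + std::to_string(segments);
  }
  return key;
}
```

With this sketch, `BuildCacheKey("MatMul", 1)` returns `"MatMul"` and `BuildCacheKey("MatMul", 3)` returns `"MatMul;segments=3"`. Unsplit programs therefore carry no extra key component.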
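The new `maxStorageBufferBindingSize` option from the list above might be resolved roughly as follows. This is a sketch under stated assumptions. `ResolveMaxStorageBufferBindingSize` is a hypothetical name. The option value is assumed to be a byte count, and "128MB" is read as 128 MiB, which equals the WebGPU default for `maxStorageBufferBindingSize`. Clamping to the device limit is this sketch's own choice and is not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Lower bound from the PR description, assumed to mean 128 MiB.
constexpr uint64_t kMinStorageBufferBindingSize = 128ull * 1024 * 1024;

// Hypothetical: resolves the effective maximum storage buffer binding size
// from the option string and the device-reported limit.
uint64_t ResolveMaxStorageBufferBindingSize(const std::string& option_value,
                                            uint64_t device_limit) {
  // Option not set: use the device limit unchanged.
  if (option_value.empty()) {
    return device_limit;
  }
  // Interpret the option as a byte count (std::stoull throws on
  // non-numeric or out-of-range input).
  const uint64_t requested = std::stoull(option_value);
  // Values below the minimum are a configuration error.
  assert(requested >= kMinStorageBufferBindingSize);
  // Assumption of this sketch: never report more than the device supports.
  return requested < device_limit ? requested : device_limit;
}
```

Because any value at or above the minimum is accepted, a test could set the option to `134217728` on a device with a higher limit. Presumably this forces the buffer-split path, which is what the old boolean testing flag was for.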
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>