Move buffer release or cache from OnRefresh to ReleaseBuffer in BucketCacheManager (#25276)
### Description
This PR moves the decision to release or cache a buffer from OnRefresh
to ReleaseBuffer in BucketCacheManager.
### Motivation and Context
OnRefresh runs only after a batch of 16 EP runs. Within that batch,
released buffers cannot actually be reused, which wastes GPU buffer
resources. This PR proposes a straightforward optimization: cache or
release each buffer early, in ReleaseBuffer rather than OnRefresh, so
that buffers become reusable sooner and both peak and average GPU memory
usage drop. The results below show a modest memory reduction (about 2-4%
across three models) with no meaningful perf regression (at most about
1% in token latency and throughput). Percentages in parentheses are
relative to the Default Bucket baseline, with positive values indicating
an improvement.
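The effect of moving the release-or-cache step can be illustrated with a toy model. The sketch below is not the actual BucketCacheManager code: `ToyBucketCache`, `AllocationsForBatch`, and the use of integer ids in place of GPU buffers are all hypothetical, and it assumes each run needs a single buffer of one bucket size. It only counts how many fresh allocations a batch of 16 runs needs under each policy.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Toy model only: integer ids stand in for GPU buffers, and every name
// here is hypothetical rather than the real BucketCacheManager API.
class ToyBucketCache {
 public:
  explicit ToyBucketCache(bool early_release) : early_release_(early_release) {}

  // Reuse a cached buffer of this bucket size if one is free, else allocate.
  int CreateBuffer(std::size_t bucket_size) {
    auto& free_list = free_[bucket_size];
    if (!free_list.empty()) {
      int id = free_list.back();
      free_list.pop_back();
      return id;
    }
    ++allocations_;
    return next_id_++;
  }

  // Early policy: the buffer is reusable immediately.
  // Deferred policy: the buffer waits in a pending list until OnRefresh.
  void ReleaseBuffer(std::size_t bucket_size, int id) {
    if (early_release_) {
      free_[bucket_size].push_back(id);
    } else {
      pending_.push_back({bucket_size, id});
    }
  }

  // Called once per batch; moves any pending buffers into the free lists.
  void OnRefresh() {
    for (const auto& p : pending_) free_[p.bucket_size].push_back(p.id);
    pending_.clear();
  }

  int allocations() const { return allocations_; }

 private:
  struct Pending {
    std::size_t bucket_size;
    int id;
  };

  bool early_release_;
  int next_id_ = 0;
  int allocations_ = 0;
  std::map<std::size_t, std::vector<int>> free_;
  std::vector<Pending> pending_;
};

// Simulate one batch of `runs` runs, each creating and releasing one buffer,
// and return how many fresh allocations the batch required.
int AllocationsForBatch(bool early_release, int runs = 16) {
  ToyBucketCache cache(early_release);
  for (int i = 0; i < runs; ++i) {
    int id = cache.CreateBuffer(256);
    cache.ReleaseBuffer(256, id);
  }
  cache.OnRefresh();
  return cache.allocations();
}
```

In this toy setup the deferred policy allocates a new buffer on every run in the batch, while the early policy allocates once and reuses that buffer for the remaining runs. Real workloads mix many bucket sizes and lifetimes, so actual savings are smaller.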
#### Phi3
Optimization Strategy | Peak Memory (MB) | Avg Memory (MB) | Token Gen Latency (ms) | Tokens/sec
-- | -- | -- | -- | --
Default Bucket | 3603.83 | 3127.05 | 7.17 | 139.50
Default Bucket with Early Release Optimization | 3534.77 (+1.92%) | 3073.97 (+1.70%) | 7.14 (+0.36%) | 140.01 (+0.36%)
#### Deepseek-R1
Optimization Strategy | Peak Memory (MB) | Avg Memory (MB) | Token Gen Latency (ms) | Tokens/sec
-- | -- | -- | -- | --
Default Bucket | 2089.03 | 1716.15 | 6.07 | 164.67
Default Bucket with Early Release Optimization | 2034.00 (+2.63%) | 1674.49 (+2.43%) | 6.09 (-0.20%) | 164.34 (-0.20%)
#### LLama3.2-1B
Optimization Strategy | Peak Memory (MB) | Avg Memory (MB) | Token Gen Latency (ms) | Tokens/sec
-- | -- | -- | -- | --
Default Bucket | 1736.03 | 1424.64 | 3.37 | 296.53
Default Bucket with Early Release Optimization | 1659.78 (+4.39%) | 1366.78 (+4.06%) | 3.41 (-1.09%) | 293.34 (-1.08%)