[CB] Changes for long generation (#45530)
* Fix KV dedup for decode batches
* Fix memory estimation
* Change default
* Add write-only fast path
* Take both peaks into account
* Revert unused config field
* Review 1
* Fix p1s
* Fix p2s and p3s that needed it
* Add a TODO
* Fix test, lower max cached graph, add TODO
* Fix fragmentation with big warmup
* Add more space for logits processors
* Fix