Choosing input/total tokens automatically based on available VRAM? (#2673)
* Choosing input/total tokens automatically based on available VRAM?
* Update doc.
* Remove generated files.
* Trying to fix non-chunking targets.
* Attempt #2
* Fix.
* QuantLinear is ROCm compatible.
* Much simpler logic after the overhead.
* Updating logic + non-flash.
* Revert doc text.
* Simple updates.
* Fix integration mt0 (transformers update).