fix: Compute the full sum in llama-eval-callback, not just the sum of printed values (#15637)
This makes it much easier to compare between llama.cpp and transformers!
https://github.com/ggml-org/llama.cpp/issues/nemotron-nano-15409
Branch: gabe-l-hart/nvidia-nemotron-nano-15409
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>