[Pallas/Mosaic GPU] Fix bug in computation of profiler buffer allocation size.

We accounted for the byte width twice but forgot that each event stores two
elements, so the computed size was off by a factor of 2.

PiperOrigin-RevId: 899595571