[AMDGPU] Fix iterator invalidation during frame lowering (#163952)

I was a bit too eager to remove the SI_WHOLE_WAVE_FUNC_SETUP instruction
during prolog emission. Erasing it invalidates MBBI, which in some cases
is still needed outside of `emitCSRSpillStores`.
Do the erasing at the end of prolog insertion instead.
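
For illustration only, the deferred-erase pattern can be sketched outside of LLVM. This is a minimal standalone model, not the actual patch: `Block`, `emitSpillStores`, `emitPrologue`, and the opcode strings are hypothetical stand-ins, with a `std::list` playing the role of the instruction list and `InsertPt` playing the role of MBBI.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical stand-in: a std::list of opcode names models the instruction
// list of a basic block. Like an intrusive list, erasing an element
// invalidates only the iterators that point at that element.
using Block = std::list<std::string>;

// Hypothetical helper modelled on the role of a spill-emitting function.
// It notices the setup pseudo at InsertPt but only records it in Pending
// instead of erasing it, so the caller's InsertPt stays valid.
void emitSpillStores(Block &B, Block::iterator InsertPt,
                     Block::iterator &Pending) {
  if (InsertPt != B.end() && *InsertPt == "SETUP")
    Pending = InsertPt; // defer: erasing here would invalidate InsertPt
  B.insert(InsertPt, "SPILL"); // inserting does not invalidate list iterators
}

// Hypothetical prolog emission: InsertPt is reused after the helper returns,
// so the setup pseudo is erased only once nothing else needs the iterator.
Block emitPrologue(Block B) {
  Block::iterator InsertPt = B.begin();
  Block::iterator Pending = B.end(); // end() means "nothing to erase"

  emitSpillStores(B, InsertPt, Pending);

  // InsertPt is still needed here, outside of the helper.
  B.insert(InsertPt, "FRAME_SETUP");

  // End of prolog insertion: now it is safe to drop the pseudo.
  assert((Pending == B.end() || *Pending == "SETUP") &&
         "only the setup pseudo is deferred for erasure");
  if (Pending != B.end())
    B.erase(Pending);
  return B;
}
```

In this sketch, calling `emitPrologue` on a block holding `SETUP` followed by `BODY` yields `SPILL`, `FRAME_SETUP`, `BODY`. Had the helper erased the pseudo itself, the later `B.insert(InsertPt, ...)` would have used a dangling iterator.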