Use capture_stderr instead of patching sys.stderr
This approach is more reliable because it captures stderr writes from the
whole process, not only those coming from Python.
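
For context, a minimal sketch of why descriptor-level capture sees more than
a patched sys.stderr. This is not the actual capture_stderr implementation;
capture_stderr_fd is a hypothetical helper that assumes the capture works by
redirecting file descriptor 2 to a temporary file:

```python
import contextlib
import os
import sys
import tempfile


@contextlib.contextmanager
def capture_stderr_fd():
    """Hypothetical helper: capture all writes to file descriptor 2."""
    result = {"text": ""}
    sys.stderr.flush()          # push out anything Python has buffered
    saved_fd = os.dup(2)        # keep a handle on the real stderr
    with tempfile.TemporaryFile() as tmp:
        os.dup2(tmp.fileno(), 2)  # fd 2 now points at the temp file
        try:
            yield result
        finally:
            sys.stderr.flush()
            os.dup2(saved_fd, 2)  # restore the real stderr
            os.close(saved_fd)
            tmp.seek(0)
            result["text"] = tmp.read().decode(errors="replace")


# os.write(2, ...) bypasses sys.stderr entirely, much like a write from
# native code would, so replacing sys.stderr alone would not see it.
with capture_stderr_fd() as captured:
    os.write(2, b"written below the Python layer\n")
```

The captured text is available in captured["text"] once the block exits.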
I noticed this test was failing sporadically. I'm not sure if this will fix it,
but at least it might tell us why (thanks to e.g. the weakref check).
PiperOrigin-RevId: 707557909