[Mosaic GPU] Use a custom pass to compile PTX to CUBIN.
The pass lowers existing PTX into a `gpu.binary` op using stream executor
compilation providers. By default, the stock MLIR pipeline compiles PTX by
running `ptxas` in a subprocess, which does not work reliably in all
environments; stream executor's compilation providers are meant to remedy this.
We also unify the dumping pipeline with the main compilation pipeline, so
gathering dumps no longer requires compiling twice.
PiperOrigin-RevId: 796480591