Decouple compilation and initialization in Mosaic GPU custom call.
This change refactors the Mosaic GPU custom call handler to decouple compilation of the MLIR module from its initialization within a specific CUDA context. The compilation result is now cached globally, keyed by the kernel hash. The initialization result, which is context-dependent, is cached separately for each pair of compiled kernel and CUDA context.
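The two-level caching described above can be sketched roughly as follows. This is an illustrative Python model only, not the code in the handler: `CompiledKernel`, `get_or_compile`, `get_or_init`, and the `compile_fn` / `init_fn` callbacks are hypothetical names, and a CUDA context is represented by an opaque hashable id.

```python
import threading


class CompiledKernel:
    """Context-independent compilation result (hypothetical type)."""

    def __init__(self, kernel_hash, binary):
        self.kernel_hash = kernel_hash
        self.binary = binary
        # Second cache level: one initialization result per context id.
        self._initialized = {}
        self._lock = threading.Lock()

    def get_or_init(self, context_id, init_fn):
        """Initialize at most once per context and reuse the result."""
        with self._lock:
            if context_id not in self._initialized:
                self._initialized[context_id] = init_fn(self.binary, context_id)
            return self._initialized[context_id]


# First cache level: global, keyed by the kernel hash.
_compile_cache = {}
_compile_cache_lock = threading.Lock()


def get_or_compile(kernel_hash, module_text, compile_fn):
    """Compile at most once per kernel hash, independent of any context."""
    # Simplification: the lock is held while compiling, which serializes
    # all compilations. A real implementation may want finer-grained locking.
    with _compile_cache_lock:
        kernel = _compile_cache.get(kernel_hash)
        if kernel is None:
            kernel = CompiledKernel(kernel_hash, compile_fn(module_text))
            _compile_cache[kernel_hash] = kernel
        return kernel
```

In this sketch, nothing in `get_or_compile` depends on a context, so it could in principle be called before the first execution, while `get_or_init` still runs lazily in whichever context executes the kernel.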
This is the first step toward moving compilation out of the first execution.
PiperOrigin-RevId: 859130600