[Pallas/interpreter] Add a prototype for a GPU kernel interpreter.
Current limitations:
- Only trivial grids are supported.
- Threads are supported along a single axis only (any number of threads is allowed).
- Primitives for memory transfers, synchronization or `wgmma` are not supported yet.
PiperOrigin-RevId: 855618516