Introduce GradientCheckpointingLayer (#37223)
* Add GradientCheckpointingLayer
* Trigger CI
* Move GradientCheckpointingLayer to a separate file
* Update import
* Expose and document GradientCheckpointingLayer
* Fix dummy objects
* Apply to llama-based models
* Update modular files
* Update a few more models for consistency
* Update glm4
* Update Janus