Localized Recompute for Gelu and AttentionDropout (#4402)
* Gelu Activation Recompute Draft
* Prototype for localized recompute
* Introduce localized_recompute rewriter
* Command line args for enabling recompute
* Add logger to Gradient Graph Builder
* use const when possible