Enabled compiled autograd for backward pass (#7667)
Compiled Autograd is an extension to torch.compile that enhances the
autograd engine by capturing a larger backward computation graph at
runtime. This allows more comprehensive optimization of the backward
pass during training.
Overall, a 5-20% speedup is expected in backward-heavy workloads with
stable graphs.
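To illustrate the mechanism in plain PyTorch (a minimal sketch,
independent of DeepSpeed, following the upstream compiled autograd usage
pattern; the model and shapes are arbitrary):

```python
import torch

model = torch.nn.Linear(16, 16)
x = torch.randn(4, 16)

# Ask torch.compile to also capture the backward graph at runtime.
torch._dynamo.config.compiled_autograd = True

@torch.compile
def train_step(model, x):
    loss = model(x).sum()
    loss.backward()  # the backward pass is traced and compiled as well

train_step(model, x)
```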
Disabled by default, the feature can be enabled from a user script by
passing `compiled_autograd_enabled=True` when invoking the engine's
`compile` method, as shown in the sketch below.
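For example (a minimal sketch; the model, config, and launch setup are
placeholders, and `compiled_autograd_enabled` is the new keyword
described above):

```python
import torch
import deepspeed

model = torch.nn.Linear(16, 16)  # placeholder model
ds_config = {  # minimal placeholder DeepSpeed config
    "train_batch_size": 4,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}

# Run under the deepspeed launcher, e.g. `deepspeed script.py`.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

# Opt in to compiled autograd (off by default) when compiling the engine.
engine.compile(compiled_autograd_enabled=True)
```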
Note that the bfloat16 + eager backend combination requires PyTorch >= 2.5
(where partial fixes landed); otherwise, compiled autograd must be
disabled for bfloat16 models due to a known torch.compile bug
(PyTorch #152162, #161153).
---------
Signed-off-by: Max Kovalenko <mkovalenko@habana.ai>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>