Prompt layer-wise recompute when applicable (#20126)
### Prompt layer-wise recompute when applicable
When ONNX export fails and we find that the checkpoint function is used,
give users an explicit prompt to enable layer-wise memory optimization.
- Use of the checkpoint function is a strong indicator that the model is
too large to fit in GPU memory.
- If we don't override the checkpoint function here, ONNX export will
most likely fail:
  1. With older PyTorch versions, we simply throw an exception when
  handling the gradient checkpoint feature.
  2. With newer PyTorch versions, an export failure happens.
- Neither failure tells users explicitly how to mitigate it. This PR adds
that guidance.
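
The idea can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `CheckpointUsageTracker`, `export_with_hint`, and the hint text are hypothetical names, and the `ORTMODULE_MEMORY_OPT_LEVEL=1` setting mentioned in the hint is an assumption that should be verified against the ORT version in use. In a real setup the wrapped function would be something like `torch.utils.checkpoint.checkpoint`; here any callable works, so the sketch runs without PyTorch.

```python
import functools


class CheckpointUsageTracker:
    """Records whether a wrapped checkpoint function was ever called."""

    def __init__(self):
        self.used = False

    def wrap(self, checkpoint_fn):
        # Return a drop-in replacement that flags usage, then delegates.
        @functools.wraps(checkpoint_fn)
        def wrapper(*args, **kwargs):
            self.used = True
            return checkpoint_fn(*args, **kwargs)

        return wrapper


# Hypothetical hint text; the environment variable is an assumption.
LAYERWISE_HINT = (
    "Gradient checkpointing was detected during export. Consider disabling "
    "it and enabling layer-wise recompute instead, e.g. by setting "
    "ORTMODULE_MEMORY_OPT_LEVEL=1 (verify against your ORT version)."
)


def export_with_hint(export_fn, tracker):
    """Run export_fn; on failure, append the hint if checkpointing was used."""
    try:
        return export_fn()
    except Exception as exc:
        if tracker.used:
            # Keep the original error as the cause and add the mitigation.
            raise RuntimeError(f"{exc}\n{LAYERWISE_HINT}") from exc
        raise  # No checkpoint usage seen: re-raise the failure unchanged.
```

If the checkpoint function was never called, the original exception propagates untouched, so users who do not use checkpointing see no change in behavior.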

### Motivation and Context