flops_profiler: add option recompute_fwd_factor for the case of activation recompute (#3362)
When activation checkpointing is enabled, most of the forward pass is
recomputed during the backward pass, so the FLOPS calculation should account
for it by setting recompute_fwd_factor=1.0.
I could not find a way to pass this option from the model script to the
DeepSpeed engine, so the option is added directly to flops_profiler.
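
For illustration, a minimal sketch of the accounting this option is meant to
adjust. It assumes the common rule of thumb that the backward pass costs about
2x the forward FLOPs; the helper name step_flops is hypothetical and is not
the profiler's actual code.

```python
def step_flops(fwd_flops: float, recompute_fwd_factor: float = 0.0) -> float:
    """Estimate FLOPs for one forward+backward step.

    Assumption: backward costs roughly 2x forward. With activation
    recompute, a fraction of the forward pass (recompute_fwd_factor)
    is executed again during backward and is added to that cost.
    """
    bwd_flops = (2.0 + recompute_fwd_factor) * fwd_flops
    return fwd_flops + bwd_flops


# No recompute: 1x forward + 2x backward.
no_recompute = step_flops(100.0)
# Full recompute: the forward pass is counted one more time.
full_recompute = step_flops(100.0, recompute_fwd_factor=1.0)
```

Under these assumptions, recompute_fwd_factor=1.0 raises the per-step estimate
from 3x to 4x the forward FLOPs.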
Co-authored-by: Cheng Li <pistasable@gmail.com>