Make the replace module more configurable (#1366)
* DeepSpeedInferenceConfig
get epsilon value from config
* epsilon -> layer_norm_eps
to keep the variable name consistent with DeepSpeedTransformerConfig
* DeepSpeedTransformerConfig
get epsilon value from config
* configurable stochastic_mode
e.g.:
1. True for LM pre-training
2. False for LM fine-tuning on a downstream task
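The phase-dependent choice above can be sketched as a small helper; this is a hedged illustration only, and `pick_stochastic_mode` is a hypothetical name, not part of DeepSpeed's API:

```python
def pick_stochastic_mode(phase: str) -> bool:
    # Pre-training can use the faster stochastic kernels; fine-tuning
    # on a downstream task prefers the deterministic path (per the
    # examples in this commit message).
    return phase == "pretrain"

print(pick_stochastic_mode("pretrain"))   # True
print(pick_stochastic_mode("finetune"))   # False
```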
* Updated replace_module.py
check whether layer_norm_eps is an attribute of the config
default to 1e-12 when it is not
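The attribute check with a fallback can be sketched as below; `get_layer_norm_eps` is a hypothetical helper for illustration, and the `SimpleNamespace` configs stand in for real model config objects:

```python
from types import SimpleNamespace

def get_layer_norm_eps(config, default=1e-12):
    # Fall back to the default when the model config does not
    # define layer_norm_eps as an attribute.
    return getattr(config, "layer_norm_eps", default)

cfg_with = SimpleNamespace(layer_norm_eps=1e-5)   # config that defines it
cfg_without = SimpleNamespace()                   # config that does not

print(get_layer_norm_eps(cfg_with))     # 1e-05
print(get_layer_norm_eps(cfg_without))  # 1e-12
```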
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>