[TF 2.2 compat] use tf.VariableAggregation.ONLY_FIRST_REPLICA (#4283)
* Fix the accumulator so that it runs properly with TF 2.2
* Apply style
* Fix training_args_tf for TF 2.2
* Fix the TF training args when only one GPU is available
* Remove the pinned TF version from setup.py