Fix Send/Recv compilation and add code to profile memory (#4241)
* Fix a bug and add code to profile memory
1. Make Send/Recv compile again (currently broken by the
HOROVOD refactor).
2. Add code to print out initializer allocation size and
activation memory size.
* Address comments
* Split memory counts per location
* Fix a metric