sync layer norms (#272)
* sync layer norms
* all_reduce is an in-place operation
* Make dataloader use another random generator (#276)
* call all_reduce with op AVG directly
* add eval dataloader deadlock workaround
* revert generator sync
* make auto-sync configurable; basic test; cleanup
* test with updated AMI
* fix unrelated test
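
For readers unfamiliar with the idea behind the layer norm sync bullets above, here is a minimal sketch of averaging layer norm weights across ranks in place. It is a pure-Python simulation, not code from this change: `all_reduce_avg_`, `rank0_weight`, and `rank1_weight` are hypothetical names, and plain lists stand in for tensors. In a real distributed run, `torch.distributed.all_reduce` with `ReduceOp.AVG` would presumably play this role.

```python
# Pure-Python simulation; no real process group is involved.
# Assumption: each inner list stands in for one rank's copy of the same
# layer norm weight, and the helper mimics an averaging all-reduce.


def all_reduce_avg_(rank_buffers):
    """Average same-length buffers across simulated ranks, in place."""
    world_size = len(rank_buffers)
    length = len(rank_buffers[0])
    # Element-wise mean over all ranks.
    means = [
        sum(buf[i] for buf in rank_buffers) / world_size
        for i in range(length)
    ]
    for buf in rank_buffers:
        # Overwrite each rank's buffer; nothing is returned, which is
        # the "in-place" behaviour the commit message refers to.
        buf[:] = means


# Hypothetical layer norm weights that have drifted apart on two ranks.
rank0_weight = [1.0, 2.0, 3.0]
rank1_weight = [3.0, 2.0, 1.0]

all_reduce_avg_([rank0_weight, rank1_weight])
# Both ranks now hold the same averaged values.
```

Because the helper mutates its inputs and returns nothing, assigning its result to a variable would lose the synced values; the caller keeps using the original buffers.
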
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>