Make attention_is_all_.. model efficient for lazy tensors
- avoid calling .item(), which copies a tensor to the CPU and forces a graph sync
- .item() was being called by cal_performance() to produce perf stats that
  weren't being used anyway; instead just call cal_loss()
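A minimal sketch of the change. The real cal_performance()/cal_loss() live in the attention-is-all-you-need training code; LazyTensor below is a hypothetical stand-in that counts device-to-CPU syncs the way .item() would force one on a real lazy tensor backend.

```python
class LazyTensor:
    """Hypothetical stand-in for a lazy tensor: .item() forces a graph sync."""
    syncs = 0  # class-wide counter of forced syncs

    def __init__(self, value):
        self.value = value

    def item(self):
        LazyTensor.syncs += 1  # copying to CPU forces the lazy graph to execute
        return self.value


def cal_loss(pred, gold):
    # Returns the loss as a (lazy) tensor -- no sync is triggered.
    return LazyTensor(0.5)


def cal_performance(pred, gold):
    # Old path: also computes perf stats, calling .item() and forcing a sync.
    loss = cal_loss(pred, gold)
    n_correct = LazyTensor(10).item()  # <-- forces a device->CPU copy
    return loss, n_correct


# Before: stats were computed (and synced) even though they were unused.
loss, _ = cal_performance(None, None)
assert LazyTensor.syncs == 1

# After: just compute the loss; the graph stays lazy, no sync occurs.
LazyTensor.syncs = 0
loss = cal_loss(None, None)
assert LazyTensor.syncs == 0
```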