inference overheads optimizations (#1392)
This change makes some optimizations on various places. This change consists of a part of PR #1240 (removed the problematic part) and some other trivial fix.
1. reduce unnecessary copy when constructing vector or objects that contains vector as member. use std::move when applicable.
2. use std::vector<std::reference_wrapper<const TensorShape>> instead of std::vector<TensorShape>, when it is only for constant reference usage.
3. calculate key BEFORE (instead of AFTER) acquire lock in SessionState::GetMemoryPatternGroup
other trivial fixes (code should be straightforward and self-explainable).