Place shape-related compute nodes on CPU (#4940) (#5350)
* Place shape-related nodes on CPU
* Visit candidates in topological order
* Make CPU node placement a utility function
* Skip placing on CPU if the data type is float16 or bfloat16
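
The placement rule described in the bullets above can be illustrated roughly as follows. This is a minimal sketch, not the code in this change: the dict-based graph, the helper names `topological_order` and `cpu_preferred_nodes`, and the rule "a node goes to CPU if it is a `Shape` op or all of its producers are already on CPU" are assumptions made for illustration only.

```python
from collections import deque

# Data types for which this sketch skips CPU placement (per the last bullet).
HALF_TYPES = {"float16", "bfloat16"}


def topological_order(nodes):
    """Kahn's algorithm over a hypothetical graph: name -> {"inputs": [...]}."""
    indegree = {name: 0 for name in nodes}
    consumers = {name: [] for name in nodes}
    for name, node in nodes.items():
        for src in node["inputs"]:
            if src in nodes:  # graph inputs are not nodes, so they add no edge
                indegree[name] += 1
                consumers[src].append(name)
    ready = deque(name for name, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for consumer in consumers[name]:
            indegree[consumer] -= 1
            if indegree[consumer] == 0:
                ready.append(consumer)
    return order


def cpu_preferred_nodes(nodes):
    """Return the names of nodes this sketch would place on CPU."""
    on_cpu = set()
    # Visiting in topological order means every producer has been decided
    # before any of its consumers is considered.
    for name in topological_order(nodes):
        node = nodes[name]
        if node["dtype"] in HALF_TYPES:
            continue  # assumed rule: never move half-precision work to CPU
        producers = [src for src in node["inputs"] if src in nodes]
        fed_by_cpu = bool(producers) and all(p in on_cpu for p in producers)
        if node["op"] == "Shape" or fed_by_cpu:
            on_cpu.add(name)
    return on_cpu


# Hypothetical graph: "x" is a graph input, not a node.
graph = {
    "conv": {"op": "Conv", "inputs": ["x"], "dtype": "float32"},
    "shape": {"op": "Shape", "inputs": ["conv"], "dtype": "int64"},
    "gather": {"op": "Gather", "inputs": ["shape"], "dtype": "int64"},
    "cast16": {"op": "Cast", "inputs": ["gather"], "dtype": "float16"},
    "reshape": {"op": "Reshape", "inputs": ["conv", "gather"], "dtype": "float32"},
}
cpu_nodes = cpu_preferred_nodes(graph)
```

In this toy graph only `shape` and `gather` end up on CPU: `cast16` is skipped because of its data type, and `reshape` stays put because one of its producers (`conv`) is not on CPU.
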
Co-authored-by: Sherlock <baihan.huang@gmail.com>