Sharding should be per output of IR Node, instead of per IR Node (#5330)
* Sharding should be per output of IR Node, instead of per IR Node
* Update sharding_hash method
* Add test for sharding on an IR node with multiple outputs
* Fix CPU test
* Fix a bug in getSharding
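
For illustration only, a minimal sketch of the idea behind this change: a node with several outputs keeps one optional sharding annotation per output, and the sharding hash covers every output. The names below (`IrNode`, `set_sharding`, `get_sharding`, `sharding_hash`, and the string-valued shardings) are hypothetical stand-ins, not the actual classes or signatures touched by this commit.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IrNode:
    """Hypothetical IR node, used only to illustrate per-output sharding."""

    op: str
    num_outputs: int = 1
    # One optional sharding annotation per output (None means unsharded).
    shardings: List[Optional[str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start with every output unsharded unless annotations were given.
        if not self.shardings:
            self.shardings = [None] * self.num_outputs

    def _check(self, index: int) -> None:
        if not 0 <= index < self.num_outputs:
            raise IndexError(f"output index {index} out of range")

    def set_sharding(self, sharding: str, index: int = 0) -> None:
        # Annotate a single output rather than the node as a whole.
        self._check(index)
        self.shardings[index] = sharding

    def get_sharding(self, index: int = 0) -> Optional[str]:
        # Look up the annotation for the requested output only.
        self._check(index)
        return self.shardings[index]

    def sharding_hash(self) -> str:
        # Fold every output's annotation, together with its position, into
        # the hash so that sharding output 0 differs from sharding output 1.
        h = hashlib.sha256(self.op.encode())
        for i, s in enumerate(self.shardings):
            h.update(f"{i}:{s!r};".encode())
        return h.hexdigest()


node = IrNode("topk", num_outputs=2)
node.set_sharding("replicated", index=1)
```

With a single per-node annotation, the two outputs of `node` could not be distinguished; here `node.get_sharding(0)` stays `None` while `node.get_sharding(1)` returns the annotation that was set.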