Using a map of ops to stages as input to the partition function. (#5940)
* New partition algorithm running before AD
* Convert cut_group_info into a device map (work in progress; works for bert-tiny with pp=2)
* Remove code for partitioning backward (bwd) graphs
* Remove old code
* Adding some verification code
* Handle shared initializers
* Rename rank to stage
* Added first unit test
* new test
* redundant check
* undo change in bert
* Moved cut-based partition to testing utils file
Co-authored-by: xzhu1900
Co-authored-by: wschin
* New conversion function and tests
* minor
* remove test that is not needed
* improve GetDeviceAssignment and address PR comments
* minor changes
* PR comments
* improving documentation and variable naming
* add documentation
* Variable naming and docs
* more doc improvements
* more doc improvements
* missing static cast
* Fix test file for Windows
* Fix test file for Windows
* Fix test file for Windows
* stage id is not the same as rank id
* PR comments
* PR comments
* More comments
* More comments
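The central idea of this change is to drive pipeline partitioning with an explicit map from ops to stage ids. It also covers converting cut-based partition info into that map, verifying the assignment, and handling initializers shared across stages. The Python sketch below illustrates that idea under stated assumptions. The `Op` tuple representation and the functions `build_op_to_stage`, `verify_assignment`, and `shared_initializers` are hypothetical names invented for illustration and are not the project's actual API. The "cut after op" representation is likewise only an assumed stand-in for cut_group_info.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical minimal graph representation (not the project's real API):
# (op name, input tensor names, output tensor names), in topological order.
Op = Tuple[str, List[str], List[str]]


def build_op_to_stage(ops: List[Op], cut_after: List[str]) -> Dict[str, int]:
    """Assign stage ids to topologically ordered ops, starting a new stage
    after each op named in `cut_after` (an assumed cut representation)."""
    op_to_stage: Dict[str, int] = {}
    stage = 0
    cuts = set(cut_after)
    for name, _, _ in ops:
        op_to_stage[name] = stage
        if name in cuts:
            stage += 1
    return op_to_stage


def verify_assignment(ops: List[Op], op_to_stage: Dict[str, int]) -> None:
    """Raise ValueError if an op has no stage, or if an op consumes a tensor
    produced by an op in a later stage (data must flow forward)."""
    producer_stage: Dict[str, int] = {}
    for name, _, outputs in ops:
        if name not in op_to_stage:
            raise ValueError(f"op {name!r} has no stage")
        for out in outputs:
            producer_stage[out] = op_to_stage[name]
    for name, inputs, _ in ops:
        for inp in inputs:
            if inp in producer_stage and producer_stage[inp] > op_to_stage[name]:
                raise ValueError(f"op {name!r} consumes {inp!r} from a later stage")


def shared_initializers(
    ops: List[Op], op_to_stage: Dict[str, int], initializers: Set[str]
) -> Dict[str, Set[int]]:
    """Return each initializer that is consumed by ops in more than one
    stage, mapped to the set of stages that consume it."""
    consumers: Dict[str, Set[int]] = {}
    for name, inputs, _ in ops:
        for inp in inputs:
            if inp in initializers:
                consumers.setdefault(inp, set()).add(op_to_stage[name])
    return {k: v for k, v in consumers.items() if len(v) > 1}


# Usage: a tiny model whose embedding weight is reused by the output projection.
ops: List[Op] = [
    ("embed", ["ids", "W_emb"], ["h0"]),
    ("layer0", ["h0"], ["h1"]),
    ("layer1", ["h1"], ["h2"]),
    ("proj", ["h2", "W_emb"], ["logits"]),
]
op_to_stage = build_op_to_stage(ops, cut_after=["layer0"])
verify_assignment(ops, op_to_stage)
print(shared_initializers(ops, op_to_stage, {"W_emb"}))  # {'W_emb': {0, 1}}
```

Keeping the op-to-stage map as the single input makes the partitioner independent of how stages were chosen. A cut-based helper is then just one way to produce the map. Note also that stage ids in this sketch are not rank ids; mapping stages to ranks would be a separate step.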