Automatic Model Parallelism Through FX (#1933)
* WIP
* add dist ops
* add index propagation
* support TP for linear layers
* add embedding & weight tying
* address comments
* lint
* fix
* fix
* debug
* fix
* fix tests
* add experimental API
* nit
* fix API
* fix API
* format
* clean tests
* fix weight_map
* add weights loading
* format
* fix
* fix
* enable tests
* address comments