Refactor ExecuteReplicated to operate on sharded data directly (#5737)

* Refactor ExecuteReplicated to operate on sharded data directly
* Remove old handlers
* formatting
* Improve naming and logging
* update docstring
* Remove obsolete unit tests
* improve comment
* Remove slow calls to get output shapes
* fix implicit sharding
* remove declarations of input/output handlers
* formatting
* give everything a manual placeholder sharding
* see if CI passes
* formatting
* Shard parameter and output handling
* Use absl::BlockingCounter
* formatting
* fix merge
* Assign valid output shardings
* tune and document costs
* formatting
* implicitly replicate output to match output handler
* clarify ReplicateShardedData
* fix merge