[TF] Remove `unbroadcast(to:)` and improve derivative performance.
In the pullbacks of operators that broadcast, use `Raw.broadcastGradientArgs(s0:s1:)` to compute the reduction indices directly instead of calling the inefficient `unbroadcast(to:)`.
`unbroadcast(to:)` was introduced solely to define derivatives for broadcasting operators and has no other practical use, so it is removed.
Operators affected:
- `Tensor.+(_:_:)`
- `Tensor.-(_:_:)`
- `Tensor.*(_:_:)`
- `Tensor./(_:_:)`
- `min(_:_:)`
- `max(_:_:)`
- `pow(_:_:)`
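For reference, a minimal Python/NumPy sketch of the semantics involved: `broadcast_gradient_args` models what `Raw.broadcastGradientArgs(s0:s1:)` computes (the axes each operand was broadcast along), and `add_pullback` models how a broadcasting pullback such as `Tensor.+(_:_:)`'s uses those indices. The function names here are illustrative, not the actual Swift implementation.

```python
import numpy as np

def broadcast_gradient_args(s0, s1):
    """Model of broadcastGradientArgs: given two operand shapes,
    return the axes of the broadcast result that each operand's
    gradient must be summed over."""
    n = max(len(s0), len(s1))
    # Align shapes to the right by padding with leading 1s.
    a0 = [1] * (n - len(s0)) + list(s0)
    a1 = [1] * (n - len(s1)) + list(s1)
    r0, r1 = [], []
    for i in range(n):
        if a0[i] == 1 and a1[i] != 1:
            r0.append(i)  # s0 was broadcast along axis i
        elif a1[i] == 1 and a0[i] != 1:
            r1.append(i)  # s1 was broadcast along axis i
    return r0, r1

def add_pullback(v, s0, s1):
    """Pullback of broadcasting `+`: sum the incoming gradient `v`
    over each operand's broadcast axes, then reshape to the
    operand's original shape."""
    r0, r1 = broadcast_gradient_args(s0, s1)
    g0 = v.sum(axis=tuple(r0)).reshape(s0)
    g1 = v.sum(axis=tuple(r1)).reshape(s1)
    return g0, g1
```

For example, adding shapes `(3, 1)` and `(4,)` broadcasts to `(3, 4)`; the gradient for the first operand is summed over axis 1, the second over axis 0. The point of the change is that these indices are computed once from the shapes, rather than materializing an `unbroadcast(to:)` pass per operand.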