mul: remove opmath cast sequence (#9663)
Remove the explicit opmath-driven cast chain (bf16→f32→bf16, etc.) from
`mul`. The op now executes in the dtype chosen by standard dtype
promotion, without inserting unconditional upcast/downcast steps. The
opmath cast functionality itself is kept for future use.
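
For illustration, a minimal sketch of the change in emitted steps. The
dtype names, the `OPMATH` table, the toy `promote` rule, and `plan_mul`
are all hypothetical and do not mirror the real lowering code:

```python
# Hypothetical model of the lowering; names and tables are assumptions.
OPMATH = {"f16": "f32", "bf16": "f32", "f32": "f32"}


def promote(a, b):
    """Toy promotion rule: equal dtypes stay, any mix widens to f32."""
    return a if a == b else "f32"


def plan_mul(a, b, use_opmath):
    """Return the list of steps a lowering of mul(a, b) would emit."""
    result = promote(a, b)
    # Old behavior: compute in the opmath dtype. New: compute in result dtype.
    compute = OPMATH[result] if use_opmath else result
    steps = [
        f"cast {name}: {src}->{compute}"
        for name, src in (("lhs", a), ("rhs", b))
        if src != compute
    ]
    steps.append(f"mul in {compute}")
    if compute != result:
        steps.append(f"cast out: {compute}->{result}")
    return steps


print(plan_mul("bf16", "bf16", use_opmath=True))   # before this change
print(plan_mul("bf16", "bf16", use_opmath=False))  # after this change
```

In this model, casts required by promotion itself (e.g. bf16 with f32)
still appear; only the extra opmath round trip goes away.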