Add Pre- and Post-tunning API to allow pre- and postprocessing of params (#13411)
Some op will use a buffer for input and output at the same time, so it will do inplace update to it.
If we blindly tune over the `params`, there will be accumulated update to that buffer during FindFastest,
which is an undesired side effect. In this case, we use a proxy params struct for the tuning to avoid this side effect.