Add microbench to benchmark single operators. (#10678)
* Add microbench to benchmark single operators.
* Move to tool directory; separate data generation from io binding.
* Refactor.
* Clean up.
* Use precision instead for extensibility.
* Refactor the create_io_binding function to take torch tensors
instead of numpy arrays; this reflects more accurately what
the function does, since torch tensors are what actually get bound.