[caffe2] add concat benchmark (#46457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46457
Tested whether a float specialization of CopyMatrix that uses mkl_somatcopy would be faster; it was not. Checking in the benchmark anyway so it can be used later.
Test Plan: .
Reviewed By: dskhudia
Differential Revision: D24345901
fbshipit-source-id: d3e68dbb560e3138fda11c55789cd41bc0715c6d