Add Attention op for multi-head self-attention in BERT (#1984)
* Add Attention op for multi-head self-attention in BERT
* Add test cases
* Move op from kOnnxDomain to kMSDomain.
Limit the test to run on the CUDA provider only.
* Fix test
* Add float16 test
* Fix CPU build error
* Handle CUDA errors
* Get the last CUDA error on failure
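For reference, the computation such an Attention op performs can be sketched in NumPy. This is a minimal illustration only, not the actual kernel: the function name, the fused-QKV weight layout, and the absence of an attention mask are assumptions for the sketch.

```python
import numpy as np

def multi_head_self_attention(x, w_qkv, b_qkv, num_heads):
    """Sketch of multi-head self-attention as used in BERT.

    x:     (seq_len, hidden) input hidden states
    w_qkv: (hidden, 3*hidden) fused Q/K/V projection weights (assumed layout)
    b_qkv: (3*hidden,) fused projection bias
    """
    seq_len, hidden = x.shape
    head_size = hidden // num_heads

    # Project input to queries, keys, and values in one fused matmul.
    qkv = x @ w_qkv + b_qkv                      # (seq_len, 3*hidden)
    q, k, v = np.split(qkv, 3, axis=-1)          # each (seq_len, hidden)

    # Split the hidden dimension into heads: (num_heads, seq_len, head_size).
    def split_heads(t):
        return t.reshape(seq_len, num_heads, head_size).transpose(1, 0, 2)
    q, k, v = map(split_heads, (q, k, v))

    # Scaled dot-product attention per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_size)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)    # softmax over keys

    # Weighted sum of values, then merge heads back to (seq_len, hidden).
    out = probs @ v                               # (num_heads, seq_len, head_size)
    return out.transpose(1, 0, 2).reshape(seq_len, hidden)
```

The test cases mentioned above would compare outputs like this against the op's CUDA result, in both float32 and float16.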