Fuse attention node even when Q and K hidden dimensions differ (#8106)
* changes to fuse the attention node and handle varied dimensions
* added an option to optimizer to only do offline fusion
* fixing a typo
* merge with master
* removing extra changes
* added new unit test - test_attention_fusion_for_varied_qkv_dimensions()
* Unit test successful for Q, K, V paths with varied dimensions
* adding test model for unit test case
* optimizing attention tests
* removing debug statements
* minor change
* addressing comments
* addressing comments
* changed the new option to disable_onnxruntime
* replacing asserts with debug logging
* make attn fusion backward compatible for head_size, hidden_size
* preserving behavior for shape_modified_tensor
* adding new option as the last parameter
* cleaning up
* line breaks and spaces
* formatting according to Python style
* making the changes to fuse attention node without user input
* changes to fusion_attention.py updated
* bringing the code up to Python standards