Refactor Attention CUDA kernel (#17578)
* Break QkvToContext into small functions, so that each fused and unfused kernel path has its own function (see the sketch after this list).
* Move the DecoderAttention kernel to a separate file.
* Move KV-cache-related kernels to attention_kv_cache.cu.
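
A rough sketch of the intended structure, where QkvToContext becomes a thin dispatcher over small per-path helpers. The helper names (`RunFusedAttention`, `RunUnfusedAttention`) and the parameter struct below are illustrative only, not the actual ONNX Runtime signatures:

```cpp
#include <cuda_runtime.h>

// Hypothetical parameter bundle; the real code passes its own data/parameters structs.
struct AttentionData {
  const float* query;
  const float* key;
  const float* value;
  float* output;
  bool use_fused_kernel;  // whether a fused kernel path was selected
};

// Each kernel path gets its own small function instead of one monolithic body.
cudaError_t RunFusedAttention(cudaStream_t stream, AttentionData& data) {
  // ... launch the fused attention kernel here ...
  return cudaSuccess;
}

cudaError_t RunUnfusedAttention(cudaStream_t stream, AttentionData& data) {
  // ... launch the unfused GEMM + softmax + GEMM sequence here ...
  return cudaSuccess;
}

// QkvToContext only decides which path to take and delegates to the helper.
cudaError_t QkvToContext(cudaStream_t stream, AttentionData& data) {
  if (data.use_fused_kernel) {
    return RunFusedAttention(stream, data);
  }
  return RunUnfusedAttention(stream, data);
}
```

Keeping the dispatch logic separate from the per-kernel launch code is what makes each path easier to read and modify in isolation.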
### Motivation and Context
To make the code easier to maintain.