Fix Local Attention off by 1 bug (#25927)
### Description
Previously, the local window size of the GQA op excluded the current token. This does not match standard HuggingFace implementations, where tokens are appended first and local masking is applied afterward; the mismatch can make the mask off by 1 during generation, leading to accuracy issues. This PR corrects the mismatch by counting the current token inside the window, which in practice effectively decreases the GQA window size by 1.
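For illustration only, below is a minimal NumPy sketch of the two masking conventions; the function and its parameters are hypothetical and are not the GQA kernel code. With `include_current=False` (the old behavior) a window of size W exposes W past tokens plus the current one, while the HuggingFace-aligned `include_current=True` counts the current token within the W-token window.

```python
import numpy as np

def local_mask(seq_len, window_size, include_current):
    """Build a boolean sliding-window causal mask (illustrative only).

    mask[i, j] is True when query token i may attend to key token j.
    If include_current is True, the window of `window_size` tokens counts
    the current token itself (HuggingFace-style); otherwise the window
    covers `window_size` tokens *before* the current one, so one extra
    token is visible.
    """
    q = np.arange(seq_len)[:, None]
    k = np.arange(seq_len)[None, :]
    causal = k <= q
    if include_current:
        in_window = k > q - window_size   # j in (i - W, i]
    else:
        in_window = k >= q - window_size  # j in [i - W, i]
    return causal & in_window

# With window_size=3, the old convention lets token 4 see keys {1, 2, 3, 4},
# while the HuggingFace-aligned convention lets it see keys {2, 3, 4}.
old = local_mask(6, 3, include_current=False)
new = local_mask(6, 3, include_current=True)
print(old[4])  # [False  True  True  True  True False]
print(new[4])  # [False False  True  True  True False]
```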
### Motivation and Context
This helps align our models with HuggingFace models.
---------
Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com>