[qwen2-vl] fix FA2 inference (#39121)

1 year ago

* fix FA2
* update is causal flag and remove mask for FA2
* update for FA2 with varlen path
* how the tests were passing with different devices?
* add comment and ref to the PR
* move mask preparation to base pretrained model
* seq len is the first dim, not second
* fix copies to fix GLM4V
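
For context on the varlen item: variable-length FlashAttention kernels generally describe a packed batch by cumulative sequence lengths rather than by a padding mask. The sketch below is not the code from this PR. It is a plain-Python illustration of that bookkeeping, and the helper name `cu_seqlens_from_lengths` is hypothetical.

```python
from itertools import accumulate


def cu_seqlens_from_lengths(lengths):
    """Return cumulative offsets [0, l0, l0 + l1, ...] for packed sequences.

    Hypothetical helper for illustration only; not taken from the PR.
    """
    if any(n <= 0 for n in lengths):
        raise ValueError("every sequence length must be positive")
    # Sequence i occupies tokens offsets[i]:offsets[i + 1] in the packed batch.
    return [0, *accumulate(lengths)]


lengths = [3, 5, 2]                      # three sequences packed back to back
cu_seqlens = cu_seqlens_from_lengths(lengths)
max_seqlen = max(lengths)                # longest sequence in the batch

print(cu_seqlens)  # [0, 3, 8, 10]
print(max_seqlen)  # 5
```

Because sequence boundaries are carried by these offsets, a varlen path needs no separate padding mask. This is consistent with the "remove mask for FA2" item above, though the actual change in the PR may differ in detail.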