Bias in QKV input projections for supporting Qwen2 model series #1837
attention bias and ip_fc added
32e306c9
Merge branch 'main' into jchang/qwen2
c62505b9
changes
fdd83f04
commenting out no bias post process
cdf2e66f
fix
6e8cc6aa
larger shard size
3fd0141a
fix
9a14bfc3
fix
46715f98
remove print
5fb40dbb
changes
ed766720
changes
2b91e60b
changes for attention_bias interaction with bias
0e0eeeea
fix mha layer
fdbf20e4
fix tests
793bcab0
changes to avoid linear layer disambiguation
2fbc1e0e
gupta-abhay
changed the title from "WIP: Qwen model support via MPT" to "WIP: Qwen-2.5 model via MPT class" 359 days ago
Merge branch 'main' into jchang/qwen2
f0e6ccdd
minor change
645bd492
last attempt, else local debugging starts
ff2b2435
gupta-abhay
changed the title from "WIP: Qwen-2.5 model via MPT class" to "Bias in QKV input projections for supporting Qwen2 model series" 358 days ago
gupta-abhay
marked this pull request as ready for review 358 days ago
changes
f9ad295e
saving changes
c9aead8d
dakinggg
approved these changes
on 2025-06-03
changes
0f6ad357
wrong conditions
42381bba
gupta-abhay
deleted the jchang/qwen2 branch 356 days ago