llm-foundry
Bias in QKV input projections for supporting Qwen2 model series
#1837
Merged


gupta-abhay merged 22 commits into main from jchang/qwen2
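The change named in the title adds a learnable bias term to the query, key, and value input projections of attention, as used by the Qwen2 model family. Qwen2 places biases on the Q, K, and V projections but not on the attention output projection. The following is a minimal, dependency-free sketch of a fused QKV projection with an optional bias. It assumes plain multi-head attention in which Q, K, and V all have width `d_model`; Qwen2 itself uses grouped-query attention, so its K and V can be narrower. The helper names `linear` and `fused_qkv` are hypothetical and are not llm-foundry's API.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]  # shape [out_features][in_features]
Vector = List[float]


def linear(x: Vector, weight: Matrix, bias: Optional[Vector] = None) -> Vector:
    """Compute y = W x, adding b only when a bias vector is supplied."""
    out = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in weight]
    if bias is not None:
        out = [y + b for y, b in zip(out, bias)]
    return out


def fused_qkv(
    x: Vector, w_qkv: Matrix, b_qkv: Optional[Vector], d_model: int
) -> Tuple[Vector, Vector, Vector]:
    """Hypothetical fused projection: one matmul yields 3 * d_model outputs,
    which are split into query, key, and value (assumes equal widths, no GQA).

    A Qwen2-style model passes b_qkv; a bias-free model passes None.
    The attention output projection would then call linear(..., bias=None),
    since Qwen2 does not use a bias there.
    """
    qkv = linear(x, w_qkv, b_qkv)
    q = qkv[:d_model]
    k = qkv[d_model:2 * d_model]
    v = qkv[2 * d_model:]
    return q, k, v
```

Keeping the bias optional means the bias-free path is unchanged. A model that needs QKV bias simply supplies `b_qkv`, and the output projection stays bias-free.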
jdchang1: attention bias and ip_fc added (32e306c9)
gupta-abhay: Merge branch 'main' into jchang/qwen2 (c62505b9)
gupta-abhay: changes (fdd83f04)
jdchang1: commenting out no bias post process (cdf2e66f)
jdchang1: fix (6e8cc6aa)
jdchang1: larger shard size (3fd0141a)
jdchang1: fix (9a14bfc3)
jdchang1: fix (46715f98)
jdchang1: remove print (5fb40dbb)
gupta-abhay: changes (ed766720)
gupta-abhay: changes" (2b91e60b)
gupta-abhay: changes for attention_bias interaction with bias (0e0eeeea)
gupta-abhay: fix mha layer (fdbf20e4)
gupta-abhay: fix tests (793bcab0)
gupta-abhay: changes to avoid linear layer disambiguation (2fbc1e0e)
gupta-abhay changed the title from "WIP: Qwen model support via MPT" to "WIP: Qwen-2.5 model via MPT class" 359 days ago
gupta-abhay: Merge branch 'main' into jchang/qwen2 (f0e6ccdd)
gupta-abhay: minor change (645bd492)
gupta-abhay: last attempt, else local debugging starts (ff2b2435)
gupta-abhay changed the title from "WIP: Qwen-2.5 model via MPT class" to "Bias in QKV input projections for supporting Qwen2 model series" 358 days ago
gupta-abhay marked this pull request as ready for review 358 days ago
gupta-abhay requested a review 358 days ago
dakinggg commented on 2025-06-03
gupta-abhay: changes (f9ad295e)
gupta-abhay: saving changes (c9aead8d)
dakinggg approved these changes on 2025-06-03
gupta-abhay: changes (0f6ad357)
gupta-abhay: wrong conditions (42381bba)
gupta-abhay merged 93d04542 into main 358 days ago
gupta-abhay deleted the jchang/qwen2 branch 356 days ago

Assignees: No one assigned