[GPU] Fix fp16 intermediate overflow in fc_bf_tiled DQ scale path #35228
jade-cho
approved these changes
on 2026-04-14
[GPU] Fix fp16 overflow issue for FC bf tiled kernel
f6e5a950
fix unit test
17e2d34f
optimize fc bf tiled kernel
bd208037
fix(GPU): reorder dq scale multiplication to match description
ff939f98
fix(GPU): reorder dq scale multiplication without float tmp
8bb137e8
fix(GPU): use float tmp for non-INT8 dq scale in fc_bf_tiled
b62fd64c
remove redundant unit test
6b84f81b
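Note on the commits above: they name two ways to avoid an fp16 intermediate overflow in the DQ scale path, reordering the scale multiplication and using a float temporary. The sketch below only illustrates that general numeric effect; it is not the kernel code from this PR. The names `partial_sum`, `scale_a`, `scale_b` and the two-factor scaling are hypothetical. fp16 rounding is emulated with the `struct` module's half-precision format, and a Python float stands in for the wider temporary.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (inf on overflow)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses finite values beyond the fp16 range; map them to +/-inf,
        # which is what an fp16 multiply would produce on overflow.
        return float("inf") if x > 0 else float("-inf")


def naive_fp16(partial_sum: float, scale_a: float, scale_b: float) -> float:
    """Every intermediate is stored as fp16 (hypothetical problematic ordering)."""
    tmp = to_fp16(to_fp16(partial_sum) * to_fp16(scale_a))  # may overflow here
    return to_fp16(tmp * to_fp16(scale_b))


def float_tmp(partial_sum: float, scale_a: float, scale_b: float) -> float:
    """Keep the intermediate in a wider type and round to fp16 only once."""
    tmp = to_fp16(partial_sum) * to_fp16(scale_a)  # wide temporary, no rounding
    return to_fp16(tmp * to_fp16(scale_b))


def reordered_fp16(partial_sum: float, scale_a: float, scale_b: float) -> float:
    """Stay in fp16 but apply the small factor first so the intermediate fits."""
    tmp = to_fp16(to_fp16(partial_sum) * to_fp16(scale_b))
    return to_fp16(tmp * to_fp16(scale_a))


if __name__ == "__main__":
    args = (2048.0, 64.0, 0.25)  # 2048 * 64 = 131072 exceeds the fp16 max of 65504
    print("naive fp16:", naive_fp16(*args))
    print("float tmp :", float_tmp(*args))
    print("reordered :", reordered_fp16(*args))
```

With these inputs the final value (32768) is representable in fp16, so only the intermediate product is at risk. Reordering helps only when the operand ranges are known in advance, while a wider temporary does not depend on them.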
ahnyoung-paul
changed the title from "Fix fp16 overflow fc bf tiled" to "[GPU] Fix fp16 intermediate overflow in fc_bf_tiled DQ scale path" 48 days ago
isanghao
approved these changes
on 2026-04-16
simplify non-INT8 DQ scale computation to single expression
3da775e6
isanghao
merged
0f352d09
into master 47 days ago
isanghao
deleted the fix_fp16_overflow_fc_bf_tiled branch 47 days ago
Assignees
No one assigned