onnxruntime
[JS/WebGPU] Improve MatMulNBits perf
#19974
Merged

satyajandhyala
satyajandhyala added ep:WebGPU
satyajandhyala force pushed from 226b55e9 to c18c465c 2 years ago
satyajandhyala force pushed from c18c465c to 3a1d39c1 2 years ago
satyajandhyala marked this pull request as ready for review 2 years ago
satyajandhyala changed the title from "[WIP][JS/WebGPU] Improve MatMulNBits perf" to "[JS/WebGPU] Improve MatMulNBits perf" 2 years ago
satyajandhyala force pushed from b917cd73 to f2177d39 2 years ago
satyajandhyala force pushed from f2177d39 to efa1b545 2 years ago
satyajandhyala requested a review from guschmue 1 year ago
satyajandhyala Improve perf
9ee1edfd
satyajandhyala Fix lint error.
fdbe3e34
satyajandhyala Format
f2cf7345
satyajandhyala Changes to make any combinations of components to work.
cb7256c6
satyajandhyala Perform blockwise matmul
8e196cf0
satyajandhyala format
8a87b0ca
satyajandhyala Fixed some errors.
d4896808
satyajandhyala Added workgroupSize and dispatchGroup.
09c9acac
satyajandhyala Use bit operations instead of multiplications and divisions
57583401
satyajandhyala Added maxComputeWorkgroupSizes function to retrieve workgroup siz…
68ae511b
satyajandhyala Added batch dim
688cc795
satyajandhyala Added batch support
8ac464cc
satyajandhyala Removed separate reduce step.
20863096
satyajandhyala minor fix
cfd49ccb
satyajandhyala WIP: adding components.
42f3ebbc
satyajandhyala Format
9fe360ea
satyajandhyala Added outputNumber back.
06285942
satyajandhyala Only the leading shader in the workgroup needs to write output.
9cbd993c
satyajandhyala Prefetch necessary input tensor data
1c5b7a18
satyajandhyala Unroll innermost loops to reduce loop overhead
a4ade113
satyajandhyala Removed functional call overhead.
76926d02
satyajandhyala Added getMaxWorkgroupStorageSize
1cc10115
satyajandhyala Compute workgroupSizeX as multiple of nBlocksPerCol
14243243
satyajandhyala Removed unused uniforms.
7842a41a
satyajandhyala Removed outputNumber
ec871fc8
satyajandhyala Removed block_size variable
18cac07d
satyajandhyala Choose components based on memory availability and produced fatal error
681b9938
satyajandhyala Reroll the last loop nest
5b8bbb4b
satyajandhyala Added fallback option to blockwise matmulnbits
2a702ef6
satyajandhyala Removed unused variable.
1dc620ae
satyajandhyala typo
19cd478a
satyajandhyala Temporary commit
1f990080
satyajandhyala Code optimization and clean up.
7ac51249
satyajandhyala Modified getMaxComponents to accept arbitrary number of arguments.
81868ef4
satyajandhyala Added rectangular output testcases.
ba93c4bf
satyajandhyala force pushed from 38cb3ce6 to ba93c4bf 1 year ago
satyajandhyala Prefer using BlockwiseMatMulNBits.
8a7bc25b
satyajandhyala Removed workgroup shared memory initialization to 0.
6bf16612
satyajandhyala Performance tuning
6fd81d6c
satyajandhyala Removed pre-fetching input data.
4da4a8e2
satyajandhyala Re-roll the for loops.
f4de76ab
satyajandhyala Prefer additions over multiplications.
df3688ea
satyajandhyala Fixed hint for the fallback
8c78dae5
satyajandhyala Use unpack4xU8
e3c858e3
satyajandhyala Load 8 elements of input at a time
1dd7a882
satyajandhyala Fixed zero_point offset calculation.
4e9fd96a
satyajandhyala Use near multiple of 4 when calculating components.
bd9fc91f
satyajandhyala Deal with odd numbers.
811ce128
satyajandhyala Renamed variable row and col instead of m and n
42e43223
satyajandhyala Added processOneBlock to refactor code.
95ded112
satyajandhyala Added bBlocksPerCol and blobSize to attributes to avoid recalculating.
ce73fc37
satyajandhyala Added missing semicolon
63d13244
satyajandhyala Simplified component calculation
5d37de2d
satyajandhyala Cleaned-up uniforms
515091fc
satyajandhyala Removed backup file added by mistake
087afc92
satyajandhyala minor change
56a14290
satyajandhyala Revert "Added bBlocksPerCol and blobSize to attributes to avoid recal…
546c26ed
satyajandhyala Reverted changes to getMaxComponents.
a59d736d
satyajandhyala Format
38e501ee
guschmue approved these changes on 2024-04-12
satyajandhyala merged b33216be into main 1 year ago
satyajandhyala deleted the sajandhy/webgpu_matmulnbits_perf branch 1 year ago