0a2ff223 - Fix colfax_cutlass flash_attention operator (#2401)

Fix colfax_cutlass flash_attention operator (#2401)

Summary: colfax_cutlass kernels will fail because of C++ template instantiation. We need to explicitly include the header file to instantiate all template parameters.

Pull Request resolved: https://github.com/pytorch/benchmark/pull/2401

Test Plan:

Install the colfax_cutlass operators:

```
python install.py --userbenchmark triton --cutlass

/home/xz/git/benchmark/submodules/cutlass-kernels/src/fmha/fmha_forward.cu(826): warning #117-D: non-void function "main" should return a value
      return;
      ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

/home/xz/git/benchmark/submodules/cutlass-kernels/src/fmha/fmha_forward.cu(826): warning #117-D: non-void function "main" should return a value
      return;
      ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
```

Run the flash_attention operator from colfax_cutlass:

```
python run_benchmark.py triton --op flash_attention --only colfax_cutlass --num-inputs 1

  (Batch, Heads, SeqLen, Dhead)    colfax_cutlass-latency
-------------------------------  ------------------------
              (32, 32, 512, 64)                  0.001024
```

Reviewed By: manman-ren

Differential Revision: D60557212

Pulled By: xuzhao9

fbshipit-source-id: 25b216f850d2e82815041059d372627806bfd3ca