LopezCastroRoberto
changed the title from "[Perf] Replace torch.cat with vectorized CUDA kernel for MLA query concat" to "[Perf][WIP] Replace torch.cat with vectorized CUDA kernel for MLA query concat", 129 days ago
LopezCastroRoberto
changed the title from "[Perf][WIP] Replace torch.cat with vectorized CUDA kernel for MLA query concat" to "[Perf][WIP] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2", 124 days ago
LopezCastroRoberto marked this pull request as ready for review 124 days ago
LopezCastroRoberto
changed the title from "[Perf][WIP] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2" to "[Perf] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2", 124 days ago
LopezCastroRoberto
changed the title from "[Perf] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2" to "[Attention][Perf][Kernel] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2", 124 days ago