LopezCastroRoberto changed the title from "[Perf] Replace torch.cat with vectorized CUDA kernel for MLA query concat" to "[Perf][WIP] Replace torch.cat with vectorized CUDA kernel for MLA query concat" 80 days ago
LopezCastroRoberto changed the title from "[Perf][WIP] Replace torch.cat with vectorized CUDA kernel for MLA query concat" to "[Perf][WIP] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2" 75 days ago
LopezCastroRoberto marked this pull request as ready for review 75 days ago
LopezCastroRoberto changed the title from "[Perf][WIP] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2" to "[Perf] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2" 74 days ago
LopezCastroRoberto changed the title from "[Perf] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2" to "[Attention][Perf][Kernel] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2" 74 days ago