falcon : fix CUDA inference by making K and Q contiguous #2830
falcon : fix CUDA inference by making K and Q contiguous
7c55447f
cuda : add assert to guard from non-cont ropes
cc924c57
ggerganov merged eaa13a48 into master 2 years ago
ggerganov deleted the fix-falcon branch 2 years ago