ggml-cuda : use graph allocator (#2684)

Commit

2 years ago

ggml-cuda : use graph allocator (#2684) use a different function for no_alloc to avoid breaking backwards compat, fixes lora remove 512 n_batch limit fixed 2048 batch size cleanup Co-authored-by: Johannes Gäßler <johannesg@5d6.de>