I'm also getting an error when I try to quantize the llama2 model now, although it was working with an older version of the code base:
[ 1/ 723] token_embd.weight - [ 8192, 32000, 1, 1], type = f16, quantizing to q4_K .. zsh: illegal hardware instruction
Have you recently upgraded to Sonoma?
Ever since I upgraded, K-quants are broken for me like this. This crash only occurs in Release (-O3) builds. Debug and -O2 work fine. Adding a print to debug this makes the issue disappear. So I have no idea how to fix it atm
My theory is something is wrong with the compiler.
If you can show me a commit where it works, I'll take a look. But atm I don't think this is a llama.cpp-related problem.
Thanks for this PR. I just cloned this project for the first time recently and ran into this issue. I thought I must have done something wrong when building! I pulled again, ran make clean; make, and it works perfectly now.
Yup, that's how we do it here - we test in production 😆
I ran into the same thing yesterday on M1 Max running macOS 13.6, and can confirm that this is fixed here too.
When I originally wrote this code, I had to ask a friend of a friend for remote access to his (Intel) Mac so I could verify that I even got the syntax correct.
My company got me an M2 Macbook, so I should be able to write better Metal code in the future.
Sorry for all the breakage 😅
My company got me an M2 Macbook
eh.. should've gone with the new M3 Macbook :)
Have you recently upgraded to Sonoma?
Ever since I upgraded, K-quants are broken for me like this. This crash only occurs in Release (-O3) builds. Debug and -O2 work fine. Adding a print to debug this makes the issue disappear. So I have no idea how to fix it atm
My theory is something is wrong with the compiler.
If you can show me a commit where it works, I'll take a look. But atm I don't think this is a llama.cpp-related problem.
Ah, yeah, I did just upgrade recently. If it works with a print message put in then maybe that is the solution... 😜
Not sure how this even compiled for other people. On M2 Ultra there were quite a few errors in the MSL code: