Thank you for your contribution. I see a few issues in this PR as seen in c3753f9.
For CMake PR defaults to build with AVX-512, which is not ubiquitous. Only recent generations of Intel CPUs and very recent generations of AMD CPUs support them.
That's why I don't think it's reasonable to target AVX-512 and force a lot of folks setting WHISPER_NO_AVX512 now.
Of course situation may change in next 5 years.
Side note: I dislike that we have negative way of expressing x86 ISA extensions. Maybe we should finally change it in whisper.cpp, but this PR most likely wouldn't be the right venue to do that, as such change should be isolated from adding additional stuff.PR enables AVX things differently for CMake and make. AVX-512 consists of many instruction subsets, and not all of them have to be supported by CPU.
For MSVC CMake it uses /arch:AVX512
which per https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64.
The
__AVX512F__
,__AVX512CD__
,__AVX512BW__
,__AVX512DQ__
and__AVX512VL__
preprocessor symbols are defined when the /arch:AVX512 compiler option is specified. For more information, see Predefined macros.
For non-MSVC CMake and make it uses -mavx512f
which enables AVX-512 Foundation (F) only.
From compatibility perspective:
set is the safest to go with assuming some form of AVX-512 is supported (and that's why MSVC does what it does).
Side note: Xeon Phi is the exception (not supporting VL, DQ, BW), but per https://en.wikipedia.org/wiki/Xeon_Phi:
"In 2017, Intel announced that Knights Hill had been canceled"
"Intel announced they were discontinuing Knights Landing in summer 2018"
so there is no point in cluttering solution space with dead product, and those very few who have access to it will definitely know how to tweak CMake or make build process to achieve what they want.
That's why I think for non-MSVC CMake and make we should ideally target the same set.
For compiler it should be: -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw
Note:
For precise detection on Linux all following flags should be present avx512f avx512cd avx512vl avx512dq avx512bw
For precise detection on other systems I cannot comment right now, would need to research, but I won't do that, because
Detecting using avx512f
alone seems good enough (at least on Linux, but I suspect same flag name on other OSes). :)
Updated PR so that for CMake AVX512 must be enabled explicitly. Also added AVX512 subsets for this case as suggested, however I would argue that this will lead to "bad instruction" error during execution in case a particular subset is used by the code, but not supported by the target CPU. Although in practice this may not happen because it looks like the inference code does not use any other AVX512 instruction subsets than AVX512F.
Updated Makefile to detect AVX512 subsets.
61 | 61 | ||
62 | option(WHISPER_NO_AVX "whisper: disable AVX" OFF) | ||
63 | option(WHISPER_NO_AVX2 "whisper: disable AVX2" OFF) | ||
64 | option(WHISPER_NO_FMA "whisper: disable FMA" OFF) | ||
65 | option(WHISPER_NO_F16C "whisper: disable F16c" OFF) | ||
62 | option(WHISPER_NO_AVX "whisper: disable AVX" OFF) | ||
63 | option(WHISPER_NO_AVX2 "whisper: disable AVX2" OFF) | ||
64 | option(WHISPER_AVX512 "whisper: enable AVX512" OFF) |
I suggest using WHISPER_NO_AVX512
to be consistent with rest of these settings (until they will be overhauled one day in whisper.cpp to use positive notion instead of negative one), and naturally changing default to ON
in such case.
Sorry if my rant in side note made you misunderstand the expectation for this PR.
Are you suggesting to use WHISPER_NO_AVX512
set to ON
by default?
Yes, to be consistent, because all other ISA extension options in whisper.cpp have such polarity.
(Maybe we'll change it in the future, but if it will happen, then it should be done separately.)
144 | 144 | CXXFLAGS += -mavx2 | |
145 | 145 | endif | |
146 | 146 | ||
147 | AVX512F_M := $(shell $(CPUINFO_CMD) | grep -iw 'AVX512F') | ||
148 | ifneq (,$(AVX512F_M)) | ||
149 | CFLAGS += -mavx512f | ||
150 | CXXFLAGS += -mavx512f | ||
151 | endif | ||
152 | |||
153 | AVX512DQ_M := $(shell $(CPUINFO_CMD) | grep -iw 'AVX512DQ') | ||
154 | ifneq (,$(AVX512DQ_M)) | ||
155 | CFLAGS += -mavx512dq | ||
156 | CXXFLAGS += -mavx512dq | ||
157 | endif | ||
158 | |||
159 | AVX512CD_M := $(shell $(CPUINFO_CMD) | grep -iw 'AVX512CD') | ||
160 | ifneq (,$(AVX512CD_M)) | ||
161 | CFLAGS += -mavx512cd | ||
162 | CXXFLAGS += -mavx512cd | ||
163 | endif | ||
164 | |||
165 | AVX512CD_M := $(shell $(CPUINFO_CMD) | grep -iw 'AVX512BW') | ||
166 | ifneq (,$(AVX512BW_M)) | ||
167 | CFLAGS += -mavx512bw | ||
168 | CXXFLAGS += -mavx512bw | ||
169 | endif | ||
170 | |||
171 | AVX512VL_M := $(shell $(CPUINFO_CMD) | grep -iw 'AVX512VL') | ||
172 | ifneq (,$(AVX512VL_M)) | ||
173 | CFLAGS += -mavx512vl | ||
174 | CXXFLAGS += -mavx512vl | ||
175 | endif |
I suggest combining them into one block, where we check just for AVX512F, but then add flags related to all 5 AVX-512 subsets (F, CD, VL, DQ, BW). I don't believe Xeon Phi existence is good enough reason check all of these subsets individually. I never suggested checking them individually.
Table at https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512 explains why.
Let's have a simpler code. If there will be mainstream CPU in the future not fitting into the assumption (F presence means presence of CD, VL, DQ and BW too), then we can start playing again with checking those flags individually.
174 | CXXFLAGS += -mavx512vl | ||
175 | endif | ||
176 | |||
177 | AVX512VNNI_M := $(shell $(CPUINFO_CMD) | grep -iwE 'AVX512_VNNI|AVX512VNNI') |
AVX512_VNNI is for Linux, so just out of curiosity:
Which OS uses AVX512VNNI?
As you're adding VNNI beside basic AVX-512 subsets, I would suggest adding also VBMI, as llama.cpp does it too. It's nice to have some kind of parity between these projects.
Moreover, if you're adding VNNI and VBMI in Makefile, you should add them in CMakeLists.txt too, via WHISPER_NO_AVX512_VNNI
and WHISPER_NO_AVX512_VBMI
, defaulting to ON
. I suggest using add_compile_definitions
similarly like llama.cpp does.
Added AVX512VNNI for pure symmetry just in case any system does have it.
I asked about AVX512VNNI, didn't request removing. But I see you removed it in dc69460.
So I looked around and it seems AVX512VNNI is the name reported by FreeBSD.
http://fxr.watson.org/fxr/source/x86/x86/identcpu.c#L994
Please bring back grep -iwE 'AVX512_VNNI|AVX512VNNI'
for VNNI detection.
Great, my feeling was right then.
471 | set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX512") | ||
472 | set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /arch:AVX512") | ||
473 | set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} /arch:AVX512") | ||
474 | # from llama.cpp: |
I woud suggest removing from llama.cpp
comment line. There are many things shared between llama.cpp and whisper.cpp, in both directions, so if we would be marking all of them, it would be pure noise. :)
Sure, just wanted to note, that I'm not author of that comment.
So fixing VNNI detection for non-Linux and removing superfluous comment are the only remaining things.
Rest looks good to me. (But I haven't tested locally building and running whisper.cpp w/ AVX512 enabled.)
Tested inference on a Linux machine with AVX512 all five instruction subsets and AVX512 VNNI.
One little thing. I would suggest changing PR title for two reasons:
I would suggest following one:
build : support enabling use of AVX-512 (detected in make, via WHISPER_NO_AVX512=0 for CMake)
Login to write a write a comment.
Adds missing AVX512 detection to Makefile and CMakeLists.txt