llama.cpp
llama : reorganize source code + improve CMake
#8006
Merged

ggerganov merged 13 commits into master from gg/reorganize-project
ggerganov · 329 days ago (edited 322 days ago) · 👍 6

ref #7573, #6913

Adopt a new source code structure that makes the ggml source code and build scripts easier to reuse.

Note that the build options that are relevant to the ggml library are now prefixed with GGML_. For example, LLAMA_CUDA and WHISPER_METAL are now GGML_CUDA and GGML_METAL. However, WHISPER_COREML and WHISPER_OPENVINO are still named the same because the CoreML and OpenVINO functionality in whisper.cpp is not part of the ggml library.
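As a sketch (the parent project, target and path names below are placeholders, not from this PR), an embedding project that used to pass LLAMA_CUDA would now set the GGML_ option before pulling in the subproject:

# Hypothetical parent CMakeLists.txt embedding llama.cpp after this change.
# GGML_CUDA replaces LLAMA_CUDA; set it before add_subdirectory().
set(GGML_CUDA ON CACHE BOOL "enable the CUDA backend" FORCE)
add_subdirectory(llama.cpp)                 # path to the checked-out repo (illustrative)

add_executable(my_app main.cpp)             # placeholder target and source
target_link_libraries(my_app PRIVATE llama)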

The Makefiles in llama.cpp and whisper.cpp have been updated to be more similar to each other.

Header files (such as ggml.h, llama.h and whisper.h) are now placed in include subfolders, while the source files are placed in src.
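For downstream build scripts that previously pointed at the repository root for headers, a minimal sketch of the adjustment (paths per the new layout described above; the consumer target is hypothetical):

# The public headers now live in include/ subfolders rather than the repo root:
#   ggml/include/ggml.h  and  include/llama.h  (in llama.cpp)
target_include_directories(my_tool PRIVATE
    ${CMAKE_SOURCE_DIR}/include          # llama.h
    ${CMAKE_SOURCE_DIR}/ggml/include)    # ggml.h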

PRs in other projects that will be updated and merged together with this one:

TODOs:

  • update ggml repo
  • update whisper.cpp repo
  • update Makefiles
    • llama.cpp
    • whisper.cpp
  • update sync scripts
    • llama.cpp
    • ggml
    • whisper.cpp
    • add ggml/cmake
  • fix examples
  • fix CI
    • HIPBLAS
    • kompute
    • nix (maybe will work after merge to master?)
    • windows
  • update README with new GGML_ options
    • llama.cpp
    • whisper.cpp

Changes:

  • deprecate LLAMA_XXX in favor of GGML_XXX
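A minimal sketch of such a deprecation shim (the helper name and exact behavior here are hypothetical, not necessarily what the PR ships):

# Warn when an old LLAMA_-prefixed option is set and forward its value to the
# new GGML_-prefixed equivalent.
function(option_depr TYPE OLD NEW)
    if (DEFINED ${OLD})
        message(${TYPE} "${OLD} is deprecated, use ${NEW} instead")
        set(${NEW} ${${OLD}} PARENT_SCOPE)
    endif()
endfunction()

option_depr(WARNING LLAMA_CUDA  GGML_CUDA)
option_depr(WARNING LLAMA_METAL GGML_METAL)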

Resolved PRs:

TODO in follow-up PRs:

  • move relevant tests from tests to ggml/tests
  • avoid using CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS and build and link shared libraries properly
  • link shared libs to examples by default
  • simplify sync scripts to work on the ggml folder level

github-actions added build
github-actions added python
ggerganov force-pushed from 729c7ccb to cde2512a 329 days ago
mofosyne added Review Complexity : Medium
ggerganov force-pushed from 62d26cce to 3ab24ad6 329 days ago
ggerganov force-pushed from 3ab24ad6 to b1064a77 328 days ago
ggerganov force-pushed from 13a97767 to ad6d310d 328 days ago
ggerganov force-pushed from ad6d310d to c79bb52d 328 days ago
ggerganov force-pushed from c79bb52d to b6a1addd 328 days ago
ggerganov force-pushed from 3f4a147e to faaaeffe 328 days ago
ggerganov force-pushed from faaaeffe to b52fde17 328 days ago
ggerganov force-pushed from c18326af to 5c5e4d40 328 days ago
ggerganov force-pushed from 5c5e4d40 to ace2b97c 328 days ago
ggerganov force-pushed from ace2b97c to 8216c4c3 328 days ago
github-actions added script
ggerganov force-pushed from 8216c4c3 to 5b1490a6 328 days ago
github-actions added devops
ggerganov force-pushed from c9a3dc85 to 73d5f903 327 days ago
github-actions added documentation
github-actions added nix
github-actions added examples
github-actions added SYCL
ggerganov · 327 days ago · ❤ 1

The llama.cpp reorganization should now be mostly ready and can be tested. The ggml code, tests and CMake scripts will be shared across all 3 repos (as if it were a git submodule).

Some CI workflows are still failing - any help with resolving these is appreciated. I'll now focus on updating whisper.cpp to reuse ggml in a similar way, and then look to merge this sometime next week. Pay extra attention to the new build options (e.g. LLAMA_CUDA is now GGML_CUDA).

ggerganov force-pushed from e550e3e0 to 7c1c378c 327 days ago
slaren · 327 days ago (edited 327 days ago)

The Windows build is failing because some targets do not link to the libraries they need. On Linux that works because in the end all the symbols are put together indiscriminately, but not on Windows.

This solves some of the problems, but some tests depend on internal functions of llama.cpp, such as unicode_cpt_to_utf8, that are not exported. I am also not sure if this is a real solution, because with static builds I think this is going to cause multiple versions of ggml and llama to be linked into some binaries.

diff --git a/common/CMakeLists.txt b/common/CMakeLists.txt
index c6fccc02..299d53c3 100644
--- a/common/CMakeLists.txt
+++ b/common/CMakeLists.txt
@@ -85,4 +85,4 @@ endif ()

 target_include_directories(${TARGET} PUBLIC .)
 target_compile_features   (${TARGET} PUBLIC cxx_std_11)
-target_link_libraries     (${TARGET} PRIVATE ${LLAMA_COMMON_EXTRA_LIBS} PUBLIC llama Threads::Threads)
+target_link_libraries     (${TARGET} PRIVATE ${LLAMA_COMMON_EXTRA_LIBS} PUBLIC ggml llama Threads::Threads)
diff --git a/examples/benchmark/CMakeLists.txt b/examples/benchmark/CMakeLists.txt
index 34a58cc0..358fd130 100644
--- a/examples/benchmark/CMakeLists.txt
+++ b/examples/benchmark/CMakeLists.txt
@@ -1,6 +1,6 @@
 set(TARGET llama-bench-matmult)
 add_executable(${TARGET} benchmark-matmult.cpp)
 install(TARGETS ${TARGET} RUNTIME)
-target_link_libraries(${TARGET} PRIVATE llama build_info ${CMAKE_THREAD_LIBS_INIT})
+target_link_libraries(${TARGET} PRIVATE ggml llama build_info ${CMAKE_THREAD_LIBS_INIT})
 target_include_directories(${TARGET} PRIVATE ../../common)
 target_compile_features(${TARGET} PRIVATE cxx_std_11)
diff --git a/examples/gbnf-validator/CMakeLists.txt b/examples/gbnf-validator/CMakeLists.txt
index 4edd6ec7..29530835 100644
--- a/examples/gbnf-validator/CMakeLists.txt
+++ b/examples/gbnf-validator/CMakeLists.txt
@@ -1,5 +1,5 @@
 set(TARGET llama-gbnf-validator)
 add_executable(${TARGET} gbnf-validator.cpp)
 install(TARGETS ${TARGET} RUNTIME)
-target_link_libraries(${TARGET} PRIVATE common llama ${CMAKE_THREAD_LIBS_INIT})
+target_link_libraries(${TARGET} PRIVATE common ggml llama ${CMAKE_THREAD_LIBS_INIT})
 target_compile_features(${TARGET} PRIVATE cxx_std_11)
diff --git a/examples/llama-bench/CMakeLists.txt b/examples/llama-bench/CMakeLists.txt
index 5bdbea4e..eb1918c2 100644
--- a/examples/llama-bench/CMakeLists.txt
+++ b/examples/llama-bench/CMakeLists.txt
@@ -1,5 +1,5 @@
 set(TARGET llama-bench)
 add_executable(${TARGET} llama-bench.cpp)
 install(TARGETS ${TARGET} RUNTIME)
-target_link_libraries(${TARGET} PRIVATE common llama ${CMAKE_THREAD_LIBS_INIT})
+target_link_libraries(${TARGET} PRIVATE ggml common llama ${CMAKE_THREAD_LIBS_INIT})
 target_compile_features(${TARGET} PRIVATE cxx_std_11)
diff --git a/examples/quantize-stats/CMakeLists.txt b/examples/quantize-stats/CMakeLists.txt
index bb986a71..73e7f943 100644
--- a/examples/quantize-stats/CMakeLists.txt
+++ b/examples/quantize-stats/CMakeLists.txt
@@ -1,6 +1,6 @@
 set(TARGET llama-quantize-stats)
 add_executable(${TARGET} quantize-stats.cpp)
 install(TARGETS ${TARGET} RUNTIME)
-target_link_libraries(${TARGET} PRIVATE llama build_info ${CMAKE_THREAD_LIBS_INIT})
+target_link_libraries(${TARGET} PRIVATE ggml llama build_info ${CMAKE_THREAD_LIBS_INIT})
 target_include_directories(${TARGET} PRIVATE ../../common)
 target_compile_features(${TARGET} PRIVATE cxx_std_11)
diff --git a/include/llama.h b/include/llama.h
index 05d8b092..d833ba4a 100644
--- a/include/llama.h
+++ b/include/llama.h
@@ -1124,23 +1124,23 @@ struct llama_grammar_candidate {
     llama_partial_utf8   partial_utf8;
 };

-const std::vector<std::pair<std::string, struct ggml_tensor *>> & llama_internal_get_tensor_map(
+LLAMA_API const std::vector<std::pair<std::string, struct ggml_tensor *>> & llama_internal_get_tensor_map(
     struct llama_context * ctx
 );

-void llama_grammar_accept(
+LLAMA_API void llama_grammar_accept(
         const std::vector<std::vector<llama_grammar_element>>         & rules,
         const std::vector<std::vector<const llama_grammar_element *>> & stacks,
         const uint32_t                                                  chr,
         std::vector<std::vector<const llama_grammar_element *>>       & new_stacks);

-std::pair<std::vector<uint32_t>, llama_partial_utf8> decode_utf8(
+LLAMA_API std::pair<std::vector<uint32_t>, llama_partial_utf8> decode_utf8(
         const std::string & src,
         llama_partial_utf8   partial_start);

 // Randomly selects a token from the candidates based on their probabilities using given std::mt19937.
 // This is a temporary workaround in order to fix race conditions when sampling with multiple sequences.
-llama_token llama_sample_token_with_rng(struct llama_context * ctx, llama_token_data_array * candidates, std::mt19937 & rng);
+LLAMA_API llama_token llama_sample_token_with_rng(struct llama_context * ctx, llama_token_data_array * candidates, std::mt19937 & rng);

 #endif // LLAMA_API_INTERNAL

diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt
index cfa70731..f70cdb2b 100644
--- a/tests/CMakeLists.txt
+++ b/tests/CMakeLists.txt
@@ -55,7 +55,7 @@ function(llama_target_and_test source)

     add_executable(${TEST_TARGET} ${source} get-model.cpp)
     install(TARGETS ${TEST_TARGET} RUNTIME)
-    target_link_libraries(${TEST_TARGET} PRIVATE common)
+    target_link_libraries(${TEST_TARGET} PRIVATE ggml llama common)
     add_test(
         NAME ${TEST_TARGET}
         WORKING_DIRECTORY ${LLAMA_TEST_WORKING_DIRECTORY}
slaren · 327 days ago

It works with -DBUILD_SHARED_LIBS=OFF, so I am probably misunderstanding the error.

slaren · 327 days ago (edited 327 days ago)

I have now noticed that CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS is set in ggml when building on Windows with shared libs. I think the actual problem is that CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS is not set early enough, because configuring with -DCMAKE_WINDOWS_EXPORT_ALL_SYMBOLS=ON also works, without requiring the changes that I listed before.

This seems to work. My assumption is that CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS needs to be set per project; when it is passed through the command line it is applied to every subproject as well. But I am not sure if that's correct.

However, all of this seems like a hack: we go through the effort of using dllexport in ggml.h and llama.h for all the symbols that should be exported, but then we throw all of that away and use CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS to export every symbol.

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 6b9b5413..96718d75 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -6,6 +6,7 @@ include(CheckIncludeFileCXX)
 set(CMAKE_WARN_UNUSED_CLI YES)

 set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
+set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)

 if (NOT XCODE AND NOT MSVC AND NOT CMAKE_BUILD_TYPE)
     set(CMAKE_BUILD_TYPE Release CACHE STRING "Build type" FORCE)
diff --git a/common/CMakeLists.txt b/common/CMakeLists.txt
index c6fccc02..02415f2d 100644
--- a/common/CMakeLists.txt
+++ b/common/CMakeLists.txt
@@ -1,5 +1,7 @@
 # common

+set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)
+
 find_package(Threads REQUIRED)

 # Build info header
diff --git a/ggml/CMakeLists.txt b/ggml/CMakeLists.txt
index bdbda425..21ef7e4a 100644
--- a/ggml/CMakeLists.txt
+++ b/ggml/CMakeLists.txt
@@ -3,6 +3,7 @@ project("ggml" C CXX)
 include(CheckIncludeFileCXX)

 set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
+set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)

 if (NOT XCODE AND NOT MSVC AND NOT CMAKE_BUILD_TYPE)
     set(CMAKE_BUILD_TYPE Release CACHE STRING "Build type" FORCE)
diff --git a/ggml/src/CMakeLists.txt b/ggml/src/CMakeLists.txt
index 84bc8e19..b7c79321 100644
--- a/ggml/src/CMakeLists.txt
+++ b/ggml/src/CMakeLists.txt
@@ -825,10 +825,6 @@ endif()

 if (WIN32)
     add_compile_definitions(_CRT_SECURE_NO_WARNINGS)
-
-    if (BUILD_SHARED_LIBS)
-        set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)
-    endif()
 endif()

 if (GGML_LTO)
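Worth noting for the diff above (a general CMake detail, not specific to this PR): the CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS variable only initializes the WINDOWS_EXPORT_ALL_SYMBOLS property of targets created after it is set, which is consistent with having to set it near the top of each project. The per-target equivalent makes this explicit:

# Per-target form; 'ggml' stands for any library target that should export all
# symbols on Windows.
set_target_properties(ggml PROPERTIES WINDOWS_EXPORT_ALL_SYMBOLS ON)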
slaren commented on 2024-06-21 on CMakeLists.txt. The review hunk shows the old compile-flags section (lines 153-397) being removed:

#
# Compile flags
#

if (LLAMA_SYCL)
    set(CMAKE_CXX_STANDARD 17)
else()
    set(CMAKE_CXX_STANDARD 11)
endif()

set(CMAKE_CXX_STANDARD_REQUIRED true)
set(CMAKE_C_STANDARD 11)
set(CMAKE_C_STANDARD_REQUIRED true)
set(THREADS_PREFER_PTHREAD_FLAG ON)

find_package(Threads REQUIRED)
include(CheckCXXCompilerFlag)

add_compile_definitions(GGML_SCHED_MAX_COPIES=${LLAMA_SCHED_MAX_COPIES})

# enable libstdc++ assertions for debug builds
if (CMAKE_SYSTEM_NAME MATCHES "Linux")
    add_compile_definitions($<$<CONFIG:Debug>:_GLIBCXX_ASSERTIONS>)
endif()

if (NOT MSVC)
    if (LLAMA_SANITIZE_THREAD)
        add_compile_options(-fsanitize=thread)
        link_libraries     (-fsanitize=thread)
    endif()

    if (LLAMA_SANITIZE_ADDRESS)
        add_compile_options(-fsanitize=address -fno-omit-frame-pointer)
        link_libraries     (-fsanitize=address)
    endif()

    if (LLAMA_SANITIZE_UNDEFINED)
        add_compile_options(-fsanitize=undefined)
        link_libraries     (-fsanitize=undefined)
    endif()
endif()

if (APPLE AND LLAMA_ACCELERATE)
    find_library(ACCELERATE_FRAMEWORK Accelerate)
    if (ACCELERATE_FRAMEWORK)
        message(STATUS "Accelerate framework found")

        add_compile_definitions(GGML_USE_ACCELERATE)
        add_compile_definitions(ACCELERATE_NEW_LAPACK)
        add_compile_definitions(ACCELERATE_LAPACK_ILP64)
        set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} ${ACCELERATE_FRAMEWORK})
    else()
        message(WARNING "Accelerate framework not found")
    endif()
endif()

if (LLAMA_METAL)
    find_library(FOUNDATION_LIBRARY Foundation REQUIRED)
    find_library(METAL_FRAMEWORK    Metal      REQUIRED)
    find_library(METALKIT_FRAMEWORK MetalKit   REQUIRED)

    message(STATUS "Metal framework found")
    set(GGML_HEADERS_METAL ggml-metal.h)
    set(GGML_SOURCES_METAL ggml-metal.m)

    add_compile_definitions(GGML_USE_METAL)
    if (LLAMA_METAL_NDEBUG)
        add_compile_definitions(GGML_METAL_NDEBUG)
    endif()

    # copy ggml-common.h and ggml-metal.metal to bin directory
    configure_file(ggml-common.h    ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/ggml-common.h    COPYONLY)
    configure_file(ggml-metal.metal ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/ggml-metal.metal COPYONLY)

    if (LLAMA_METAL_EMBED_LIBRARY)
        enable_language(ASM)
        add_compile_definitions(GGML_METAL_EMBED_LIBRARY)

        set(METALLIB_COMMON "${CMAKE_CURRENT_SOURCE_DIR}/ggml-common.h")
        set(METALLIB_SOURCE "${CMAKE_CURRENT_SOURCE_DIR}/ggml-metal.metal")

        file(MAKE_DIRECTORY "${CMAKE_BINARY_DIR}/autogenerated")

        # merge ggml-common.h and ggml-metal.metal into a single file
        set(METALLIB_EMBED_ASM    "${CMAKE_BINARY_DIR}/autogenerated/ggml-metal-embed.s")
        set(METALLIB_SOURCE_EMBED "${CMAKE_BINARY_DIR}/autogenerated/ggml-metal-embed.metal")

        add_custom_command(
            OUTPUT ${METALLIB_EMBED_ASM}
            COMMAND echo "Embedding Metal library"
            COMMAND sed -e '/\#include \"ggml-common.h\"/r ${METALLIB_COMMON}' -e '/\#include \"ggml-common.h\"/d' < ${METALLIB_SOURCE} > ${METALLIB_SOURCE_EMBED}
            COMMAND echo ".section __DATA,__ggml_metallib" > ${METALLIB_EMBED_ASM}
            COMMAND echo ".globl _ggml_metallib_start" >> ${METALLIB_EMBED_ASM}
            COMMAND echo "_ggml_metallib_start:" >> ${METALLIB_EMBED_ASM}
            COMMAND echo ".incbin \\\"${METALLIB_SOURCE_EMBED}\\\"" >> ${METALLIB_EMBED_ASM}
            COMMAND echo ".globl _ggml_metallib_end" >> ${METALLIB_EMBED_ASM}
            COMMAND echo "_ggml_metallib_end:" >> ${METALLIB_EMBED_ASM}
            DEPENDS ggml-metal.metal ggml-common.h
            COMMENT "Generate assembly for embedded Metal library"
        )

        set(GGML_SOURCES_METAL ${GGML_SOURCES_METAL} ${METALLIB_EMBED_ASM})
    else()
        if (LLAMA_METAL_SHADER_DEBUG)
            # custom command to do the following:
            #   xcrun -sdk macosx metal    -fno-fast-math -c ggml-metal.metal -o ggml-metal.air
            #   xcrun -sdk macosx metallib                   ggml-metal.air   -o default.metallib
            #
            # note: this is the only way I found to disable fast-math in Metal. it's ugly, but at least it works
            #       disabling fast math is needed in order to pass tests/test-backend-ops
            # note: adding -fno-inline fixes the tests when using MTL_SHADER_VALIDATION=1
            # note: unfortunately, we have to call it default.metallib instead of ggml.metallib
            #       ref: https://github.com/ggerganov/whisper.cpp/issues/1720
            set(XC_FLAGS -fno-fast-math -fno-inline -g)
        else()
            set(XC_FLAGS -O3)
        endif()

        # Append macOS metal versioning flags
        if (LLAMA_METAL_MACOSX_VERSION_MIN)
            message(STATUS "Adding -mmacosx-version-min=${LLAMA_METAL_MACOSX_VERSION_MIN} flag to metal compilation")
            list(APPEND XC_FLAGS -mmacosx-version-min=${LLAMA_METAL_MACOSX_VERSION_MIN})
        endif()
        if (LLAMA_METAL_STD)
            message(STATUS "Adding -std=${LLAMA_METAL_STD} flag to metal compilation")
            list(APPEND XC_FLAGS -std=${LLAMA_METAL_STD})
        endif()

        add_custom_command(
            OUTPUT ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/default.metallib
            COMMAND xcrun -sdk macosx metal ${XC_FLAGS} -c ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/ggml-metal.metal -o ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/ggml-metal.air
            COMMAND xcrun -sdk macosx metallib ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/ggml-metal.air -o ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/default.metallib
            COMMAND rm -f ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/ggml-metal.air
            COMMAND rm -f ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/ggml-common.h
            COMMAND rm -f ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/ggml-metal.metal
            DEPENDS ggml-metal.metal ggml-common.h
            COMMENT "Compiling Metal kernels"
        )

        add_custom_target(
            ggml-metal ALL
            DEPENDS ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/default.metallib
        )
    endif() # LLAMA_METAL_EMBED_LIBRARY

    set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS}
        ${FOUNDATION_LIBRARY}
        ${METAL_FRAMEWORK}
        ${METALKIT_FRAMEWORK}
        )
endif()

if (LLAMA_OPENMP)
    find_package(OpenMP)
    if (OpenMP_FOUND)
        message(STATUS "OpenMP found")
        add_compile_definitions(GGML_USE_OPENMP)
        set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} OpenMP::OpenMP_C OpenMP::OpenMP_CXX)
    else()
        message(WARNING "OpenMP not found")
    endif()
endif()

if (LLAMA_BLAS)
    if (LLAMA_STATIC)
        set(BLA_STATIC ON)
    endif()
    #if (CMAKE_VERSION VERSION_GREATER_EQUAL 3.22)
    #    set(BLA_SIZEOF_INTEGER 8)
    #endif()

    set(BLA_VENDOR ${LLAMA_BLAS_VENDOR})
    find_package(BLAS)

    if (BLAS_FOUND)
        message(STATUS "BLAS found, Libraries: ${BLAS_LIBRARIES}")

        if (("${BLAS_INCLUDE_DIRS}" STREQUAL "") AND NOT (${LLAMA_BLAS_VENDOR} MATCHES "Apple"))
            # BLAS_INCLUDE_DIRS is missing in FindBLAS.cmake.
            # see https://gitlab.kitware.com/cmake/cmake/-/issues/20268
            find_package(PkgConfig REQUIRED)
            if (${LLAMA_BLAS_VENDOR} MATCHES "Generic")
                pkg_check_modules(DepBLAS REQUIRED blas)
            elseif (${LLAMA_BLAS_VENDOR} MATCHES "OpenBLAS")
                # As of openblas v0.3.22, the 64-bit is named openblas64.pc
                pkg_check_modules(DepBLAS openblas64)
                if (NOT DepBLAS_FOUND)
                    pkg_check_modules(DepBLAS REQUIRED openblas)
                endif()
            elseif (${LLAMA_BLAS_VENDOR} MATCHES "FLAME")
                pkg_check_modules(DepBLAS REQUIRED blis)
            elseif (${LLAMA_BLAS_VENDOR} MATCHES "ATLAS")
                pkg_check_modules(DepBLAS REQUIRED blas-atlas)
            elseif (${LLAMA_BLAS_VENDOR} MATCHES "FlexiBLAS")
                pkg_check_modules(DepBLAS REQUIRED flexiblas_api)
            elseif (${LLAMA_BLAS_VENDOR} MATCHES "Intel")
                # all Intel* libraries share the same include path
                pkg_check_modules(DepBLAS REQUIRED mkl-sdl)
            elseif (${LLAMA_BLAS_VENDOR} MATCHES "NVHPC")
                # this doesn't provide pkg-config
                # suggest to assign BLAS_INCLUDE_DIRS on your own
                if ("${NVHPC_VERSION}" STREQUAL "")
                    message(WARNING "Better to set NVHPC_VERSION")
                else()
                    set(DepBLAS_FOUND ON)
                    set(DepBLAS_INCLUDE_DIRS "/opt/nvidia/hpc_sdk/${CMAKE_SYSTEM_NAME}_${CMAKE_SYSTEM_PROCESSOR}/${NVHPC_VERSION}/math_libs/include")
                endif()
            endif()
            if (DepBLAS_FOUND)
                set(BLAS_INCLUDE_DIRS ${DepBLAS_INCLUDE_DIRS})
            else()
                message(WARNING "BLAS_INCLUDE_DIRS neither been provided nor been automatically"
                " detected by pkgconfig, trying to find cblas.h from possible paths...")
                find_path(BLAS_INCLUDE_DIRS
                    NAMES cblas.h
                    HINTS
                        /usr/include
                        /usr/local/include
                        /usr/include/openblas
                        /opt/homebrew/opt/openblas/include
                        /usr/local/opt/openblas/include
                        /usr/include/x86_64-linux-gnu/openblas/include
                    )
            endif()
        endif()

        message(STATUS "BLAS found, Includes: ${BLAS_INCLUDE_DIRS}")

        add_compile_options(${BLAS_LINKER_FLAGS})

        add_compile_definitions(GGML_USE_BLAS)

        if (${BLAS_INCLUDE_DIRS} MATCHES "mkl" AND (${LLAMA_BLAS_VENDOR} MATCHES "Generic" OR ${LLAMA_BLAS_VENDOR} MATCHES "Intel"))
            add_compile_definitions(GGML_BLAS_USE_MKL)
        endif()

        set(GGML_HEADERS_BLAS ggml-blas.h)
        set(GGML_SOURCES_BLAS ggml-blas.cpp)

        set(LLAMA_EXTRA_LIBS     ${LLAMA_EXTRA_LIBS}     ${BLAS_LIBRARIES})
        set(LLAMA_EXTRA_INCLUDES ${LLAMA_EXTRA_INCLUDES} ${BLAS_INCLUDE_DIRS})
    else()
        message(WARNING "BLAS not found, please refer to "
        "https://cmake.org/cmake/help/latest/module/FindBLAS.html#blas-lapack-vendors"
        " to set correct LLAMA_BLAS_VENDOR")

and replaced with overrides of the ggml options (new lines 70-72):

# override ggml options
set(GGML_CCACHE ${LLAMA_CCACHE})

set(GGML_BUILD_SHARED_LIBS ${LLAMA_BUILD_SHARED_LIBS})

slaren · 327 days ago (edited 327 days ago) · 👍 1

GGML_BUILD_SHARED_LIBS and LLAMA_BUILD_SHARED_LIBS do not exist, BUILD_SHARED_LIBS is used directly.
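For context, a minimal illustration of the standard switch (illustrative sources):

# add_library() without an explicit STATIC/SHARED keyword consults
# BUILD_SHARED_LIBS directly, so no project-prefixed variant is needed.
option(BUILD_SHARED_LIBS "build shared libraries" OFF)
add_library(ggml ggml.c)   # SHARED when BUILD_SHARED_LIBS=ON, STATIC otherwise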

mofosyne · 327 days ago · 👍 2

If you are reorganizing the source code, I would like to suggest moving some of the compiled tools from the examples folder into a dedicated folder. They are not quite scripts, but in my opinion they are not examples either, considering they are more for maintainers' internal usage.

ggerganov force-pushed from 7c1c378c to 622fc9b5 326 days ago
ggerganov force-pushed from 622fc9b5 to c23c6983 325 days ago
ggerganov force-pushed from c23c6983 to e6faa61e 325 days ago
ggerganov force-pushed from 0a977ffc to 0938c55b 325 days ago
ggerganov force-pushed from d011b9ff to 5b218911 324 days ago
ggerganov force-pushed from 5b218911 to bddb4931 324 days ago
ggerganov force-pushed from bddb4931 to 5e376582 324 days ago
ggerganov force-pushed from c57cbf5a to 1f859b8c 324 days ago
ggerganov force-pushed from caf50ea9 to b192bed3 324 days ago
ggerganov force-pushed from b192bed3 to 417de6b9 324 days ago
github-actions added server
ggerganov marked this pull request as ready for review 324 days ago
ggerganov · 324 days ago (edited 323 days ago) · 👍 1

The PRs are ready for review. The CI in llama.cpp is still failing for some of the workflows:

There are quite a few things to improve on top of this, but it is probably better to merge first and resolve them from master.

ggerganov requested a review from slaren 324 days ago
slaren · 323 days ago

This may be related to https://cmake.org/cmake/help/latest/policy/CMP0165.html#policy:CMP0165

It may also affect CUDA. I get these warnings:

CMake Warning (dev) in src/CMakeLists.txt:
  Policy CMP0104 is not set: CMAKE_CUDA_ARCHITECTURES now detected for NVCC,
  empty CUDA_ARCHITECTURES not allowed.  Run "cmake --help-policy CMP0104"
  for policy details.  Use the cmake_policy command to set the policy and
  suppress this warning.

  CUDA_ARCHITECTURES is empty for target "llama".
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Warning (dev) in examples/gguf/CMakeLists.txt:
  Policy CMP0104 is not set: CMAKE_CUDA_ARCHITECTURES now detected for NVCC,
  empty CUDA_ARCHITECTURES not allowed.  Run "cmake --help-policy CMP0104"
  for policy details.  Use the cmake_policy command to set the policy and
  suppress this warning.

  CUDA_ARCHITECTURES is empty for target "llama-gguf".
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Warning (dev) in examples/llava/CMakeLists.txt:
  Policy CMP0104 is not set: CMAKE_CUDA_ARCHITECTURES now detected for NVCC,
  empty CUDA_ARCHITECTURES not allowed.  Run "cmake --help-policy CMP0104"
  for policy details.  Use the cmake_policy command to set the policy and
  suppress this warning.

  CUDA_ARCHITECTURES is empty for target "llava_shared".
This warning is for project developers.  Use -Wno-dev to suppress it.

I am not sure what is supposed to be the solution in this case. I think that the source of the problem is using multiple CMakeLists.txt for the same target. Maybe this could be fixed by using include instead of add_subdirectory.

slaren
slaren323 days ago (edited 323 days ago)

Probably the simplest solution would be to remove src/CMakeLists.txt entirely and merge it into the base CMakeLists.txt. I think that the expectation is that add_subdirectory is only used when a project has multiple targets.

Each backend could be moved to a different target. This will need to be done when making the backends dynamically loadable in the future.
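A rough sketch of what such a backend target could look like (hypothetical; the PR does not implement this):

# A dynamically loadable CUDA backend as its own target (names illustrative).
find_package(CUDAToolkit REQUIRED)
add_library(ggml-cuda MODULE ggml-cuda.cu)   # MODULE: loaded at runtime, never linked against
target_link_libraries(ggml-cuda PRIVATE ggml CUDA::cudart)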

ggerganov
ggerganov323 days ago

Probably the simplest solution would be to remove src/CMakeLists.txt entirely and merge it into the base CMakeLists.txt.

We still need to add_subdirectory(ggml) in the root llama.cpp/CMakeLists.txt - wouldn't it hit the same issue?

I've resolved these warnings in whisper.cpp by enabling CUDA in whisper.cpp/src/CMakeLists.txt: https://github.com/ggerganov/whisper.cpp/blob/f3c609bd2e69b40f7500af8e13c1db270f997403/src/CMakeLists.txt#L80-L101

We can do the same in llama.cpp/src/CMakeLists.txt, but for HIP it seems more complicated because it checks for some compiler compatibility
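Simplified, the referenced whisper.cpp approach amounts to enabling the CUDA language inside the subdirectory that defines the targets (a sketch; the real snippet linked above has more detail):

# In src/CMakeLists.txt: once CUDA is enabled here, targets defined in this
# directory get default CUDA_ARCHITECTURES and the CMP0104 warning goes away.
if (GGML_CUDA)
    enable_language(CUDA)
endif()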

slaren · 323 days ago

We still need to add_subdirectory(ggml) in the root llama.cpp/CMakeLists.txt - wouldn't it hit the same issue?

My expectation was that ggml would be built either as a static or a shared library, and then linked to llama.cpp. However, this does not seem to be happening, because the ggml target is created as an "object" library, which seems to mean that it does not behave as a library at all; it is more like adding the source files to the parent project. Changing this fixes the CUDA warnings for me.

index 4f146026..f7ecdc2a 100644
--- a/ggml/src/CMakeLists.txt
+++ b/ggml/src/CMakeLists.txt
@@ -1129,7 +1129,7 @@ endif()

 # ggml

-add_library(ggml OBJECT
+add_library(ggml
             ../include/ggml.h
             ../include/ggml-alloc.h
             ../include/ggml-backend.h

Is there any reason to use an OBJECT library?

ggerganov · 323 days ago

Ah, it wasn't intended to be an object library - I just didn't notice. Nice catch
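For context, a minimal illustration of the difference (the two add_library(ggml ...) calls below are alternative versions of the same file, not one script):

# OBJECT: 'ggml' is only a named set of object files; consumers must absorb
# them via $<TARGET_OBJECTS:...>, so there is no real library boundary.
add_library(ggml OBJECT ggml.c)
add_library(llama llama.cpp $<TARGET_OBJECTS:ggml>)

# Regular library: STATIC or SHARED depending on BUILD_SHARED_LIBS, linked
# like any other dependency.
add_library(ggml ggml.c)
add_library(llama llama.cpp)
target_link_libraries(llama PUBLIC ggml)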

slaren commented on 2024-06-25 on ggml/src/CMakeLists.txt (resolved):

-add_library(ggml_static STATIC $<TARGET_OBJECTS:ggml>)
-
 if (BUILD_SHARED_LIBS)
     set_target_properties(ggml PROPERTIES POSITION_INDEPENDENT_CODE ON)
-    add_library(ggml_shared SHARED $<TARGET_OBJECTS:ggml>)
-    target_link_libraries(ggml_shared PRIVATE Threads::Threads ${GGML_EXTRA_LIBS})
+target_link_libraries(ggml PRIVATE Threads::Threads ${GGML_EXTRA_LIBS})

slaren · 323 days ago

This is probably not necessary here, it is already done in line 1164.

ggerganov · 323 days ago

Force-pushed a fix and I also think the Kompute build should work now

ggerganov force-pushed from 31ad52be to 850f45ce 323 days ago
ggerganov force-pushed from 850f45ce to 85d6859f 323 days ago
slaren · 323 days ago · 👍 3 🎉 1

This may also fix ggml being linked multiple times when using shared libraries, which should reduce the size of the pre-built binaries significantly. Currently, the Windows CUDA binaries are almost 1 GB in total. Compare the file sizes of my build from this branch with the latest build from GitHub:

[image: side-by-side file listing comparing the binary sizes of the two builds]

slaren commented on 2024-06-25 on Makefile (resolved):

 MK_CXXFLAGS  = -std=c++11 -fPIC
 MK_NVCCFLAGS = -std=c++11

-# -Ofast tends to produce faster code, but may not be available for some compilers.
-ifdef LLAMA_FAST
-MK_CFLAGS += -Ofast
-HOST_CXXFLAGS += -Ofast
-ifndef LLAMA_DEBUG
-MK_NVCCFLAGS += -O3
-endif # LLAMA_DEBUG
-else
-MK_CFLAGS += -O3
-MK_CXXFLAGS += -O3
-ifndef LLAMA_DEBUG
-MK_NVCCFLAGS += -O3
-endif # LLAMA_DEBUG
-endif # LLAMA_FAST
slaren · 323 days ago · 👍 1

The removal of this is causing the make build to be built without optimizations.

slaren · 323 days ago · 👍 1

There are still some references to LLAMA_CUDA in the Makefile:

diff --git a/Makefile b/Makefile
index 88084cac..80e7678d 100644
--- a/Makefile
+++ b/Makefile
@@ -859,7 +859,7 @@ override NVCCFLAGS := $(MK_NVCCFLAGS) $(NVCCFLAGS)
 override LDFLAGS   := $(MK_LDFLAGS) $(LDFLAGS)

 # identify CUDA host compiler
-ifdef LLAMA_CUDA
+ifdef GGML_CUDA
 GF_CC := $(NVCC) $(NVCCFLAGS) 2>/dev/null .c -Xcompiler
 include scripts/get-flags.mk
 CUDA_CXXFLAGS := $(BASE_CXXFLAGS) $(GF_CXXFLAGS) -Wno-pedantic
@@ -884,7 +884,7 @@ $(info I NVCCFLAGS: $(NVCCFLAGS))
 $(info I LDFLAGS:   $(LDFLAGS))
 $(info I CC:        $(shell $(CC)   --version | head -n 1))
 $(info I CXX:       $(shell $(CXX)  --version | head -n 1))
-ifdef LLAMA_CUDA
+ifdef GGML_CUDA
 $(info I NVCC:      $(shell $(NVCC) --version | tail -n 1))
 CUDA_VERSION := $(shell $(NVCC) --version | grep -oP 'release (\K[0-9]+\.[0-9])')
 ifeq ($(shell awk -v "v=$(CUDA_VERSION)" 'BEGIN { print (v < 11.7) }'),1)
@@ -896,7 +896,7 @@ endif # CUDA_POWER_ARCH
 endif # CUDA_DOCKER_ARCH

 endif # eq ($(shell echo "$(CUDA_VERSION) < 11.7" | bc),1)
-endif # LLAMA_CUDA
+endif # GGML_CUDA
 $(info )

 ifdef DEPRECATE_WARNING
slaren commented on 2024-06-25 on docs/BLIS.md (resolved):

-make LLAMA_BLIS=1 -j
-# make LLAMA_BLIS=1 benchmark-matmult
+make GGML_BLIS=1 -j
+# make GGML_BLIS=1 benchmark-matmult

slaren · 323 days ago · 👍 1

Not from this change, but this is wrong regardless.

Suggested change:
-# make GGML_BLIS=1 benchmark-matmult
+# make GGML_BLIS=1 llama-benchmark-matmult
Conversation on ggml/src/CMakeLists.txt (resolved):

        list(APPEND GGML_CDEF_PRIVATE GGML_CUDA_FORCE_MMQ)
    endif()

    if (GGML_CUDA_FORCE_CUBLAS)

slaren · 323 days ago

I am not sure if GGML_CDEF_PRIVATE is really necessary, or maybe I just don't understand its purpose. I think that using add_compile_definitions would achieve the same thing. From what I can tell, both target_compile_definitions(PRIVATE) and add_compile_definitions modify the same underlying property, COMPILE_DEFINITIONS. For the public definitions it's different, since INTERFACE_COMPILE_DEFINITIONS needs to be modified as well, so in that case I think that GGML_CDEF_PUBLIC makes sense.

There are also a lot of usages of add_compile_options in this file, so if this is really necessary, shouldn't all of these be replaced as well?

Otherwise this should be updated to follow the same pattern:

Suggested change:
-add_compile_definitions(GGML_CUDA_FORCE_CUBLAS)
+list(APPEND GGML_CDEF_PRIVATE GGML_CUDA_FORCE_CUBLAS)
ggerganov · 323 days ago

Restored add_compile_definitions for the private flags
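For reference, the two forms discussed in this thread (a sketch): for PRIVATE definitions they end up in the same COMPILE_DEFINITIONS property, just at directory vs. target scope:

# Directory-scoped: applies to every target created below this point.
add_compile_definitions(GGML_CUDA_FORCE_CUBLAS)

# Collected-list pattern: same effect, but only for the ggml target.
list(APPEND GGML_CDEF_PRIVATE GGML_CUDA_FORCE_CUBLAS)
target_compile_definitions(ggml PRIVATE ${GGML_CDEF_PRIVATE})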

ggerganov force-pushed from 5a1660ac to 3c1532e0 323 days ago
ggerganov: scripts : update sync [no ci] (ef74eb13)
ggerganov: files : relocate [no ci] (98029722)
ggerganov: ci : disable kompute build [no ci] (62795ad0)
ggerganov: cmake : fixes [no ci] (888d790d)
ggerganov: server : fix mingw build (070031df)
ggerganov: cmake : minor [no ci] (39783930)
ggerganov: cmake : link math library [no ci] (c7c18f35)
ggerganov: cmake : build normal ggml library (not object library) [no ci] (723739f0)
ggerganov: cmake : fix kompute build (63b75fc4)
ggerganov: make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE (0d93f027)
slaren: move public backend headers to the public include directory (#8122) (22495bfe)
ggerganov force-pushed from 7d09bfff to 22495bfe 323 days ago
github-actions added testing
ggerganov: scripts : fix sync paths [no ci] (fd1d48a2)
ggerganov force-pushed from 9f4129a1 to fd1d48a2 323 days ago
ggerganov: scripts : sync ggml-blas.h [no ci] (c11dc915)
slaren approved these changes on 2024-06-26
ggerganov merged f3f65429 into master 322 days ago
ggerganov deleted the gg/reorganize-project branch 322 days ago
OuadiElfarouki · 322 days ago · 👀 2

@slaren The LLAMA_CUDA_FORCE_CUBLAS CMake option got mistakenly removed but is still used. I believe it's intended to be mutually exclusive with GGML_CUDA_FORCE_MMQ, so some changes might be needed (cmake, mmq.cu, ggml-cuda.cu, ...).
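One possible guard for that mutual exclusivity (a hypothetical sketch, not taken from the PR):

if (GGML_CUDA_FORCE_MMQ AND GGML_CUDA_FORCE_CUBLAS)
    message(FATAL_ERROR "GGML_CUDA_FORCE_MMQ and GGML_CUDA_FORCE_CUBLAS are mutually exclusive")
endif()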

ggerganov · 322 days ago (edited 322 days ago) · 👍 2

There is now GGML_CUDA_FORCE_CUBLAS

Edit: nvm #8140
