[Build] Adjust nvcc_threads for CI (#27296)
We recently updated flash attention, which added more .cu files. Each
.cu file needs a lot of CPU memory to compile.
Previously, we did not set nvcc_threads, and number_of_nvcc_threads()
returned 3 on the A10 build machine. That number is too large because
memory is limited (55 GB) while the number of parallel compile jobs is
large. For example, if the machine has 8 CPU cores, 8 * 3 means there
are 24 nvcc threads in total, and the build might run out of memory.
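
To make the arithmetic concrete, here is a rough sketch of the memory left for each nvcc thread. The helper name is hypothetical, and it assumes one parallel compile job per CPU core and an even split of memory across threads, which real builds will not follow exactly.

```python
def memory_per_nvcc_thread_gb(total_memory_gb, parallel_jobs, nvcc_threads):
    """Return (total nvcc threads, GB available per thread).

    Hypothetical helper for illustration only; assumes memory is
    shared evenly across all nvcc threads.
    """
    total_threads = parallel_jobs * nvcc_threads
    return total_threads, total_memory_gb / total_threads


# Numbers from the description: 55 GB, 8 cores, 3 nvcc threads per job.
total_threads, per_thread_gb = memory_per_nvcc_thread_gb(55, 8, 3)
print(total_threads, round(per_thread_gb, 2))  # 24 2.29
```

Under these assumptions each thread gets a little over 2 GB, which is tight for large flash attention kernels.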
Here we update number_of_nvcc_threads() to account for the updated
number of flash attention .cu files, and explicitly set nvcc_threads in
the CI build to avoid running out of memory during the build.
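
As an illustration of the idea (not the actual number_of_nvcc_threads() implementation), a memory-aware heuristic could look like the sketch below. The function name, the per-thread memory estimate, and the cap of 4 threads are all assumptions made for this example.

```python
def pick_nvcc_threads(available_memory_gb, parallel_jobs,
                      memory_per_thread_gb, max_threads=4):
    """Pick an nvcc thread count that fits in the memory budget.

    Hypothetical sketch: memory_per_thread_gb is an assumed estimate
    of peak memory for one nvcc thread, and max_threads is an
    arbitrary cap chosen for this example.
    """
    if parallel_jobs <= 0 or memory_per_thread_gb <= 0:
        raise ValueError("parallel_jobs and memory_per_thread_gb must be positive")
    # Number of threads per job that the memory budget can sustain.
    budget = int(available_memory_gb // (parallel_jobs * memory_per_thread_gb))
    # Always use at least one thread, and never exceed the cap.
    return max(1, min(max_threads, budget))


# 55 GB and 8 parallel jobs, assuming about 4 GB per nvcc thread.
print(pick_nvcc_threads(55, 8, 4))   # 1
# A larger machine with the same assumptions hits the cap.
print(pick_nvcc_threads(256, 8, 4))  # 4
```

With an assumed 4 GB per thread, the 55 GB machine can only afford one nvcc thread per job, which is why setting the value explicitly in CI is safer than relying on a default.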