[Inductor][MacOS] resolve macos openmp problem and provide a holistic instruction (#107111)
There have been several reports of difficulty using OpenMP on macOS, e.g.: https://github.com/pytorch/pytorch/issues/95708 . There have also been several PRs to fix it, e.g.: https://github.com/pytorch/pytorch/pull/93895 and https://github.com/pytorch/pytorch/pull/105136 .
This PR tries to explain the root cause and provides a holistic, systematic way to fix the problem.
For the OpenMP program below to run, the compiler must:
- Be able to process directives like `#pragma omp parallel`
- Be able to find header files like `<omp.h>`
- Be able to link to a library file like `libomp`
```C++
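#include <omp.h>

int main()
{
    omp_set_num_threads(4);  // request four threads for parallel regions
#pragma omp parallel
    {
        int id = omp_get_thread_num();       // index of this thread
        int nthrds = omp_get_num_threads();  // number of threads in the team
        int y = id * nthrds;
        (void)y;  // y is otherwise unused; this avoids a compiler warning
    }
}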
```
On macOS, several different compiler toolchains may be present:
- Apple's built-in `clang++`, installed with the Xcode Command Line Tools. The default `g++` and `clang++` commands both point to the Apple version, as can be confirmed with `g++ --version`.
- Upstream LLVM `clang++`, which can be installed via `brew install llvm`.
- The GNU compiler `g++`, which can be installed via `brew install gcc`.
Among these compilers, upstream `clang++` from LLVM and `g++` from GNU both support OpenMP with the flag `-fopenmp`. Both ship with `<omp.h>` and an OpenMP runtime library (`libomp` for LLVM, `libgomp` for GCC). The only problem is that Apple's built-in `clang++` does not ship `<omp.h>` or `libomp`. Therefore, users can take any one of the following steps to enable OpenMP support:
- Use a compiler other than Apple's built-in `clang++` by setting the `CXX` environment variable.
- Use `conda install llvm-openmp` to place the header and library files inside the conda environment (where they can be discovered via `CONDA_PREFIX`).
- Use `brew install libomp` to place the header and library files under Homebrew's control (where they can be discovered via `brew --prefix libomp`).
- Use a custom install of OpenMP by setting `OMP_PREFIX` to a directory where the header and library files can be found.
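The lookup described by the last three options can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names (`find_openmp_prefix`, `has_omp_header`, `_brew_libomp_prefix`), the precedence order (`OMP_PREFIX`, then `CONDA_PREFIX`, then Homebrew), and the assumption that the header lives at `<prefix>/include/omp.h` are all hypothetical choices made for the example.

```python
import os
import shutil
import subprocess
from typing import Callable, Mapping, Optional


def _brew_libomp_prefix() -> Optional[str]:
    """Ask Homebrew for the libomp prefix; None if brew is unavailable or fails."""
    if shutil.which("brew") is None:
        return None
    result = subprocess.run(
        ["brew", "--prefix", "libomp"], capture_output=True, text=True
    )
    prefix = result.stdout.strip()
    return prefix if result.returncode == 0 and prefix else None


def has_omp_header(prefix: str) -> bool:
    """Assumed layout: the header is at <prefix>/include/omp.h."""
    return os.path.isfile(os.path.join(prefix, "include", "omp.h"))


def find_openmp_prefix(
    env: Mapping[str, str] = os.environ,
    brew_lookup: Callable[[], Optional[str]] = _brew_libomp_prefix,
) -> Optional[str]:
    """Return the first candidate prefix that actually contains omp.h, else None.

    The order below (explicit OMP_PREFIX, then conda, then Homebrew) is an
    assumption made for this sketch.
    """
    for candidate in (env.get("OMP_PREFIX"), env.get("CONDA_PREFIX")):
        if candidate and has_omp_header(candidate):
            return candidate
    brew_prefix = brew_lookup()
    if brew_prefix and has_omp_header(brew_prefix):
        return brew_prefix
    return None
```

Checking for the header file, rather than trusting that the variable is set, matters because `CONDA_PREFIX` is defined in any active conda environment whether or not `llvm-openmp` has been installed into it.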
This PR implements the above logic, and might serve as a final solution for resolving OpenMP issues on macOS.
This PR also resolves the discussion raised in https://dev-discuss.pytorch.org/t/can-we-add-a-default-backend-when-openmp-is-not-available/1382/5 with @jansel . It provides a way for brew users to find the installation automatically via `brew --prefix libomp`, and provides instructions for switching to another compiler via the `CXX` environment variable.
I have tested the following code in different conditions:
- Use `CXX` to point to an LLVM-clang++, works fine.
- Use `CXX` to point to a GNU g++: does not work because of the compiler flag `-Xclang`, which g++ does not accept. Manually removing the code `base_flags += " -Xclang"` makes it work.
- Use default compiler and `conda install llvm-openmp`, works fine
- Use default compiler and `brew install libomp`, works fine
- Do nothing: the compiler complains that `omp.h` is not found.
```python
import torch


@torch.compile
def f(x):
    return x + 1


f(torch.randn(5, 5))
```
If we want the code to be more portable, we can also deal with the `-Xclang` issue.
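One possible way to handle `-Xclang` portably is to pick the OpenMP flags based on what the compiler reports about itself. The sketch below is only an illustration under stated assumptions and is not part of this PR: `compiler_version` and `openmp_flags` are hypothetical helpers, and the detection rule (treat the compiler as clang when its `--version` output mentions "clang") is a heuristic chosen for the example.

```python
import subprocess
from typing import List, Optional


def compiler_version(cxx: str) -> str:
    """Return the text printed by `<cxx> --version` (hypothetical helper)."""
    return subprocess.run(
        [cxx, "--version"], capture_output=True, text=True, check=True
    ).stdout


def openmp_flags(version_text: str, omp_prefix: Optional[str] = None) -> List[str]:
    """Choose OpenMP flags from the compiler's self-reported version text."""
    if "clang" in version_text.lower():
        # Clang family: forward -fopenmp to the frontend and link libomp
        # explicitly, since -Xclang bypasses the driver's own linking logic.
        flags = ["-Xclang", "-fopenmp"]
        if omp_prefix:
            flags += [f"-I{omp_prefix}/include", f"-L{omp_prefix}/lib"]
        flags.append("-lomp")
        return flags
    # Otherwise assume GNU g++, which accepts plain -fopenmp and rejects -Xclang.
    return ["-fopenmp"]
```

For example, `openmp_flags(compiler_version("g++"))` on a stock macOS setup would take the clang branch, because the default `g++` command points to Apple's `clang++`.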
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107111
Approved by: https://github.com/jgong5, https://github.com/jansel