Enable THP (transparent huge pages) for buffer sizes >= 2MB (#95963)
2MB THP pages provide better allocation latency than the standard 4KB pages. This change has shown substantial improvement for batch-mode use cases where tensor sizes are larger than 100MB.
THP is only enabled if the THP_MEM_ALLOC_ENABLE environment variable is set.
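
For illustration, a minimal sketch of how an opt-in THP allocation path of this kind could look on Linux. This is not the code from the PR: the names kThpThreshold, kDefaultAlignment, thp_env_enabled, should_use_thp and alloc_buffer are hypothetical, the 64-byte default alignment is an assumption, and treating the variable as enabled only when it parses to a nonzero integer is also an assumption. Only THP_MEM_ALLOC_ENABLE and the 2MB threshold come from the description above.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdlib.h>    // posix_memalign
#include <sys/mman.h>  // madvise, MADV_HUGEPAGE (Linux-specific)

// 2MB threshold from the commit description; also used as the alignment
// for THP-backed buffers in this sketch (assumption).
constexpr std::size_t kThpThreshold = 2 * 1024 * 1024;
// Assumed default alignment for buffers that do not use THP.
constexpr std::size_t kDefaultAlignment = 64;

// Reads the opt-in switch. Assumption: a nonzero integer value enables THP.
bool thp_env_enabled() {
  const char* value = std::getenv("THP_MEM_ALLOC_ENABLE");
  return value != nullptr && std::atoi(value) != 0;
}

// THP is requested only when opted in and the buffer is at least 2MB.
bool should_use_thp(std::size_t nbytes, bool enabled) {
  return enabled && nbytes >= kThpThreshold;
}

// Allocates nbytes and, for large buffers, advises the kernel to back the
// range with huge pages. Returns nullptr on failure or when nbytes == 0.
// The caller releases the buffer with free().
void* alloc_buffer(std::size_t nbytes, bool enabled) {
  if (nbytes == 0) {
    return nullptr;
  }
  const bool use_thp = should_use_thp(nbytes, enabled);
  const std::size_t alignment = use_thp ? kThpThreshold : kDefaultAlignment;
  void* data = nullptr;
  if (posix_memalign(&data, alignment, nbytes) != 0) {
    return nullptr;
  }
#ifdef MADV_HUGEPAGE
  if (use_thp) {
    // madvise is only a hint; a failure (for example, THP disabled in the
    // kernel) leaves a normal allocation backed by regular pages.
    (void)madvise(data, nbytes, MADV_HUGEPAGE);
  }
#endif
  return data;
}
```

A caller would use it as `void* p = alloc_buffer(nbytes, thp_env_enabled());` and later `free(p)`. Reading the environment variable once and passing the result in keeps the size check easy to test in isolation.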
Re-landing https://github.com/pytorch/pytorch/pull/93888
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95963
Approved by: https://github.com/malfet