Enable THP (transparent huge pages) for buffer sizes >= 2MB (#93888)
The 2MB THP pages provide lower allocation latency than the standard 4KB pages. This change has shown a significant improvement in batch-mode use cases where tensor sizes exceed 100MB.
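As a rough count of why page size matters at these sizes, the sketch below compares how many pages are needed to map one 100MB buffer (treated here as 100 MiB, which is an assumption). Each page typically costs one page fault on first touch and one TLB entry, so moving from 4KB to 2MB pages cuts that count by a factor of 512. `pages_needed` is a hypothetical helper for illustration only.

```cpp
#include <cstddef>

// Number of pages of size `page` needed to map `bytes`, rounded up.
constexpr std::size_t pages_needed(std::size_t bytes, std::size_t page) {
  return (bytes + page - 1) / page;
}

constexpr std::size_t KiB = 1024;
constexpr std::size_t MiB = 1024 * KiB;

// One 100 MiB tensor buffer mapped with each page size.
static_assert(pages_needed(100 * MiB, 4 * KiB) == 25600, "4 KiB pages");
static_assert(pages_needed(100 * MiB, 2 * MiB) == 50, "2 MiB pages");
```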
THP allocation is only enabled when the `THP_MEM_ALLOC_ENABLE` environment variable is set.
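A minimal sketch of how an environment-gated THP path could look in a CPU allocator, assuming Linux, `posix_memalign`, and `madvise(MADV_HUGEPAGE)`. This is not the code merged in #93888: the names `kThpThreshold`, `kDefaultAlignment`, `thp_alloc_enabled`, `alloc_alignment`, and `thp_aware_alloc` are hypothetical, the 64-byte default alignment is an assumption, and treating any value of the variable as "enabled" is an assumption too.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <stdlib.h>  // posix_memalign (POSIX)
#ifdef __linux__
#include <sys/mman.h>  // madvise, MADV_HUGEPAGE
#endif

// Hypothetical constants: 2 MiB is both the size threshold and the alignment
// used for THP-eligible buffers; 64 bytes is an assumed default alignment.
constexpr std::size_t kThpThreshold = std::size_t{2} * 1024 * 1024;
constexpr std::size_t kDefaultAlignment = 64;

// Assumption: any value of THP_MEM_ALLOC_ENABLE counts as "set".
bool thp_alloc_enabled() {
  return std::getenv("THP_MEM_ALLOC_ENABLE") != nullptr;
}

// Large buffers get 2 MiB alignment when THP is enabled so they start on a
// huge-page boundary; everything else uses the default alignment.
std::size_t alloc_alignment(std::size_t nbytes) {
  return (thp_alloc_enabled() && nbytes >= kThpThreshold) ? kThpThreshold
                                                          : kDefaultAlignment;
}

// Allocates nbytes (release with std::free) and, for THP-eligible buffers,
// asks the kernel to back the range with transparent huge pages.
void* thp_aware_alloc(std::size_t nbytes) {
  if (nbytes == 0) return nullptr;
  const std::size_t align = alloc_alignment(nbytes);
  void* data = nullptr;
  if (posix_memalign(&data, align, nbytes) != 0) throw std::bad_alloc();
  // posix_memalign guarantees the requested alignment.
  assert(reinterpret_cast<std::uintptr_t>(data) % align == 0);
#if defined(__linux__) && defined(MADV_HUGEPAGE)
  if (align == kThpThreshold) {
    // Advisory only; failure (e.g. a kernel without THP) is ignored and the
    // buffer simply stays on regular 4 KiB pages.
    (void)madvise(data, nbytes, MADV_HUGEPAGE);
  }
#endif
  return data;
}
```

`madvise` is only a hint: if `/sys/kernel/mm/transparent_hugepage/enabled` is set to `never`, the buffer is still backed by 4KB pages. A production allocator would more likely read the environment variable once and cache the result than call `std::getenv` on every allocation.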
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93888
Approved by: https://github.com/jgong5, https://github.com/malfet