[GPU host alloc] Fast path for size 0 malloc (#68532)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68532
Diff to better handle size 0 pinned memory allocation requests.
----
### Behavior before fix
The very first size 0 malloc comes in. It will create a block with `{key: 0, value: Block(0, 0, true)}`.
Another size 0 malloc comes in.
It will either 1) get a free block with size > 0 (which wastes pinned memory) or 2) call `cudaHostAlloc()` with size 0 and eventually get `*ptr = 0`.
Note that the block from case 2 is *not registered* in the block pool because an entry with key 0 already exists. That is why, whenever `available.empty() == false`, size 0 requests keep wasting size > 0 pinned memory blocks.
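The duplicate-entry problem described above can be sketched in isolation. This is a minimal illustration, not the allocator's actual code: `Block`, `BlockMap`, and `register_block` are hypothetical names, and the pointer-keyed `std::unordered_map` is an assumption based on the `{key: 0, value: Block(0, 0, true)}` entry mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for the allocator's block record: (size, ptr, allocated).
struct Block {
  std::size_t size;
  void* ptr;
  bool allocated;
};

// Hypothetical registry keyed by pointer (assumption, see lead-in).
using BlockMap = std::unordered_map<void*, Block>;

// Tries to register a block. Returns true if a new entry was created and
// false if an entry with the same pointer key already existed.
inline bool register_block(BlockMap& blocks, void* ptr, std::size_t size) {
  // emplace() leaves the map unchanged when the key is already present.
  return blocks.emplace(ptr, Block{size, ptr, true}).second;
}
```

With a registry like this, the first size 0 request registers key `nullptr` successfully, and every later size 0 request that also yields a null pointer is rejected as a duplicate.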
----
### Behavior after fix
Let `malloc()` simply return a nullptr (0) for size 0 requests.
This avoids wasting valid size > 0 blocks and saves calls to `cudaHostAlloc()`, which is expensive.
This is also safe since `free()` simply returns success for nullptrs.
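The fast path can be sketched with a toy allocator. This is a simplified illustration under stated assumptions, not the code in this diff: `ToyHostAllocator`, `allocate`, `release`, and `backend_calls` are hypothetical names, `std::malloc`/`std::free` stand in for `cudaHostAlloc()` and its matching free call, and the integer status codes (0 for success) are an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Toy host allocator (hypothetical). std::malloc/std::free stand in for the
// expensive pinned-memory calls so the sketch runs without CUDA.
struct ToyHostAllocator {
  int backend_calls = 0;  // number of calls made to the backing allocator

  // Returns 0 on success, non-zero on failure (status convention is assumed).
  int allocate(void** ptr, std::size_t size) {
    if (size == 0) {
      // Fast path: no block lookup and no backend call for size 0.
      *ptr = nullptr;
      return 0;
    }
    ++backend_calls;
    *ptr = std::malloc(size);
    return *ptr != nullptr ? 0 : 1;
  }

  // Releasing a null pointer is a no-op that reports success, which is what
  // makes returning nullptr from the fast path safe.
  int release(void* ptr) {
    if (ptr == nullptr) {
      return 0;
    }
    std::free(ptr);
    return 0;
  }
};
```

In this sketch a size 0 request never touches the backing allocator, and the null pointer it returns can be passed straight back to `release()` without special handling by the caller.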
----
Test Plan: Unit tests.
Reviewed By: yinghai
Differential Revision: D32487522
fbshipit-source-id: 6140cab54ff5a34ace7d046f218fb32805c692c0