47% FastGen speedup for low workload - refactor allocator (#5090)
This PR refactors the FastGen allocator and adds caching for the empty_from method.
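
The caching idea can be illustrated with a minimal sketch. This is not the DeepSpeed implementation: `ScratchBuffer`, its `misses` counter, and the use of a `bytearray` in place of a device tensor are all hypothetical, chosen so the example runs with the standard library only. It assumes the cached method returns a view of a preallocated buffer for a requested shape, so repeated requests for the same shape can reuse the view object instead of rebuilding it.

```python
from math import prod


class ScratchBuffer:
    """Hypothetical stand-in for a preallocated tensor (not a DeepSpeed class)."""

    def __init__(self, size_bytes):
        self._data = bytearray(size_bytes)
        self._view_cache = {}  # shape tuple -> previously built view
        self.misses = 0        # counts how often a new view had to be built

    def empty_from(self, shape):
        """Return an uninitialized-style view of the buffer with the given shape."""
        shape = tuple(shape)
        cached = self._view_cache.get(shape)
        if cached is not None:
            return cached  # cache hit: no new view object is created

        n = prod(shape)
        if n > len(self._data):
            raise ValueError("requested shape does not fit in the buffer")

        self.misses += 1
        # Slice the flat buffer and reinterpret it with the requested shape.
        view = memoryview(self._data)[:n].cast("B", shape)
        self._view_cache[shape] = view
        return view


buf = ScratchBuffer(64)
first = buf.empty_from((2, 3))
second = buf.empty_from((2, 3))  # served from the cache
```

In this sketch the second call returns the same object as the first, so the per-call cost drops to a dictionary lookup. That kind of saving matters most when the workload is small and host-side overhead dominates, which matches the low-workload case named in the title.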

DS Master:
Deployment: Mixtral-8x7B-v0.1-tp4-b768
Clients: 1, Prompt (mean): 500 tokens, Generation (mean): 1024 tokens
Query throughput: 0.075 queries/s, Token throughput (total): **163.130 tokens/s**
Query latency: 13.310 s, Token generation latency: 0.020 s/token, First token received: 0.055 s

This PR:
Deployment: Mixtral-8x7B-v0.1-tp4-b768-allocator-rework
Clients: 1, Prompt (mean): 500 tokens, Generation (mean): 1024 tokens
Query throughput: 0.095 queries/s, Token throughput (total): **240.386 tokens/s**
Query latency: 10.472 s, Token generation latency: 0.016 s/token, First token received: 0.056 s
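
The headline figure can be checked from the two runs above. The short script below is only a sanity check on the reported numbers; the helper name `speedup_percent` is made up for this example.

```python
def speedup_percent(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent."""
    return (improved / baseline - 1.0) * 100.0


# Total token throughput (tokens/s) reported for each run.
master_tps = 163.130
pr_tps = 240.386

# Query latency (s) reported for each run.
master_latency = 13.310
pr_latency = 10.472

throughput_gain = speedup_percent(master_tps, pr_tps)
latency_reduction = (1.0 - pr_latency / master_latency) * 100.0

print(f"Token throughput gain: {throughput_gain:.1f}%")
print(f"Query latency reduction: {latency_reduction:.1f}%")
```

The throughput gain comes out at roughly 47%, matching the title, and query latency drops by a little over 21%. First-token latency is essentially unchanged (0.055 s vs 0.056 s).
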
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>