supporting different hidden dimensions (#559)
* supporting different hidden dimensions
* add support for larger hidden dimensions (greater than 8K)
* remove empty line
* add loop unrolling factor for dropout kernels
* update kernels based on review feedback
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>