Fix compile error for nv_bfloat162 (#7248)
Some systems appear to lack the __nv_bfloat162 definition, so a
placeholder was introduced. Newer CUDA libraries do provide that
definition, so the placeholder now clashes with it and breaks the
build. This patch adds the official cuda_bf16.h guard while keeping
the old code as a fallback, plus a safety assert in case the
definition changes in the future. See #7190 for reference.
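
For illustration only, the guard-plus-fallback idea can be sketched as
below. This is not the code from the patch: the macro name
DS_SKETCH_HAVE_CUDA_BF16, the use of __has_include for detection, the
two-field placeholder layout, and the size check are all assumptions
made for the example.

```cpp
#include <cstdint>

// Hypothetical feature macro (not from the patch): 1 if the official
// header was found and included, 0 if the placeholder below is used.
#if defined(__has_include)
#if __has_include(<cuda_bf16.h>)
#include <cuda_bf16.h>  // official __nv_bfloat162 definition
#define DS_SKETCH_HAVE_CUDA_BF16 1
#endif
#endif

#ifndef DS_SKETCH_HAVE_CUDA_BF16
#define DS_SKETCH_HAVE_CUDA_BF16 0
// Fallback placeholder, only compiled when the header is missing, so it
// can never collide with the official type. Layout is an assumption:
// two packed 16-bit values.
struct __nv_bfloat162 {
    std::uint16_t x;
    std::uint16_t y;
};
#endif

// Safety check: whichever definition is active, code that treats the
// type as two packed bf16 values expects it to occupy 32 bits.
static_assert(sizeof(__nv_bfloat162) == 2 * sizeof(std::uint16_t),
              "__nv_bfloat162 is expected to hold two 16-bit values");
```

Because the placeholder sits behind the same check that pulls in the
header, only one definition is ever visible, and the static_assert
fails at compile time if the active definition stops matching the
expected size.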
---------
Signed-off-by: LosCrossos <165311345+loscrossos@users.noreply.github.com>
Signed-off-by: LosCrossos <165311345+mytait@users.noreply.github.com>
Co-authored-by: LosCrossos <165311345+mytait@users.noreply.github.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>