[BF16] Add a missing thread local specifier to autocast_gpu_dtype (#63416)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63416
Fix a missing thread_local specifier on autocast_gpu_dtype, introduced by a recent PR:
https://github.com/pytorch/pytorch/pull/61002
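Why the specifier matters: PyTorch documents autocast state as thread-local. In C++, a namespace-scope variable without thread_local is a single object shared by every thread, so a dtype set on one thread leaks into all the others and races with them. The sketch below illustrates the intended per-thread behavior. All names in it (DType, gpu_dtype, set_gpu_dtype, get_gpu_dtype, read_from_new_thread) are hypothetical stand-ins, not PyTorch's actual code, which uses at::ScalarType.

```cpp
#include <thread>

// Hypothetical stand-in for a scalar type enum (PyTorch uses at::ScalarType).
enum class DType { Half, BFloat16 };

// thread_local: each thread gets its own copy, initialized to Half.
// Dropping the specifier would make this one variable shared by all threads.
thread_local DType gpu_dtype = DType::Half;

// Hypothetical setter/getter; they touch only the calling thread's copy.
void set_gpu_dtype(DType d) { gpu_dtype = d; }
DType get_gpu_dtype() { return gpu_dtype; }

// Spawns a fresh thread and reports the dtype that thread observes.
DType read_from_new_thread() {
  DType seen = DType::BFloat16;  // overwritten by the worker thread
  std::thread worker([&seen] { seen = get_gpu_dtype(); });
  worker.join();
  return seen;
}
```

After calling set_gpu_dtype(DType::BFloat16) on one thread, read_from_new_thread() still returns DType::Half, because the new thread starts from its own default. Without thread_local it would return BFloat16.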
Test Plan: Unit Tests
Reviewed By: ngimel
Differential Revision: D30376154
fbshipit-source-id: c70d37ec85c3eba88eb87f766f1c4e7aeff8eaf9