Coalesce TLS accesses in RecordFunction constructor (#44970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44970
Right now, when RecordFunction is not active (the usual case),
we do two TLS accesses: one to check for thread-local callbacks and
one to check a thread-local boolean.
This change experiments with reducing the number of TLS accesses in the
RecordFunction constructor by coalescing them.
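A minimal sketch of the "two TLS reads vs. one" pattern, assuming the
callbacks and the boolean can be grouped into a single thread_local struct.
All names below (Callback, RecordTlsState, RecordScope, shouldRecord*) are
hypothetical; this is not the actual PyTorch implementation. Whether each
separate thread_local really costs a distinct TLS lookup depends on the
compiler and TLS model.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical callback type (an assumption; not PyTorch's real type).
using Callback = std::function<void()>;

// --- Before: two independent thread_local variables ---------------------
thread_local std::vector<Callback> tls_callbacks_old;
thread_local bool tls_enabled_old = true;

// In the usual inactive case (enabled, no callbacks) this reads both
// thread_local variables, i.e. two separate TLS accesses.
bool shouldRecordTwoLookups() {
  return tls_enabled_old && !tls_callbacks_old.empty();
}

// --- After: the same state coalesced into one thread_local struct --------
struct RecordTlsState {
  std::vector<Callback> callbacks;  // thread-local callbacks
  bool enabled = true;              // thread-local on/off switch
};
thread_local RecordTlsState tls_state;

// One TLS access yields a reference; the field reads are ordinary loads.
bool shouldRecordOneLookup() {
  const RecordTlsState& state = tls_state;
  return state.enabled && !state.callbacks.empty();
}

// Hypothetical stand-in for the RecordFunction constructor's fast path.
class RecordScope {
 public:
  RecordScope() {
    RecordTlsState& state = tls_state;  // single TLS access
    active_ = state.enabled && !state.callbacks.empty();
    if (active_) {
      for (const auto& cb : state.callbacks) cb();
    }
  }
  bool active() const { return active_; }

 private:
  bool active_ = false;
};
```

The inactive path in RecordScope touches thread-local storage once, then
branches on plain member reads.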
Test Plan: record_function_benchmark
Reviewed By: dzhulgakov
Differential Revision: D23791165
Pulled By: ilia-cher
fbshipit-source-id: 6137ce4bface46f540ece325df9864fdde50e0a4