[kineto] global callback support in ProfilerKineto (#76078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76078
Templatize `pushProfilingCallbacks` so that it can register `RecordFunction` global callbacks as well as local ones. The reasons for templatizing are to:
1. squeeze out performance on the hot path
2. work around the requirement that the callbacks be capture-less lambdas
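
A minimal, standalone sketch of the idea, not the actual ProfilerKineto code: when a registry only accepts plain function pointers, a lambda cannot capture a runtime "global vs. local" flag, but a `bool` template parameter is a compile-time constant and needs no capture. All names here (`Callback`, `Registry`, `eventLog`, `pushCallbacksSketch`) are hypothetical and assume C++17.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a callback type that must be a plain function
// pointer, which is why only capture-less lambdas can be used.
using Callback = void (*)(const std::string& name);

// Hypothetical registry with separate global and local callback lists.
struct Registry {
  std::vector<Callback> global_callbacks;
  std::vector<Callback> local_callbacks;
};

Registry& registry() {
  static Registry r;
  return r;
}

// Records what each callback saw, so the behavior can be inspected.
std::vector<std::string>& eventLog() {
  static std::vector<std::string> entries;
  return entries;
}

// `use_global` is a template parameter, so the lambda can branch on it
// without capturing anything and still converts to a function pointer.
// The branch is resolved at compile time, so the callback itself carries
// no runtime check.
template <bool use_global>
void pushCallbacksSketch() {
  Callback cb = [](const std::string& name) {
    if constexpr (use_global) {
      eventLog().push_back("global:" + name);
    } else {
      eventLog().push_back("local:" + name);
    }
  };
  if constexpr (use_global) {
    registry().global_callbacks.push_back(cb);
  } else {
    registry().local_callbacks.push_back(cb);
  }
}
```

Calling `pushCallbacksSketch<true>()` registers the global variant and `pushCallbacksSketch<false>()` the local one; each instantiation yields a distinct capture-less callback.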
Test Plan:
## Global Callback
These were tested end to end, together with subsequent diffs, in both `trace_tester` and `sigrid`.
Sample trace: https://fburl.com/perfdoctor/tzgtw2ln
## Local Callback
Sample trace: https://fburl.com/perfdoctor/l58nfiyp
Reviewed By: robieta
Differential Revision: D35457300
fbshipit-source-id: 9d587ec68bfd405e565cc8956b0afa2cdaf95b94
(cherry picked from commit 9d8a9063d7525972d5364307c95ed50f6bafe3ec)