Avoid using fx.Interpreter in nvfuser executor function (#83607)
Using fx.Interpreter is a convenient way to modify the calls inside an FX graph, but here it adds unnecessary per-node dispatch overhead on every execution.
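A minimal sketch of where the overhead comes from (this uses `torch.fx.symbolic_trace` and a toy function for illustration, not the executor code touched by this PR): `fx.Interpreter` re-dispatches every graph node through Python on each call, while invoking the `GraphModule` directly runs its generated `forward` method.

```python
import torch
import torch.fx

def f(x):
    # Toy function standing in for a traced graph; illustrative only.
    return torch.sigmoid(x) + 1

gm = torch.fx.symbolic_trace(f)
x = torch.randn(3, 2)

# Via fx.Interpreter: each node goes through run_node in Python per call.
out_interp = torch.fx.Interpreter(gm).run(x)

# Direct call: executes the GraphModule's compiled forward() once.
out_direct = gm(x)

assert torch.allclose(out_interp, out_direct)
```

Both paths produce the same result; the direct call simply skips the per-node Python dispatch, which is the saving this change targets.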
Example:
```py
import torch
from torch.fx.experimental.proxy_tensor import make_fx
from torch._prims.context import TorchRefsNvfuserCapabilityMode
from torch._prims.executor import execute
a = torch.randn(3, 2, dtype=torch.float16, device="cuda")
s = torch.sigmoid
d = torch.digamma # digamma is not supported in nvfuser and aten eager execution is used
def func(a):
    return s(d(s(d(s(d(s(a)))))))
with TorchRefsNvfuserCapabilityMode():
    gm = make_fx(func)(a)
%%timeit  # IPython cell magic; timings below measured this way
execute(gm, a, executor="nvfuser"); torch.cuda.synchronize();
# On master: 350 µs
# With this PR: 130 µs
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83607
Approved by: https://github.com/ezyang