[profiler][small] Speed up postprocessing (#58021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58021
Improve the time complexity of the `_remove_dup_nodes` function, which runs during profiler postprocessing.
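As a rough illustration of the kind of change this refers to (a sketch under stated assumptions, not the actual PyTorch code), duplicate-node removal can collect the indices to drop in a set and rebuild the list once per pass, instead of deleting elements from the list one at a time. The `Node` class and the standalone `remove_dup_nodes` helper below are hypothetical stand-ins for the profiler's event objects and are not part of `torch.autograd.profiler`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality; nodes reference each other
class Node:
    # Hypothetical stand-in for a profiler event; not a PyTorch class.
    name: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def remove_dup_nodes(nodes: List[Node]) -> List[Node]:
    """Collapse a node into its parent when both share a name and the
    parent has no other children. Each pass is linear in len(nodes)."""
    while True:
        to_delete = set()  # O(1) membership checks when filtering below
        for idx, node in enumerate(nodes):
            parent = node.parent
            if (parent is not None
                    and parent.name == node.name
                    and len(parent.children) == 1):
                # Lift the grandchildren up to the parent.
                parent.children = node.children
                for ch in node.children:
                    ch.parent = parent
                to_delete.add(idx)
        if not to_delete:
            return nodes
        # Rebuild the list once per pass rather than deleting in place.
        nodes = [n for i, n in enumerate(nodes) if i not in to_delete]


# Usage: parent -> parent -> child collapses to parent -> child.
outer = Node("parent")
inner = Node("parent", parent=outer)
outer.children.append(inner)
child = Node("child", parent=inner)
inner.children.append(child)
result = remove_dup_nodes([outer, inner, child])
print([n.name for n in result])
```

Deleting list elements individually costs time proportional to the list length per deletion, so a pass that removes many duplicates can become quadratic; the set-plus-rebuild pattern keeps each pass linear. Whether this matches the exact change in this PR is an assumption.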
Test Plan:
Using a trivial microbenchmark:
```
import torch
from torch.autograd.profiler import *
import time
evts = EventList()
id_cnt = 0
for r in range(10*1000):
    st = r * 1000
    # Two nested events named "parent" (a duplicate pair) wrapping one "child" event
    evts.append(FunctionEvent(id_cnt, thread=0, name="parent", start_us=st, end_us=st+100))
    evts.append(FunctionEvent(id_cnt+1, thread=0, name="parent", start_us=st+1, end_us=st+99))
    evts.append(FunctionEvent(id_cnt+2, thread=0, name="child", start_us=st+10, end_us=st+90))
    id_cnt += 3
st = time.time()
evts._build_tree()
print("Elapsed: {:.3f}s".format(time.time() - st))
```
```
After:
python test_prof.py
Elapsed: 0.203s
Before:
python test_prof.py
Elapsed: 3.653s
```
Reviewed By: gdankel
Differential Revision: D28347217
Pulled By: ilia-cher
fbshipit-source-id: d62da3400009f1fa8cb41a11a828aa8307f190bf