[aarch64][caffe2/torch/csrc/profiler] Support aarch64 in inline assembly (#104707)
Summary:
Port x86 inline assembly to aarch64:
- Use `sp` instead of `%rsp` for the stack pointer; move it to `x1`, the
  second argument register (caller-saved), instead of `%rsi`
- Use `x29` instead of `%rbp` for the frame (base) pointer; move it to
  `x2`, the third argument register (caller-saved), instead of `%rdx`
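
For illustration only, here is a minimal sketch of the register mapping
described above. It is not the code from this change: `FrameRegs` and
`read_frame_regs` are hypothetical names, and it uses GCC/Clang extended
asm with compiler-chosen output registers rather than fixed moves into
`x1`/`x2` (or `%rsi`/`%rdx`).

```cpp
#include <cstdint>

// Hypothetical helper type: holds the stack and frame pointer values.
struct FrameRegs {
  std::uintptr_t sp;  // stack pointer: %rsp on x86-64, sp on aarch64
  std::uintptr_t fp;  // frame pointer: %rbp on x86-64, x29 on aarch64
};

// Hypothetical helper: reads both registers at the point of the call.
// Assumes a GCC- or Clang-compatible compiler (extended inline asm).
inline FrameRegs read_frame_regs() {
  FrameRegs r{0, 0};
#if defined(__x86_64__)
  // AT&T syntax: source first, destination second.
  __asm__ __volatile__("mov %%rsp, %0" : "=r"(r.sp));
  __asm__ __volatile__("mov %%rbp, %0" : "=r"(r.fp));
#elif defined(__aarch64__)
  // aarch64 syntax: destination first, source second.
  __asm__ __volatile__("mov %0, sp" : "=r"(r.sp));
  __asm__ __volatile__("mov %0, x29" : "=r"(r.fp));
#else
  // Fallback for other targets (an approximation, not a register read):
  // use the compiler builtin for the frame address for both fields.
  r.fp = reinterpret_cast<std::uintptr_t>(__builtin_frame_address(0));
  r.sp = r.fp;
#endif
  return r;
}
```

Calling `read_frame_regs()` returns the two values as integers. The
frame pointer value is only meaningful when the code is built with frame
pointers enabled (for example `-fno-omit-frame-pointer`).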
Test Plan:
```
$ buck2 build fbcode//mode/opt fbcode//caffe2/torch/fb/model_transform/fx2trt/packaging:generate_merge_net_file
```
Reviewed By: jasonjk-park
Differential Revision: D47242468
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104707
Approved by: https://github.com/aaronenyeshi