[caffe2] fix invalid % escape in inline assembly strings (#33554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33554
NVCC and GCC accept the existing syntax, but Clang does not: it requires a literal `%` in an inline assembly string to be escaped as `%%`. Here `%laneid` is one of the special registers that PTX, CUDA's pseudo-assembly, provides [1]. The extra `%` doesn't change the semantics, because PTX still receives `%laneid` once the inline asm string has been processed.
[1] https://docs.nvidia.com/cuda/parallel-thread-execution/index.html
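For illustration, a minimal sketch of the escaped form, not the exact caffe2 code touched by this change. The names `lane_id`, `write_lane_ids`, and `check_lane_ids` are hypothetical, and the check assumes a warp size of 32 with a one-dimensional block.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (name is an assumption, not from this change):
// returns the index of the calling thread within its warp.
__device__ __forceinline__ unsigned int lane_id() {
  unsigned int lane;
  // In an asm string that has operands, a single '%' introduces an operand
  // reference such as %0, so a literal '%' is written as "%%".
  // The PTX that results still reads "mov.u32 <reg>, %laneid;".
  asm("mov.u32 %0, %%laneid;" : "=r"(lane));
  return lane;
}

// Each thread stores its lane index at its own position in the block.
__global__ void write_lane_ids(unsigned int* out) {
  out[threadIdx.x] = lane_id();
}

// Launches a single warp (assumed to be 32 threads) and checks that thread i
// reported lane i. Returns false on any CUDA error or mismatch.
bool check_lane_ids() {
  const unsigned int n = 32;
  unsigned int h_out[n];
  unsigned int* d_out = nullptr;
  if (cudaMalloc(&d_out, n * sizeof(unsigned int)) != cudaSuccess) {
    return false;
  }
  write_lane_ids<<<1, n>>>(d_out);
  // cudaMemcpy waits for the kernel to finish and surfaces launch errors.
  cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(unsigned int),
                               cudaMemcpyDeviceToHost);
  cudaFree(d_out);
  if (err != cudaSuccess) {
    return false;
  }
  for (unsigned int i = 0; i < n; ++i) {
    if (h_out[i] != i) {
      return false;
    }
  }
  return true;
}
```

Writing `%%laneid` keeps one source string that both NVCC and Clang should accept, which avoids needing compiler-specific `#ifdef` branches around the asm statement.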
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Reviewed By: bddppq
Differential Revision: D20003621
fbshipit-source-id: 8e550e55a3455925e7bd92c6df3e504b5d38c2dc