inductor: fix cpp wrapper ExternKernel check (#96799)
Fix cpp_wrapper functionality for ExternKernel. The changes in https://github.com/pytorch/pytorch/pull/91575 disabled the cpp_wrapper for ExternKernel cases.
1. Need to set the `cpp_wrapper` attr before `V.graph.register_buffer(self)`.
`register_buffer` invokes the check below:
https://github.com/pytorch/pytorch/blob/c6a82e433924b4d36fd571d36ce363cb1c622c76/torch/_inductor/graph.py#L220-L223
Because the current code sets `cpp_wrapper` after calling `V.graph.register_buffer(self)`, that check always disables the cpp wrapper.
2. Fix the missing `ordered_kwargs_for_cpp_kernel` attr for `at::addmm_out`.
3. Enhance the unit test to check that cpp_wrapper is turned on for the supported cases, so that future changes cannot disable it unintentionally.
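
The ordering problem in item 1 can be illustrated with a minimal, self-contained sketch. This is not the Inductor code: `ToyGraph`, `ToyKernel`, `cpp_wrapper_enabled`, and `wrapper_stays_enabled` are hypothetical names, and the check inside `register_buffer` is only assumed to behave like the linked one in that it inspects the buffer at registration time.

```python
class ToyGraph:
    """Hypothetical stand-in for the graph object; not Inductor's real class."""

    def __init__(self):
        self.buffers = []
        self.cpp_wrapper_enabled = True

    def register_buffer(self, buffer):
        # The check runs at registration time, so it only sees attributes
        # that already exist on the buffer at this moment.
        if not getattr(buffer, "cpp_wrapper", False):
            self.cpp_wrapper_enabled = False
        self.buffers.append(buffer)
        return f"buf{len(self.buffers) - 1}"


class ToyKernel:
    """Hypothetical kernel node that registers itself during construction."""

    def __init__(self, graph, set_attr_first):
        if set_attr_first:
            # Fixed order: the attribute is visible to the check.
            self.cpp_wrapper = True
            self.name = graph.register_buffer(self)
        else:
            # Buggy order: the check runs before the attribute exists.
            self.name = graph.register_buffer(self)
            self.cpp_wrapper = True


def wrapper_stays_enabled(set_attr_first):
    graph = ToyGraph()
    ToyKernel(graph, set_attr_first)
    return graph.cpp_wrapper_enabled
```

In this sketch, the kernel ends up with `cpp_wrapper = True` in both orders, but only the attribute-first order keeps the graph-level flag on, which is the kind of regression the enhanced unit test in item 3 is meant to catch.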
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96799
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/jansel