[nnc] Add call_with_numel interface for fast CUDA calls (#65213)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65213
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D31319012
Pulled By: bertmaher
fbshipit-source-id: 93fee80f956795470f5a2ce3b33c2ea2f132036f