Fixes error when too many parameters are passed to fused cuda kernel (#18063)
Summary:
Bug fix for https://github.com/pytorch/pytorch/issues/15043, where a large fusion in JIT with a large number of kernel arguments, which exceeds the limit allowed by nvrtc on a cuda device.
The fix is to check the number of arguments before a cuda kernel is generated. If the number exceeds the limit, take the runFallBack() path.
Add a reduced test from the original issue to keep the test time low. The test would fail without this fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18063
Differential Revision: D14691401
Pulled By: soumith
fbshipit-source-id: b98829bc89ed7724e91eda82ae3a5a1151af721a