Inductor: fall back bernoulli on cpu (#97002)
Timings for bernoulli and dropout on CPU, before and after this change.
Data type: float32
Input size: torch.Size([64, 4, 128, 128])
Single socket (32 cores):
```
Before: bernoulli 0.001327775239944458 s dropout 0.0014216173489888509 s
After: bernoulli 0.0002424612840016683 s dropout 0.00039757410685221353 s
```
Single core:
```
Before: bernoulli 0.04154032731056213 s dropout 0.04382548745473226 s
After: bernoulli 0.006143261671066284 s dropout 0.0065830423831939695 s
```
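The timings above are easier to compare as speedup ratios. A minimal sketch that uses only the numbers reported in this message; the dictionary layout and the names `timings`, `speedup`, and `speedups` are illustrative and not part of the PR:

```python
# Timings in seconds as (before, after), copied from the results above.
# The structure and names here are illustrative, not taken from the PR.
timings = {
    "single socket (32 cores)": {
        "bernoulli": (0.001327775239944458, 0.0002424612840016683),
        "dropout": (0.0014216173489888509, 0.00039757410685221353),
    },
    "single core": {
        "bernoulli": (0.04154032731056213, 0.006143261671066284),
        "dropout": (0.04382548745473226, 0.0065830423831939695),
    },
}


def speedup(before, after):
    """Return how many times faster the 'after' timing is than 'before'."""
    return before / after


# Speedup ratio per configuration and per op.
speedups = {
    config: {op: speedup(*pair) for op, pair in ops.items()}
    for config, ops in timings.items()
}

for config, ops in speedups.items():
    for op, ratio in ops.items():
        print(f"{config}: {op} {ratio:.2f}x faster")
```

With these numbers, the ratios come out at roughly 5.5x (bernoulli) and 3.6x (dropout) on a single socket, and roughly 6.8x and 6.7x on a single core.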
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97002
Approved by: https://github.com/jgong5, https://github.com/jansel