Always convert truthy booleans to 1
Ref #54789
A `bool` has only two valid values, 0 and 1. Reading any other
in-memory value as a `bool` is undefined behavior. So, instead of
`reinterpret_cast`-ing to `bool*`, I introduce `c10::load<scalar_t>`,
which reads the byte as `unsigned char` and converts it to a valid `bool`.
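
To illustrate the idea, here is a minimal sketch of such a load helper. It is not the actual `c10::load` implementation: the `sketch` namespace and the function-template form are assumptions made for this example, and it assumes `bool` occupies a single byte.

```cpp
#include <type_traits>

namespace sketch {  // hypothetical namespace, not the real c10

// Generic case: read the value through a pointer of its own type.
template <typename T>
T load(const void* src) {
  return *static_cast<const T*>(src);
}

// bool case: read the raw byte as unsigned char (always a valid read),
// then map any nonzero byte to true, so the result is a well-formed bool.
template <>
inline bool load<bool>(const void* src) {
  static_assert(sizeof(bool) == sizeof(unsigned char),
                "this sketch assumes a one-byte bool");
  return *static_cast<const unsigned char*>(src) != 0;
}

}  // namespace sketch
```

With this sketch, a byte holding `0x02` loads as `true`, and converting that result to an integer gives 1 rather than 2.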
This gets >90% of operators working. The remaining operators, for which
skips and xfails have been added, will require individual attention.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77122
Approved by: https://github.com/mruberry