Allow device-generating APIs to optionally return a device, so that the XLA tensor creation API in the bridge can pass the input tensors straight through.
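
A rough sketch of the optional-device pattern, to make the intent concrete. All names here (`Device`, `Tensor`, `GetXlaDevice`, `ResolveDevice`) are hypothetical stand-ins rather than the bridge's actual types or signatures, and `std::optional` (C++17) is assumed in place of whatever optional type the code base uses:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a device descriptor (not the real bridge type).
struct Device {
  std::string type;
  int ordinal = 0;
  bool operator==(const Device& other) const {
    return type == other.type && ordinal == other.ordinal;
  }
};

// Hypothetical stand-in for a tensor; only XLA tensors carry a device.
struct Tensor {
  std::optional<Device> xla_device;
};

// Device-generating API: returns the device of the first tensor that has one,
// or std::nullopt when none of the inputs is an XLA tensor.
std::optional<Device> GetXlaDevice(const std::vector<Tensor>& tensors) {
  for (const Tensor& tensor : tensors) {
    if (tensor.xla_device.has_value()) {
      return tensor.xla_device;
    }
  }
  return std::nullopt;
}

// Caller side: the inputs are forwarded as-is, with no pre-filtering, and the
// "no device found" case is handled by falling back to a caller-supplied device.
Device ResolveDevice(const std::vector<Tensor>& inputs, const Device& fallback) {
  return GetXlaDevice(inputs).value_or(fallback);
}
```

Because the lookup reports "no device" through its return value instead of failing, the caller does not need to inspect or filter its inputs before forwarding them.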
Fix the DEBUG=1 link error for the missing c10::Half::from_bits symbol, which is in fact defined in libc10.so (as confirmed by the fact that everything works at runtime).