Make torch::deploy work with or without cuda (#58493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58493
In fbcode, we want torch::deploy to be a target that works with or without CUDA, depending only on whether CUDA is linked into the final binary. To enable this, we build both flavors of libinterpreter and choose which one to load at runtime, depending on whether CUDA is available in the application. This comes at a cost in binary size, since two copies of libinterpreter are included instead of one. However, it does not require _loading_ both copies into memory at runtime, so the memory footprint of the interpreter (which we make N copies of) is not affected.
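To make the runtime-selection idea concrete, here is a minimal C++ sketch of how a binary might pick between the two embedded interpreter flavors. The shared-library names and the loadInterpreter() helper are hypothetical and do not reflect the actual torch::deploy implementation; only dlopen/dlerror and torch::cuda::is_available() are real APIs.

```cpp
#include <dlfcn.h>

#include <stdexcept>
#include <string>

#include <torch/cuda.h> // torch::cuda::is_available()

// Sketch only: pick the CUDA-enabled interpreter flavor when a device is
// usable in this process, otherwise fall back to the CPU-only flavor.
// Both flavors ship in the binary; only the chosen one is loaded.
void* loadInterpreter() {
  const std::string path = torch::cuda::is_available()
      ? "libtorch_deployinterpreter_cuda.so" // hypothetical name
      : "libtorch_deployinterpreter_cpu.so"; // hypothetical name

  void* handle = dlopen(path.c_str(), RTLD_LOCAL | RTLD_LAZY);
  if (handle == nullptr) {
    throw std::runtime_error("failed to load " + path + ": " + dlerror());
  }
  return handle;
}
```

Because the decision is made once per process at load time, each additional interpreter instance reuses the already-loaded library rather than paying the cost of a second flavor in memory.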
In OSS/CMake builds, this change is a no-op: only one libinterpreter is built there, and whether it includes CUDA support is determined by the global CMake flag that controls CUDA for the entire PyTorch build.
Test Plan: tested in fbcode with new GPU-mode unit tests; verified that existing OSS CI passes.
Reviewed By: suo
Differential Revision: D28512178
fbshipit-source-id: 61354bf78b1932605a841388fcbc4bafc0c4bbb4