[executorch] Run llama in xplat (#118831)
Summary:
Error running llama in xplat, where the Half type isn't part of c10_mobile targets. See: D53158320
This diff:
- Creates a `torch_mobile_all_ops_et` target, which is the same as `torch_mobile_all_ops`, except with a preprocessor flag (C10_MOBILE_HALF) to support Half type
- Checks for `C10_MOBILE_HALF` in LinearAlgebra.cpp and includes Half support when it is set
- Uses `torch_mobile_all_ops_et` for executorch, instead of `torch_mobile_all_ops`.
Considerations:
- Using `torch_mobile_all_ops_et` across executorch means that our runtime binary size for xplat aten increases (see test plan for increase amount, thanks tarun292 for the pointer). This may be okay, as aten mode isn't used in production.
Test Plan:
Run llama in xplat:
```
buck2 run xplat/executorch/examples/models/llama2:main_aten -- --model_path llama-models/very_new_checkpoint_h.pte --tokenizer_path llama-models/flores200sacrebleuspm.bin --prompt 'fr Hello' --eos
```
And in fbcode:
```
buck2 run fbcode//executorch/examples/models/llama2:main_aten -- --model_path llama-models/very_new_checkpoint_h.pte --tokenizer_path llama-models/flores200sacrebleuspm.bin --prompt 'fr Hello' --eos
```
Test executor_runner size increase with:
```
buck2 build fbcode//executorch/sdk/fb/runners:executor_runner_aten
```
| |original|this diff (+half dtype)|diff|
|---|---|---|---|
|unstripped|214975784|214976472|+688|
|stripped|71373488|71373808|+320|
Differential Revision: D53292674
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118831
Approved by: https://github.com/larryliu0820