[executorch] Run llama in xplat (#118831)
Summary:
Error running llama in xplat, where the Half type isn't part of c10_mobile targets. See: D53158320
This diff:
- Creates a `torch_mobile_all_ops_et` target, which is the same as `torch_mobile_all_ops`, except with a preprocessor flag (C10_MOBILE_HALF) to support Half type
- Checks for `C10_MOBILE_HALF` in LinearAlgebra.cpp and includes Half support when it is set
- Uses `torch_mobile_all_ops_et` for executorch, instead of `torch_mobile_all_ops`.
Considerations:
- Using `torch_mobile_all_ops_et` across executorch means that our runtime binary size for xplat aten increases (see test plan for increase amount, thanks tarun292 for the pointer). This may be okay, as aten mode isn't used in production.
Test Plan:
Run llama in xplat:
```
buck2 run xplat/executorch/examples/models/llama2:main_aten -- --model_path llama-models/very_new_checkpoint_h.pte --tokenizer_path llama-models/flores200sacrebleuspm.bin --prompt 'fr Hello' --eos
```
And in fbcode:
```
buck2 run fbcode//executorch/examples/models/llama2:main_aten -- --model_path llama-models/very_new_checkpoint_h.pte --tokenizer_path llama-models/flores200sacrebleuspm.bin --prompt 'fr Hello' --eos
```
Test executor_runner size increase with:
```
buck2 build fbcode//executorch/sdk/fb/runners:executor_runner_aten
```
| |original|this diff (+half dtype)|diff|
|---|---|---|---|
|unstripped|214975784|214976472|+688|
|stripped|71373488|71373808|+320|
Differential Revision: D53292674
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118831
Approved by: https://github.com/larryliu0820