pytorch
bee7e781 - [PT2 Inference] Prototype of Inference Runtime (#108482)

1 year ago

[PT2 Inference] Prototype of Inference Runtime (#108482)

Summary:
This diff demonstrates a simplified E2E workflow for the PT2 Inference stack:
1. Model authoring with `torch.export()`
2. Model processing with `aot_inductor.compile()`
3. Model serving with a new Inference Runtime API, named `ModelRunner`

`torch.export()` and `aot_inductor.compile()` produce a zip file using `PyTorchStreamWriter`.
The runtime reads the zip file with `PyTorchStreamReader`.
The zip file contains {F1080328179}

More discussion on packaging can be found in https://docs.google.com/document/d/1C-4DP5yu7ZhX1aB1p9JcVZ5TultDKObM10AqEtmZ-nU/edit?usp=sharing

The runtime can now switch between two execution modes:
1. Graph Interpreter mode, implemented based on Sigmoid's Executor
2. AOTInductor mode, implemented based on FBAOTInductorModel

Test Plan:
buck2 run mode/dev-nosan mode/inplace -c fbcode.enable_gpu_sections=True //sigmoid/inference/test:e2e_test

Export and lower with AOTInductor:
buck2 run mode/dev-sand mode/inplace -c fbcode.enable_gpu_sections=True sigmoid/inference:export_package

Run with GraphInterpreter and AOTInductor:
buck2 run mode/dev-nosan //sigmoid/inference:main

Reviewed By: suo

Differential Revision: D47781098

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108482
Approved by: https://github.com/zhxchen17
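
The package-then-serve flow in the summary can be illustrated with a toy sketch. This is not the code from this commit: it uses Python's standard `zipfile` module as a stand-in for `PyTorchStreamWriter` and `PyTorchStreamReader`, and the names `ToyModelRunner`, `ExecutionMode`, `toy_compile`, `write_package`, and the archive entries `graph.json` and `compiled.json` are all hypothetical. The "graph" is a list of scalar ops and the "compiled" artifact is the same computation folded into one affine function, which only mimics the idea of an interpreter path and an ahead-of-time compiled path producing the same result.

```python
import io
import json
import zipfile
from enum import Enum


class ExecutionMode(Enum):
    # Hypothetical names mirroring the two modes named in the summary.
    GRAPH_INTERPRETER = "graph_interpreter"
    AOT_INDUCTOR = "aot_inductor"


def toy_compile(graph):
    """Fold a chain of scalar mul/add ops into one (scale, offset) pair.

    Stand-in for an ahead-of-time compile step; assumes only 'mul' and 'add'.
    """
    scale, offset = 1, 0
    for op, arg in graph:
        if op == "mul":
            scale, offset = scale * arg, offset * arg
        elif op == "add":
            offset += arg
        else:
            raise ValueError(f"unsupported op: {op}")
    return scale, offset


def write_package(graph):
    """Write the graph and its compiled form into one in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("graph.json", json.dumps(graph))
        zf.writestr("compiled.json", json.dumps(toy_compile(graph)))
    return buf.getvalue()


class ToyModelRunner:
    """Reads the archive once, then dispatches on the chosen execution mode."""

    def __init__(self, package_bytes, mode):
        with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
            self.graph = json.loads(zf.read("graph.json"))
            self.scale, self.offset = json.loads(zf.read("compiled.json"))
        self.mode = mode

    def run(self, x):
        if self.mode is ExecutionMode.GRAPH_INTERPRETER:
            # Interpreter path: execute the ops one at a time.
            for op, arg in self.graph:
                x = x * arg if op == "mul" else x + arg
            return x
        # Compiled path: use the pre-folded affine function.
        return x * self.scale + self.offset


package = write_package([["mul", 2], ["add", 3]])
interp_out = ToyModelRunner(package, ExecutionMode.GRAPH_INTERPRETER).run(5)
compiled_out = ToyModelRunner(package, ExecutionMode.AOT_INDUCTOR).run(5)
```

Both runners read the same archive and should agree on the output, which is the property an end-to-end test of the two modes would typically check.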

Author

SherlockNoMad

Committer

pytorchmergebot

Parents

5a4fe05a
