PR #7039 Merge Dry Run - SemanticDiff

Forward pass using InferenceSession on exported ONNX

11b69f14

Perform forward pass using training graph with intermediate outputs

77cefcd6

Basic plumbing for backward pass. Not fully working

e71e0885

Add script to run Flexible API MVP PoC

d4449d86

Add flag to allow pytorch-only or ORT flexible api runs

56ca4ab0

Fix path on test script

f06cafde

Improve example to display grads before and after optim step

f1b5c25b

Add working example for MNIST (MVP)

3524fb04

module transformer

26e6d6d0

refactor

c36c8e14

Integrate automatic graph split into ORTModule

8b0ade0e

Update InferenceSession usage to match master

30042b6e

Add BERT classifier example

3b267d1d

Hard-code input types for DropoutGrad on BERT

d4917f2d

Refactor BERT classifier fine tune for better debugging

f1dc6e40

Change DropoutGrad.input[1].input_type and del logits_grad from backwa…

ea5871ac

gradient graph split in backend.

934feb0c

sample code change.

6d8fde83

Refactor after Vincent's work on splitting on backend

b7564d07

bugfix for graph inputs and outputs.

e759da17

fix input order, and input grad.

cfd57c01

split graphs info

f6a8d2aa

bugfix

60b6e268

Add list of initializer gradients to the backend training graph splitter

78831d00

Add support to BERT fine tuning (MVP 3)

ff79e874

add io binding

39ac95b2

Refactor IObinding

f13c2a61

Add IO binding support, which allows CUDA training

4d9267e1

Remove (unnecessary) gradient graph from frontend

395e082b

Refactor MNIST and BERT classifier to add time measures

07f5ae95

Remove initializers from forward ONNX graph

41b88ce9

Remove dead code

e986ae5f

Improve performance by running ApplyTransformers on gradient graph

e5fdb455

TEMP: Add support to measure method execution time for perf improvement

004632ff

Add initial dynamic axes support

7729bb3c

Improve dynamic axes to work without data descriptors

f7f435fc

remove initializers from original graph

c4f827be

ort's to_dlpack.

b8c8fe91

ortmodule ci pipeline setup (#6251)

e0f2a12c

Device handling fixes in ORTModule (#6187)

127afe3b

ci pipeline tests for ortmodule (#6268)

a92e762f

add poc test for ortmodule (using MNIST dataset) to the ci pipeline (…

f3a47990

Add ORTModule distributed CI pipeline (#6278)

9b7510d8

Add ORTModule BERT classifier to the CI pipeline (#6330)

0586c610

Add ORTModule deepspeed zero stage 1 test to the distributed CI pipel…

910c5ab6

Enable device change during training + minor forward() refactoring (#…

237b275b

Support inputs to ORTModule forward method that require gradient (#6420)

93aa72e4

Export the model with torch.no_grad() context (#6472)

785e51d2

Sync ORTModule branch with master and fix tests (#6526)

8a890ddf

Cache datasets on CI machines (#6525)

62ac1642

Add support for dynamic axes for outputs + check model output type be…

c983b843

Revert "Add support for dynamic axes for outputs + check model output…

bc0d04bf

OrtModule v0.21 (#6395)

eec602e4

Rename ONNX graphs variables in ORTModule (#6645)

9294dde1

Add support for dynamic axes for outputs + check model output type be…

0732d727

Add TNLRv3 fp16 pattern to Layer Norm fusion (#6661)

ff465483

Remove monkey patch for PyTorch Nightly + ORTTrainer (#6659)

7ee5baa6

Handle multiple devices scenarios (#6672)

7f33671a

Support non tuple return values from torch.nn.module (#6660)

01dfa8e1

Merge branch 'master' into thiagofc/merge-from-master

3184c47a

Fix build, cleanup.

eecce31a

Move event_pool and message_queue to core.

5b7e7aaa

Reduce binary size, limit asynchronous/background thread stuff to tra…

9853ef84

Fix merge leftover

9d4b730e

Merge pull request #6714 from microsoft/thiagofc/merge-from-master

21f9e32c

Merge branch 'master' of https://github.com/microsoft/onnxruntime int…

40dda452

Merge pull request #6742 from microsoft/mzs/sync-from-master

b7b56121

Enable custom ops on ORTModule (#6740)

fb3f1f5c

ORTModule - FastGeluFusion/fp16 fix and minor LayerNormFusion cleanup…

39d182f7

Enable external CUDA allocator in ORTModule. (#6745)

1a2f1bd2

Support keyword arguments for ORTModule (#6539)

58f3aca9

Update torchtext usage for pytorch transformer sample (#6767)

563218dc

Re-enable test and increase timeout (#6785)

65ba51d9

Refactor device handling and basic support for PyTorch Lightning (#6758)

aa5cd37a

Support nested sequence and mapping types in ORTModule (#6791)

7ce4075b

Rewrite ORTModule background task coordination (#6700)

8e200e13

Clear iobinding outputs (#6774)

b05403d8

Remove backward workaround from test. (#6811)

99ffffbe

Mount hf model cache and use cache for loading hf models (#6810)

fa8a9015

Check gradient correctness in the UTs (#6803)

8a450d52

Added RequiredGrad attribute to YieldOp (#6657)

355057cf

Add pipeline to clear the cache for huggingface transformers models (#…

c1b0cf6d

Copy forward signature from PyTorch model. (#6777)

059ed1c2

Merge branch 'master' of https://github.com/microsoft/onnxruntime int…

ca48310d

Enable PyTorch Lightning basic test on CI (#6809)

f71d93ea

Merge pull request #6838 from microsoft/mzs/ortmodule-api-sync-from-m…

12edf22f

Add External Outputs Flag for YieldOp (#6789)

4238ce34

Add more asserts for ORTModule forward's correctness (#6887)

749e6a08

Merge branch 'master' of github.com:microsoft/onnxruntime into bmeswa…

d5667554

move SetOutputMLValue from op_kernel.h to op_kernel_context.h

aa93f2e2

Merge pull request #6890 from microsoft/bmeswani/merge_master_onto_or…

b429edcd

Enable priority-based execution order as default to support inputs wi…

ac4d6155

Separate requirements.txt file for ORTModule pipelines (#6879)

79f832c6

Introducing TrainingAgent interface to perform training using Yie…

dfc7c18e

Disable Materializing Grads (#6822)

56c5620f

Assert that the data is on the same device as ORTModule (#6942)

f1ade14e

Add UseCount for External Outputs (#6894)

91c6a330

Interchange Cast and Transpose operations to facilitate Transpose-Mat…

48eebed8

Clean ORTModule dev branch (#6944)

5303b33f

Add UT correctness and address comments for previous symbolic shape P…

3b2847b2

Support ORTModule on ROCm EP (#6945)

534adbb0

Relax atol for some ORTModule UTs (#6969)

3f579fac

Use DLPack for Graph Inputs and External Outputs of YieldOp (#6968)

8468099f

Support ROCM EP for ORTModule (#6967)

1e13e266

Add *args support for ORTModule inputs (#6883)

ce403eea

Introduce ORTModule training API to ONNX Runtime

89d45069

Post merge update for ORTModule

3348b848