onnxruntime
87b14ac7 - Release backward inputs per static graph ref count (#20804)

Commit

1 year ago

Release backward inputs per static graph ref count (#20804)

### Release backward inputs per static graph ref count

For the output buffer marked as external output:

1. Remove the additional ref count we used to avoid reusing the buffer. Instead, when looking for an input/output buffer to reuse, we make sure the candidate buffer is not generated by a node that has external outputs.
2. Remove the ref count held by the pybind feed inputs, which persisted until run_backward completed. Instead, pass mutable feeds and clear the feeds vector once its contents have been copied into the session state and are no longer needed, before the graph is run sequentially.

#### Before the change:

One of the backward inputs is 3.9GB, and it lives until the backward ends.

![image](https://github.com/microsoft/onnxruntime/assets/10530022/e71e2072-eaaa-4be3-a39f-0ca74b507265)

#### With the change:

The 3.9GB is released when the last node depending on that tensor completes.

![image](https://github.com/microsoft/onnxruntime/assets/10530022/7b27d01f-c675-4faf-9a3e-f886b31b2afe)

Note: the peak did not change, though; more work is needed to reduce the peak.

#### Others

A few tests were found to have been updated with incorrect expected values in a previous code refactoring: https://github.com/microsoft/onnxruntime/commit/a81faee41ef2344de448caecb0f42a34fdc9ead7#diff-9e8fbae7d3dff24106cd17564949f320e943cb3048eae07813c7de144f140419L382. This PR restores them, and I think all test cases are now back to normal.
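
The release behavior described in this commit message can be illustrated with a small sketch. This is a simplified Python model, not ONNX Runtime's actual implementation: the function name `run_with_static_ref_counts`, the `(name, inputs, outputs)` node layout, and the strings standing in for tensor buffers are all hypothetical. It assumes the nodes are already in execution order, counts statically how many node inputs consume each tensor, clears the caller's mutable feeds once they are copied into the run's value table, and drops a buffer as soon as its last consumer has run.

```python
from collections import Counter


def run_with_static_ref_counts(nodes, feeds, graph_outputs):
    """Toy executor (hypothetical, for illustration only).

    nodes: list of (node_name, input_names, output_names) in execution order.
    feeds: mutable dict of name -> buffer; it is emptied by this function.
    graph_outputs: names that must stay alive until the run ends.
    """
    # Static ref count: how many node inputs consume each tensor.
    ref_counts = Counter(name for _, inputs, _ in nodes for name in inputs)
    # Graph outputs get one extra count so they are never released early.
    for name in graph_outputs:
        ref_counts[name] += 1

    # Copy the feeds into the run's own table, then clear the caller's
    # container so it no longer keeps the buffers alive.
    values = dict(feeds)
    feeds.clear()

    release_log = []  # (tensor_name, node after which it was released)
    for node_name, inputs, outputs in nodes:
        # Stand-in for real computation: record which node made the output.
        for out in outputs:
            values[out] = f"{node_name}:{out}"
        # Release every input whose last consumer has now completed.
        for name in inputs:
            ref_counts[name] -= 1
            if ref_counts[name] == 0:
                del values[name]
                release_log.append((name, node_name))

    return {name: values[name] for name in graph_outputs}, release_log


# Example: "big" is consumed only by the first node, so it can be dropped
# right after that node instead of living until the whole run finishes.
example_nodes = [
    ("n1", ["big", "w"], ["a"]),
    ("n2", ["a", "w"], ["b"]),
    ("n3", ["b"], ["out"]),
]
example_feeds = {"big": "BIG_BUFFER", "w": "WEIGHT"}
example_outputs, example_log = run_with_static_ref_counts(
    example_nodes, example_feeds, ["out"]
)
```

In this model, releasing early shortens how long a large input stays resident, but it does not by itself lower the peak if the peak occurs while that input is still needed, which matches the note above about the peak being unchanged.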

References

#20804 - Release backward inputs per static graph ref count

Author

pengwa

Parents

fff68c31
