model_dump: Add a section that summarizes tensor memory usage (#57658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57658
This change touches no Python code, and the analysis runs only when the
section is rendered open, so it should have no impact on load time. Page
size grows only by a constant amount, from the added code. Before I made
the analysis lazy, I observed that it increased load time by over 100ms
for a large model.
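
For illustration only, here is a minimal Python sketch of the lazy approach described above. It is not the code from this PR, which changes no Python. The names `TensorInfo`, `TensorMemorySection`, `DTYPE_SIZES`, and `analysis_runs` are hypothetical. The per-device byte totals are an assumed shape for the summary, not necessarily what the viewer shows.

```python
from collections import defaultdict
from dataclasses import dataclass
from functools import cached_property
from math import prod

# Bytes per element for a few dtypes (illustrative subset, not exhaustive).
DTYPE_SIZES = {"float32": 4, "float16": 2, "int64": 8, "uint8": 1}


@dataclass(frozen=True)
class TensorInfo:
    """Hypothetical record describing one tensor in a dumped model."""
    device: str
    dtype: str
    shape: tuple


class TensorMemorySection:
    """Hypothetical section that computes its summary only when opened."""

    def __init__(self, tensors):
        self._tensors = list(tensors)
        self.analysis_runs = 0  # counts how often the analysis actually ran

    @cached_property
    def summary(self):
        # Runs on first access only; the result is cached afterwards.
        self.analysis_runs += 1
        totals = defaultdict(int)
        for t in self._tensors:
            totals[t.device] += prod(t.shape) * DTYPE_SIZES[t.dtype]
        return dict(totals)

    def render(self, is_open):
        # A collapsed section never touches `summary`, so it costs nothing.
        if not is_open:
            return "Tensor Memory (collapsed)"
        return "\n".join(
            f"{device}: {size} bytes"
            for device, size in sorted(self.summary.items())
        )
```

In this sketch, rendering the collapsed section does no analysis, and reopening the section reuses the cached totals rather than walking the tensors again.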
Test Plan: Dumped a CUDA model and saw the size summary.
Reviewed By: malfet
Differential Revision: D28531394
Pulled By: dreiss
fbshipit-source-id: f77012b7bab069de861a4ba23486c665e1306aa0