Skip the source info in the error report if the source code is too large (#105608)
Summary:
A small model (<100MB) took about 20 minutes to load and consumed 16GB of memory.
Strobelight profiling: https://fburl.com/strobelight/abwtz0ry
We found that calc_line_start_offsets is the culprit: line_starting_offsets_ is a vector holding one entry per line of the source.
There are >20000 places where we generate such an ErrorReport, and each source has ~100000 lines.
So the total memory cost is about 100000 x 20000 x 8 bytes = ~16GB.
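The estimate can be double-checked with a few lines of C++. This is only a sketch: the constant and function names are illustrative, and the 8 bytes per entry assumes each vector element is a 64-bit integer.

```cpp
#include <cstdint>

// Approximate figures quoted in the summary above.
constexpr std::uint64_t kLinesPerSource = 100000;   // ~100000 lines in the source
constexpr std::uint64_t kErrorReportSites = 20000;  // >20000 ErrorReport sites
constexpr std::uint64_t kBytesPerEntry = 8;         // assumption: 64-bit entries

// Total bytes if every site keeps its own per-line vector.
constexpr std::uint64_t estimated_bytes() {
  return kLinesPerSource * kErrorReportSites * kBytesPerEntry;
}
```

The result is 16,000,000,000 bytes, which matches the ~16GB observed during loading.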
We propose skipping the source info in the error report for extremely large source files (>1MB), and adding an environment variable that keeps the ability to print source code info for large source files.
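A minimal sketch of the idea, not the actual change: the function names, the environment variable name EXAMPLE_ENABLE_LARGE_SOURCE_INFO, and the exact way the variable is interpreted are all hypothetical; only the 1MB threshold comes from the summary.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Threshold from the summary: sources larger than 1MB are skipped.
constexpr std::size_t kMaxSourceBytes = 1024 * 1024;

// Hypothetical variable name, used here for illustration only.
constexpr const char* kEnableEnvVar = "EXAMPLE_ENABLE_LARGE_SOURCE_INFO";

// True when the opt-in variable is set to something other than "" or "0".
bool large_source_info_enabled() {
  const char* value = std::getenv(kEnableEnvVar);
  return value != nullptr && value[0] != '\0' && std::strcmp(value, "0") != 0;
}

// Returns the offset at which each line starts, or an empty vector when the
// source is too large and the caller has not opted in.
std::vector<std::size_t> line_start_offsets(const std::string& text,
                                            bool force_large) {
  std::vector<std::size_t> offsets;
  if (text.size() > kMaxSourceBytes && !force_large) {
    return offsets;  // skip: no per-line bookkeeping for huge sources
  }
  offsets.push_back(0);
  for (std::size_t i = 0; i < text.size(); ++i) {
    if (text[i] == '\n') {
      offsets.push_back(i + 1);
    }
  }
  return offsets;
}
```

A caller would pass `large_source_info_enabled()` as `force_large`, so the default path avoids the allocation and the opt-in path keeps the old behavior.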
Test Plan:
buck run mode/opt-split-dwarf scripts/lufang:load_pt_model -- --model_file_path=/data/local/models/961746678/2/961746678_2.predictor.disagg.gpu.local
Before the change, it takes 20 minutes to load, and the model costs 16GB of memory (the model itself is only <100MB).
After the change, it takes 15s to load.
Most of the time / space is spent in calc_line_start_offsets, https://fburl.com/code/2to60zqu
Differential Revision: D47610805
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105608
Approved by: https://github.com/hl475