[PyTorch Edge] Reuse constant table from ts in bytecode (#56002)
Summary:
## Note:
**This change adds the feature but leaves it turned off. It will be enabled, and the bytecode version bumped, in D27844651 (https://github.com/pytorch/pytorch/commit/8c04593c0a486bea7e2cbec348298d348742e096).**
JIT generates constant tensors and stores them in the `constants` folder (visible after unzipping `model.ptl`). The bytecode generated by the lite interpreter also includes constant tensors, which are almost the same as the constant tensor values from JIT. This PR lets the lite interpreter reuse the constant tensors from JIT instead of writing similar tensor values again. The write and read paths are described below.
More details and background can be found in [Lite Interpreter Model Size Issue](https://fb.quip.com/OSidAcjhL9LS).
Data size comparison can be found in [Model size analysis](https://fb.quip.com/oEm6A4bhbo06).
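To see where the bytes of a saved model go, the records of the zip-based `.ptl` file can be grouped by folder. The following is only a minimal sketch: `archive_sizes` is a hypothetical helper, the `<model_name>/<folder>/<index>` layout is an assumption, and the demo archive is a stand-in built in memory rather than a real model.

```python
import io
import zipfile
from collections import defaultdict


def archive_sizes(ptl_file):
    """Sum uncompressed record sizes per sub-folder of a zip-based model file."""
    sizes = defaultdict(int)
    with zipfile.ZipFile(ptl_file) as zf:
        for info in zf.infolist():
            parts = info.filename.split("/")
            # Assumed layout: <model_name>/<folder>/<index>. Records that sit
            # directly under the top-level folder are grouped together.
            folder = parts[1] if len(parts) > 2 else "(top level)"
            sizes[folder] += info.file_size
    return dict(sizes)


# Tiny stand-in archive so the sketch runs without a real model.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("model/constants/0", b"\0" * 16)
    zf.writestr("model/bytecode/0", b"\0" * 16)
    zf.writestr("model/bytecode.pkl", b"\0" * 4)

demo_sizes = archive_sizes(demo)
print(demo_sizes)
```

Passing the path of a real `model.ptl` instead of `demo` would show how much of the file is taken by each folder, which is one way to confirm that tensors are no longer duplicated.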
### Write
1. In `export_module.cpp`, all constant tensor values from JIT are stored in an `unordered_map` named `constants_from_jit`, where a tensor is hashed by its string representation. `constants_from_jit` is a map: (tensor) => (archive_name, index). When the bytecode archive is written in `writeByteCode()`, `constants_from_jit` is passed all the way down to its pickler.
2. In `pickler.cpp`, a new map `tensors_archive_table_` is added. It is also a map: (tensor) => (archive_name, index). The function that updates the map is `updateTensorsArchiveTable`. When the storage of a tensor is pushed, if the tensor exists in `tensors_archive_table_`, the root key is `{archive_name}/{index}` instead of `{index}`. For example, the tensor
```
torch._utils._rebuild_tensor_v2(pers.obj(('storage', torch.FloatStorage, '0', 'cpu', 90944),),
0,
(1, 116, 28, 28),
(90944, 784, 28, 1),
False,
collections.OrderedDict()),
```
will instead be written as
```
torch._utils._rebuild_tensor_v2(pers.obj(('storage', torch.FloatStorage, 'constants/0', 'cpu', 90944),),
0,
(1, 116, 28, 28),
(90944, 784, 28, 1),
False,
collections.OrderedDict()),
```
**Note**: Only tensors in the bytecode archive are written differently. Tensors in the other archives remain the same, because `updateTensorsArchiveTable()` is only called when `use_tensors_archive_table_` is `true`, and `use_tensors_archive_table_` is only set to `true` when `bytecode_version` is a valid number.
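The key selection on the write side can be illustrated with a small Python sketch. This is not the C++ code in `pickler.cpp`: `storage_root_key`, the string tensor keys, and the `next_index` argument are hypothetical names used only to show the rule described above.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for tensors_archive_table_:
# tensor key -> (archive_name, index).
TensorsArchiveTable = Dict[str, Tuple[str, int]]


def storage_root_key(
    tensor_key: str,
    next_index: int,
    table: TensorsArchiveTable,
    use_table: bool,
) -> str:
    """Return the root key written for a tensor's storage."""
    if use_table and tensor_key in table:
        # The tensor already lives in another archive: point at it.
        archive_name, index = table[tensor_key]
        return f"{archive_name}/{index}"
    # Otherwise the tensor is stored in the current archive under its index.
    return str(next_index)


table = {"weight_a": ("constants", 0)}

key_reused = storage_root_key("weight_a", 3, table, use_table=True)
key_new = storage_root_key("weight_b", 3, table, use_table=True)
key_other_archive = storage_root_key("weight_a", 3, table, use_table=False)
print(key_reused, key_new, key_other_archive)
```

With the table enabled, a known tensor gets the `constants/0` style key; an unknown tensor, or any tensor when the table is disabled, keeps the plain index.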
### Read
1. In `import.cpp`, the function `read_record` passed to the Unpickler is updated. The argument of `read_record` is the root key. In version 4, the root key is just the index, and `archive_name_plus_slash` + `name` is used to get the tensor. With this change (version 5+), `read_record` checks whether the argument `name` contains a slash. If it does, the argument has the form `archive_name/index` and can be used to get the tensor directly.
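The read-side rule can be sketched in Python as follows. The names `resolve_record_path` and `read_record_sketch`, and the dictionary standing in for the archive reader, are assumptions made for illustration; the actual logic lives in C++ in `import.cpp`.

```python
from typing import Dict


def resolve_record_path(name: str, archive_name_plus_slash: str) -> str:
    """Map a root key from the pickle to a record path in the model file."""
    if "/" in name:
        # Version 5+ key of the form archive_name/index: use it directly.
        return name
    # Version 4 style key: a bare index inside the current archive.
    return archive_name_plus_slash + name


def read_record_sketch(
    records: Dict[str, bytes], name: str, archive_name_plus_slash: str
) -> bytes:
    """Look up a record; `records` is a stand-in for the archive reader."""
    return records[resolve_record_path(name, archive_name_plus_slash)]


records = {"constants/0": b"shared", "bytecode/1": b"own"}

shared = read_record_sketch(records, "constants/0", "bytecode/")
own = read_record_sketch(records, "1", "bytecode/")
print(shared, own)
```

Because a bare index never contains a slash, the same function serves both old and new root keys.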
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56002
ghstack-source-id: 128498244
Test Plan:
### Verify that a model generated with this PR reuses the constant table and produces the same numerical result.
1. Build pytorch locally. `MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ USE_CUDA=0 DEBUG=1 MAX_JOBS=16 python setup.py develop`
2. Run `python save_lite.py`
```
import os
import pathlib

import torch
# sample execution (requires torchvision)
from PIL import Image
from torchvision import transforms

# Load a pretrained model (weights are downloaded on first use).
model = torch.hub.load('pytorch/vision:v0.6.0', 'shufflenet_v2_x1_0', pretrained=True)
model.eval()

# '~' is not expanded automatically, so expand it explicitly.
input_image = Image.open(os.path.expanduser('~/Documents/pytorch/data/dog.jpg'))
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0)  # create a mini-batch as expected by the model

# move the input and model to GPU for speed if available
if torch.cuda.is_available():
    input_batch = input_batch.to('cuda')
    model.to('cuda')

with torch.no_grad():
    output = model(input_batch)
# Tensor of shape 1000, with confidence scores over Imagenet's 1000 classes
print(output[0])
# The output has unnormalized scores. To get probabilities, you can run a softmax on it.
print(torch.nn.functional.softmax(output[0], dim=0))

traced = torch.jit.trace(model, input_batch)
# Total size of the traced parameters in bytes.
print(sum(p.numel() * p.element_size() for p in traced.parameters()))

tf = pathlib.Path('~/Documents/pytorch/data/data/example_debug_map_with_tensorkey.ptl')
# tf.name is only the file name, so both saves write to the current directory.
torch.jit.save(traced, tf.name)
print(pathlib.Path(tf.name).stat().st_size)
traced._save_for_lite_interpreter(tf.name)
print(pathlib.Path(tf.name).stat().st_size)
print(tf.name)
```
3. Run `python test_lite.py`
```
import os

import torch
from torch.jit.mobile import _load_for_lite_interpreter
# sample execution (requires torchvision)
from PIL import Image
from torchvision import transforms

# '~' is not expanded automatically, so expand it explicitly.
input_image = Image.open(os.path.expanduser('~/Documents/pytorch/data/dog.jpg'))
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0)  # create a mini-batch as expected by the model

reload_lite_model = _load_for_lite_interpreter(
    os.path.expanduser('~/Documents/pytorch/experiment/example_debug_map_with_tensorkey.ptl'))
with torch.no_grad():
    output_lite = reload_lite_model(input_batch)
# Tensor of shape 1000, with confidence scores over Imagenet's 1000 classes
print(output_lite[0])
# The output has unnormalized scores. To get probabilities, you can run a softmax on it.
print(torch.nn.functional.softmax(output_lite[0], dim=0))
```
4. Compare the results from PyTorch master and from PyTorch built locally with this change; the outputs are the same.
5. The model size was 16.1 MB and becomes 12.9 MB with this change.
Size comparison in production models:
{F603127047}
Reviewed By: iseeyuan
Differential Revision: D27759891
fbshipit-source-id: 34e0cb8149011c46c1910165b545c137d7a0b855