Sequence Blob NVM Reader to Selectively NVMify Ads Embeddings in A*
Summary:
This diff enabled mapping a selected set of Ads embeddings to the T17 host on hierarchical memory (nvmify). To achieve that the following is implemented:
- Allow fo OTHER net to be both onnxified and nvmified
- For that an allowlist placement policy is added to the nvmify stack
- onnxifi_transform is lightly updated to accept a blacklist of operators based on name
- nvm transform is broken into two parts, op replacement, and blob update.
- A drived class `SeqBlobNVMReader` is defined which adds the functionality to load blobs to the card or nvm.
Test Plan:
* Unit test
* Run predictor replayer: selectively load the following ads embedding to NVM as in `--caffe2_nvm_dram_placement_file=/home/hanli/nvm_allowlist`:
```
SPARSE_AD_ACCOUNT_ID
SPARSE_NEW_AD_ID_COARSE
SPARSE_NEW_AD_ID_REFINED
SPARSE_NEW_CAMPAIGN_ID
SPARSE_NEW_TARGET_ID
SPARSE_NEW_AD_CLUSTER_ID
SPARSE_NEW_PAGE_ID
SPARSE_NEW_STORY_ID
SPARSE_NEW_VIDEO_ID
SPARSE_ENTITY_EQUIVALENCE_KEY
SPARSE_ENTITY_EQUIVALENCE_KEY_NO_CREATIVE
```
major parameter change in sigrid_remote_predictor_glow_nnpi:
```
--caffe2_nets_to_nvmify=DISAGG_ACC_REMOTE_OTHER \
--caffe2_nvm_sls_ops=SparseLengthsSumFused8BitRowwise,SparseLengthsWeightedSumFused8BitRowwise,SparseLengthsSumFused4BitRowwise,SparseLengthsWeightedSumFused4BitRowwise,SparseLengthsSum4BitRowwiseSparse \
--caffe2_nvm_table_path=/home/hanli/tables/225412100_2870/ \
--caffe2_nvm_dram_placement_file=/home/hanli/nvm_allowlist \
--caffe2_nvm_dram_placement_policy=by_file_allowlist \
--caffe2_predictor_nets_to_load=DISAGG_ACC_REMOTE_OTHER
```
In predictor log, observe that the blobs to be NVMified are transformed in op types, skipped in Onnxifi transform, and deferred loaded and do NVM net transform:
```
I0416 09:59:29.550690 662344 Nvmifier.cpp:142] ^[[92mReplacing SparseLengthsSumFused4BitRowwise with NVM variant.^[[0m
I0416 09:59:29.550701 662344 Nvmifier.cpp:142] ^[[92mReplacing SparseLengthsSumFused4BitRowwise with NVM variant.^[[0m
I0416 09:59:29.550705 662344 Nvmifier.cpp:142] ^[[92mReplacing SparseLengthsSumFused4BitRowwise with NVM variant.^[[0m
I0416 09:59:29.550712 662344 Nvmifier.cpp:142] ^[[92mReplacing SparseLengthsSumFused4BitRowwise with NVM variant.^[[0m
I0416 09:59:29.550715 662344 Nvmifier.cpp:142] ^[[92mReplacing SparseLengthsSumFused4BitRowwise with NVM variant.^[[0m
I0416 09:59:29.550721 662344 Nvmifier.cpp:142] ^[[92mReplacing SparseLengthsSumFused4BitRowwise with NVM variant.^[[0m
...
I0416 09:59:31.665369 662344 onnxifi_transformer.cc:1097] Skipping blocklisted op SparseLengthsSumFused4BitRowwiseNVM at pos 770
I0416 09:59:31.667042 662344 onnxifi_transformer.cc:1097] Skipping blocklisted op SparseLengthsSumFused4BitRowwiseNVM at pos 777
I0416 09:59:31.667294 662344 onnxifi_transformer.cc:1097] Skipping blocklisted op SparseLengthsSumFused4BitRowwiseNVM at pos 779
I0416 09:59:31.668828 662344 onnxifi_transformer.cc:1097] Skipping blocklisted op SparseLengthsSumFused4BitRowwiseNVM at pos 786
I0416 09:59:31.668843 662344 onnxifi_transformer.cc:1097] Skipping blocklisted op SparseLengthsSumFused4BitRowwiseNVM at pos 787
I0416 09:59:31.669909 662344 onnxifi_transformer.cc:1097] Skipping blocklisted op SparseLengthsSumFused4BitRowwiseNVM at pos 792
...
I0416 10:01:09.087282 662344 Nvmifier.cpp:346] found the name: table0
I0416 10:01:09.373975 662344 Nvmifier.cpp:374] ^[[96mSaved /home/hanli/tables/225412100_2870/table0^[[0m
I0416 10:01:09.376008 662344 Nvmifier.cpp:343] filename: sparse_nn_sparse_arch_SPARSE_NEW_AD_ID_COARSE_dedicated_13_w_EmbeddingFusedUint4Quantization
..
I0416 10:11:05.310854 662344 Nvmifier.cpp:161] ^[[95mNVMifying the model.^[[0m
I0416 10:11:05.310887 662344 Nvmifier.cpp:185] found the name: table0 for sparse_nn_sparse_arch_SPARSE_NEW_AD_ID_COARSE_dedicated_13_w_EmbeddingFusedUint4Quantization
I0416 10:11:07.580587 662344 Nvmifier.cpp:185] found the name: table4 for sparse_nn_sparse_arch_SPARSE_AD_ACCOUNT_ID_dedicated_20_w_EmbeddingFusedUint4Quantization
I0416 10:11:07.580648 662344 Nvmifier.cpp:185] found the name: table3 for sparse_nn_sparse_arch_SPARSE_ENTITY_EQUIVALENCE_KEY_dedicated_22_w_EmbeddingFusedUint4Quantization
I0416 10:11:07.580667 662344 Nvmifier.cpp:185] found the name: table5 for sparse_nn_sparse_arch_SPARSE_NEW_TARGET_ID_dedicated_29_w_EmbeddingFusedUint4Quantization
I0416 10:11:07.580682 662344 Nvmifier.cpp:185] found the name: table2 for sparse_nn_sparse_arch_SPARSE_NEW_AD_ID_REFINED_dedicated_30_w_EmbeddingFusedUint4Quantization
I0416 10:11:07.580695 662344 Nvmifier.cpp:185] found the name: table1 for sparse_nn_sparse_arch_SPARSE_NEW_STORY_ID_dedicated_35_w_EmbeddingFusedUint4Quantization
```
Make sure model is properly loaded:
```
I0415 21:42:48.400249 873685 ModelManagerBase.cpp:806] Loaded 225412100_2870 in 730944 ms (63800 ms of IO) memory used 8744167456 byte(s)
```
* Only load user embedding to NVM to make sure baseline use case is not broken by this diff:
```
--caffe2_nets_to_nvmify=DISAGG_ACC_REMOTE_REQUEST_ONLY \
--caffe2_nvm_sls_ops=SparseLengthsSumFused8BitRowwise,SparseLengthsWeightedSumFused8BitRowwise,SparseLengthsSumFused4BitRowwise,SparseLengthsWeightedSumFused4BitRowwise,SparseLengthsSum4BitRowwiseSparse \
--caffe2_nvm_table_path=/home/hanli/tables/225412100_2870/
```
Make sure model is loaded:
```
Loaded 225412100_2870 in 381139 ms (56313 ms of IO) memory used 7043933560 byte(s)
```
* Run feed replayer: `buck-out/gen/sigrid/feed/prediction_replayer/fully_remote_replayer_main --use_new_encoding_for_ads_services --use_new_encoding_from_model_id_to_shard_id --request_file_path /data/users/hanli/f266405843.requests --model_id=265540157_0 --replayer_thread_count=30 --sigrid_predictor_single_host=2401:db00:272c:602e:face:0:10:0 --sigrid_predictor_single_port=7444 --num_iterations=5 --qps=100 --client_name=predictor_v1` (load predictor as in P411172400)
Output:
```
I0428 21:20:25.106635 1396182 FullyRemoteReplayer.cpp:107] Loading requests from /data/users/hanli/f266405843.requests
I0428 21:20:25.547982 1396182 FullyRemoteReplayer.cpp:109] Requests size : 6699
I0428 21:20:25.548146 1396182 Client.cpp:274] V1 tier name: V2 tier name: sigrid.predictor.fully_remote_test V2 fully remote tier name:
I0428 21:20:25.548153 1396182 Client.cpp:282] [MF] Migration Framework (traffic routing) enabled: false
I0428 21:20:25.548172 1396182 ModelRemoteStatus.cpp:206] Selection probabilities znode path: /configerator-gz/.prn
I0428 21:20:25.674162 1396265 ModelRemoteStatus.cpp:612] Found 0 host, 0 shards in predictor tier
I0428 21:20:25.674181 1396182 ModelRemoteStatus.cpp:557] Refresh sigrid model succeeded: 1
I0428 21:21:26.252820 1396265 ModelRemoteStatus.cpp:612] Found 0 host, 0 shards in predictor tier
I0428 21:21:26.252851 1396265 ModelRemoteStatus.cpp:557] Refresh sigrid model succeeded: 1
I0428 21:22:22.225976 1396182 PredictionReplayer.cpp:67] Previous request took too long, not reaching target QPS
I0428 21:22:26.252643 1396265 ModelRemoteStatus.cpp:612] Found 0 host, 0 shards in predictor tier
I0428 21:22:26.252678 1396265 ModelRemoteStatus.cpp:557] Refresh sigrid model succeeded: 1
I0428 21:23:26.252959 1396265 ModelRemoteStatus.cpp:612] Found 0 host, 0 shards in predictor tier
I0428 21:23:26.252987 1396265 ModelRemoteStatus.cpp:557] Refresh sigrid model succeeded: 1
I0428 21:24:26.253135 1396265 ModelRemoteStatus.cpp:612] Found 0 host, 0 shards in predictor tier
I0428 21:24:26.253166 1396265 ModelRemoteStatus.cpp:557] Refresh sigrid model succeeded: 1
I0428 21:25:27.252734 1396265 ModelRemoteStatus.cpp:612] Found 0 host, 0 shards in predictor tier
I0428 21:25:27.252763 1396265 ModelRemoteStatus.cpp:557] Refresh sigrid model succeeded: 1
I0428 21:26:03.172894 1396182 FullyRemoteReplayer.cpp:59] cpu time p25, p50, p75, p95, p99 9570 13011 16218 20788 24840
I0428 21:26:03.172927 1396182 FullyRemoteReplayer.cpp:61] wait time p25, p50, p75, p95, p99 11845 15958 19946 26579 31842
I0428 21:26:03.172940 1396182 FullyRemoteReplayer.cpp:63] wall time p25, p50, p75, p95, p99 16194 20888 25303 31692 37387
```
Reviewed By: ehsanardestani
Differential Revision: D27701121
fbshipit-source-id: e898abc6957c839e402a9763172cf85d9bb84cbd