pytorch
ad91a3a1 - Skipping L2 regularization on sparse biases

Skipping L2 regularization on sparse biases

Summary:

# Motivations

As explained in this [link](https://stats.stackexchange.com/questions/86991/reason-for-not-shrinking-the-bias-intercept-term-in-regression/161689#161689), regularizing biases causes mis-calibration of the predicted probabilities. In SparseNN, the unary processor may use 1d embedding tables for sparse features to serve as biases. In this diff, the regularization term is automatically skipped for 1d sparse parameters, so biases are not regularized.

# Experiments

Experiments were conducted to verify that skipping the regularization on 1d sparse parameters has no significant impact on NE:

- Baseline.1 (no L2 regularization): f193105372
- Baseline.2 (L2 regularization in prod): f193105522
- Treatment (skipping L2 regularization on 1d sparse params): f193105708

{F239859690}

Test Plan: The experiments above were run with a canary package: `aml.dper2.canary:9efc576b35b24361bb600dcbf94d31ea`.

Reviewed By: zhongyx12

Differential Revision: D21757902

fbshipit-source-id: ced126e1eab270669b9981c9ecc287dfc9dee995
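The diff itself lives in DPER2/SparseNN internals not shown on this page. Below is a minimal PyTorch-style sketch of the same idea, assuming that "1d parameter" can be detected via `param.dim() == 1` and excluded from weight decay through optimizer parameter groups. The names `split_l2_groups` and `ToySparseNN` are hypothetical, for illustration only.

```python
import torch
import torch.nn as nn

def split_l2_groups(model: nn.Module, weight_decay: float):
    """Split parameters into two optimizer groups: 1d parameters
    (biases, including 1d embedding-like tables used as sparse biases)
    get no L2 regularization; everything else keeps weight_decay.
    Hypothetical helper, not the actual diff."""
    decay, no_decay = [], []
    for _name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # dim() == 1 covers biases and 1d sparse parameters.
        (no_decay if param.dim() == 1 else decay).append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

# Toy model: a 2d embedding table (regularized) plus a 1d
# embedding-like bias table for sparse features (skipped).
class ToySparseNN(nn.Module):
    def __init__(self, num_ids: int, dim: int):
        super().__init__()
        self.emb = nn.Embedding(num_ids, dim)                   # 2d: decayed
        self.sparse_bias = nn.Parameter(torch.zeros(num_ids))   # 1d: skipped
        self.fc = nn.Linear(dim, 1)                             # weight 2d, bias 1d

model = ToySparseNN(num_ids=1000, dim=16)
opt = torch.optim.SGD(split_l2_groups(model, weight_decay=1e-4), lr=0.1)
```

Grouping by dimensionality rather than by parameter name means any 1d table added later is skipped automatically, which matches the commit's goal of skipping regularization without per-feature configuration.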
Author: Taiqing Wang