Avoid the misleading zero_point and scale [2/2] (#28827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28827
When we print a `DynamicLinear` module, we don't want to print `scale` and `zero_point`: they are not needed for dynamic quantization, and the placeholder values (`scale=1.0`, `zero_point=0`) are misleading.
Let's take the printed output of a RoBERTa model as an example.
Before this PR:
```
(19): TransformerEncoderLayer(
  (dropout): Dropout(p=0.1, inplace=False)
  (attention): MultiheadAttention(
    (dropout): Dropout(p=0.1, inplace=False)
    (input_projection): DynamicQuantizedLinear(in_features=1024, out_features=3072, scale=1.0, zero_point=0)
    (output_projection): DynamicQuantizedLinear(in_features=1024, out_features=1024, scale=1.0, zero_point=0)
  )
  (residual_mlp): ResidualMLP(
    (mlp): Sequential(
      (0): DynamicQuantizedLinear(in_features=1024, out_features=4096, scale=1.0, zero_point=0)
      (1): GeLU()
      (2): Dropout(p=0.1, inplace=False)
      (3): DynamicQuantizedLinear(in_features=4096, out_features=1024, scale=1.0, zero_point=0)
      (4): Dropout(p=0.1, inplace=False)
    )
  )
  (attention_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(20): TransformerEncoderLayer(
  (dropout): Dropout(p=0.1, inplace=False)
  (attention): MultiheadAttention(
    (dropout): Dropout(p=0.1, inplace=False)
    (input_projection): DynamicQuantizedLinear(in_features=1024, out_features=3072, scale=1.0, zero_point=0)
    (output_projection): DynamicQuantizedLinear(in_features=1024, out_features=1024, scale=1.0, zero_point=0)
  )
  (residual_mlp): ResidualMLP(
    (mlp): Sequential(
      (0): DynamicQuantizedLinear(in_features=1024, out_features=4096, scale=1.0, zero_point=0)
      (1): GeLU()
      (2): Dropout(p=0.1, inplace=False)
      (3): DynamicQuantizedLinear(in_features=4096, out_features=1024, scale=1.0, zero_point=0)
      (4): Dropout(p=0.1, inplace=False)
    )
  )
  (attention_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
```
After this PR:
```
(19): TransformerEncoderLayer(
  (dropout): Dropout(p=0.1, inplace=False)
  (attention): MultiheadAttention(
    (dropout): Dropout(p=0.1, inplace=False)
    (input_projection): DynamicQuantizedLinear(in_features=1024, out_features=3072)
    (output_projection): DynamicQuantizedLinear(in_features=1024, out_features=1024)
  )
  (residual_mlp): ResidualMLP(
    (mlp): Sequential(
      (0): DynamicQuantizedLinear(in_features=1024, out_features=4096)
      (1): GeLU()
      (2): Dropout(p=0.1, inplace=False)
      (3): DynamicQuantizedLinear(in_features=4096, out_features=1024)
      (4): Dropout(p=0.1, inplace=False)
    )
  )
  (attention_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(20): TransformerEncoderLayer(
  (dropout): Dropout(p=0.1, inplace=False)
  (attention): MultiheadAttention(
    (dropout): Dropout(p=0.1, inplace=False)
    (input_projection): DynamicQuantizedLinear(in_features=1024, out_features=3072)
    (output_projection): DynamicQuantizedLinear(in_features=1024, out_features=1024)
  )
  (residual_mlp): ResidualMLP(
    (mlp): Sequential(
      (0): DynamicQuantizedLinear(in_features=1024, out_features=4096)
      (1): GeLU()
      (2): Dropout(p=0.1, inplace=False)
      (3): DynamicQuantizedLinear(in_features=4096, out_features=1024)
      (4): Dropout(p=0.1, inplace=False)
    )
  )
  (attention_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
```
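A comparable printout can be reproduced with a small toy model; this is only an illustrative sketch (the toy module and its layer sizes are made up and are not part of the RoBERTa run above):
```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for one MLP block; the sizes mirror
# the summary above but the model itself is only used to show the repr.
float_model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
)

# Dynamic quantization swaps every nn.Linear for DynamicQuantizedLinear.
quantized_model = torch.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8
)

# Printing the module dumps each layer's repr; with this PR the
# DynamicQuantizedLinear lines report only in_features/out_features.
print(quantized_model)
```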
ghstack-source-id: 92807317
Test Plan: CI
Differential Revision: D18197022
fbshipit-source-id: e41635330cfdfb008a0468d6a8ff67a06f7e1c59