auto-round
57488ab9 - refactor(ark): drop INT8 asym DPAS; add INT4/INT2 sym via INT8 DPAS

Commit

3 days ago

refactor(ark): drop INT8 asym DPAS; add INT4/INT2 sym via INT8 DPAS

Roll back the INT8 asym DPAS path (perf regressed vs. dequant fallback on hardware). Add INT4-sym and INT2-sym prefill paths that upcast the packed weights into an int8_t [E, N, K] view inside the existing dequant workspace and dispatch through the same per-group INT8 DPAS mainloop that the S8-sym branch uses, reusing the packed scale tensor unmodified.
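The upcast-then-dispatch idea in the message can be illustrated with a small host-side sketch. This is not the ark kernel code. The function names, the low-bits-first packing order, the two's-complement encoding of the 4-bit and 2-bit values, and the int8 activations are all assumptions made for illustration; the commit does not state them. The sketch only shows why the packed scales can be reused unmodified: sign-extending to int8 leaves each weight's integer value unchanged, so the per-group scale still applies as before.

```python
# Hypothetical reference sketch; names and packing layout are assumptions,
# not taken from the auto-round / ark sources.

def unpack_int4_sym(packed):
    """Sign-extend two 4-bit values per byte into int8-range integers.

    Assumes low nibble first and two's-complement encoding.
    """
    out = []
    for byte in packed:
        for shift in (0, 4):
            nibble = (byte >> shift) & 0xF
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out


def unpack_int2_sym(packed):
    """Sign-extend four 2-bit values per byte into int8-range integers.

    Assumes lowest bit pair first and two's-complement encoding.
    """
    out = []
    for byte in packed:
        for shift in (0, 2, 4, 6):
            bits = (byte >> shift) & 0x3
            out.append(bits - 4 if bits >= 2 else bits)
    return out


def grouped_int8_dot(w_int8, a_int8, scales, group_size):
    """Per-group integer dot product, scaled by one scale per group.

    Stands in for a per-group INT8 mainloop: integer accumulate inside a
    group, then multiply by that group's (unchanged) scale.
    """
    if not (len(w_int8) == len(a_int8) == len(scales) * group_size):
        raise ValueError("length mismatch between weights, activations, scales")
    total = 0.0
    for g, scale in enumerate(scales):
        lo = g * group_size
        hi = lo + group_size
        acc = sum(w * a for w, a in zip(w_int8[lo:hi], a_int8[lo:hi]))
        total += acc * scale
    return total


# Usage: one K-row of 4 weights packed into 2 bytes, group size 2.
weights = unpack_int4_sym([0x71, 0x8F])        # [1, 7, -1, -8]
result = grouped_int8_dot(weights, [1, 1, 2, 2], [0.5, 0.25], group_size=2)
```

In a real kernel the unpacked values would be written into the int8_t [E, N, K] view of the dequant workspace rather than a Python list, and the dot product would be executed by the DPAS mainloop.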

References

#1813 - Add moe prefill/decode with int2/int4/int8 sym/asym and fp8 e4m3/e5m2

Author

Copilot

Parents

152daf82
