Reduce compile time for generic matmatmul (#52038)
This is another attempt at reducing the compile time of generic
matmatmul, hopefully improving runtime performance as well.
@chriselrod @jishnub
There still seems to be a small typo or oversight somewhere, but this
shows how the approach could work. Locally, it reduces the times for the
benchmarks in
https://github.com/JuliaLang/julia/pull/51812#issuecomment-1780394475 by
more than 50%.
---------
Co-authored-by: Chris Elrod <elrodc@gmail.com>