[loading] Really initialize on meta device for huge perf gains (#42941)
* use meta device directly
* style
* move back non-persistent
* fix
* make helper
* fix it
* use native param dtype
* make tensors buffers
* style
* fix
* oops
* add a test and fix
* fix
* create timm integration to reinit non-persistent buffers...
* style
* style
* more
* better
* add doc
* more timm stuff
* more
* fix
* small change
* no actually it was fine before