Fix glu activation (#148)
* Make sure to use glu activation when specified
* Whoops, forgot DS config
* Upsample ffn_hidden_size when glu is used
* Whoops
* Replace assert with a raised exception
* Fix bug
Co-authored-by: Stas Bekman <stas@stason.org>