Fix deserialization of TransformerEncoderLayer (#81832)
Summary: When `activation` is a module, it is not saved directly in the state dictionary but instead under `_modules`. During deserialization, the old version of this code would conclude that `activation` was missing and reset it to ReLU. This version first reconstructs the module's state and only falls back to ReLU if `activation` is neither a module nor a function.
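The pattern behind the fix can be sketched without `torch`, using a minimal stand-in for `nn.Module` (the class names `Module`, `EncoderLayer`, and the helper `relu` below are illustrative, not the actual PyTorch implementation). The key point is that submodules live in `_modules` rather than directly in the instance's `__dict__`, so the presence check for `activation` must run after state is restored, not on the raw state dict:

```python
import pickle


class Module:
    """Minimal stand-in for nn.Module: submodules are kept in _modules."""

    def __init__(self):
        object.__setattr__(self, "_modules", {})

    def __setattr__(self, name, value):
        if isinstance(value, Module):
            # Submodules go into _modules, NOT into the top-level __dict__.
            self._modules[name] = value
        else:
            object.__setattr__(self, name, value)

    def __getattr__(self, name):
        mods = object.__getattribute__(self, "_modules")
        if name in mods:
            return mods[name]
        raise AttributeError(name)


def relu(x):
    return max(x, 0)


class ReLU(Module):
    def __call__(self, x):
        return max(x, 0)


class EncoderLayer(Module):
    def __init__(self, activation=relu):
        super().__init__()
        # A Module activation lands in _modules; a plain function in __dict__.
        self.activation = activation

    def __setstate__(self, state):
        # Buggy order (pre-fix): checking `'activation' not in state` BEFORE
        # restoring would miss a Module activation (it hides in
        # state['_modules']) and wrongly overwrite it with relu.
        #
        # Fixed order: restore everything first, then fall back only if
        # activation is genuinely absent.
        self.__dict__.update(state)
        if not hasattr(self, "activation"):
            self.activation = relu


# Round-trip: a Module activation now survives deserialization.
layer = EncoderLayer(activation=ReLU())
restored = pickle.loads(pickle.dumps(layer))
```

With the old ordering, `restored.activation` would have been silently replaced by the ReLU fallback whenever a module (rather than a function) was used as the activation.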
Test Plan: PyTorch OSS tests
Reviewed By: zrphercule
Differential Revision: D38014872
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81832
Approved by: https://github.com/kit1980, https://github.com/zrphercule