`AttentionProcessor.group_norm` num_channels should be `query_dim` (#3046)
* `AttentionProcessor.group_norm` num_channels should be `query_dim`
The group_norm on the attention processor should normalize over the number
of channels in the query, _not_ the inner dim. This wasn't caught before
because the group_norm is only used by the added KV attention processors,
and those processors are only used by the Karlo models, which are configured
such that the inner dim is the same as the query dim.
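A minimal sketch of the intent, with example dimensions and parameter names assumed from the `Attention` constructor (not the exact signature):

```python
import torch.nn as nn

# Example dimensions (assumed for illustration only).
query_dim = 320                # channels of the hidden states entering the processor
heads, dim_head = 8, 64
inner_dim = heads * dim_head   # 512 here, so inner_dim != query_dim

# The group norm runs on the incoming hidden states, which still have
# `query_dim` channels, so it must be built with num_channels=query_dim.
group_norm = nn.GroupNorm(num_channels=query_dim, num_groups=32, eps=1e-5)

# Building it with num_channels=inner_dim only happened to work for models
# (e.g. Karlo) where inner_dim equals query_dim.
```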
* `add_{k,v}_proj` should project to `inner_dim`
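
A sketch of the same idea for the added projections (`added_kv_proj_dim` and the example widths are assumptions for illustration): the keys/values they produce are used alongside the attention's own keys/values, which live in `inner_dim`, so that is the dimension they must project into.

```python
import torch.nn as nn

heads, dim_head = 8, 64
inner_dim = heads * dim_head   # dimension of the attention's own k/v projections
added_kv_proj_dim = 768        # width of the extra conditioning states (assumed)

# The added projections must map into `inner_dim` so their output matches
# the shape of the regular key/value projections.
add_k_proj = nn.Linear(added_kv_proj_dim, inner_dim)
add_v_proj = nn.Linear(added_kv_proj_dim, inner_dim)
```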