thanks!
I'm aware there is some discrepancy between the rotary and sincos embeddings, but moving it to the pipeline would create more discrepancy with the rest of the codebase. So I think we can either leave it as it is for now, or consider adding a positional embedding in the transformer for the rotary embedding too (and apply the same pattern across all relevant pipelines in a follow-up PR).
let me know what you think!
I left a comment!
thanks for the PR!
What does this PR do?
- removes the 49-frame limit, since CogVideoX-5B generalizes better than 2B and is able to generate more frames
- moves the positional embedding creation logic to the pipeline, similar to rotary embeddings
- moves the positional embedding logic to the patch embed layer
As a side effect of this PR, one can generate > 49 frames with CogVideoX-2b, which will produce bad results, but we can add a recommendation about this.
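For context, the core idea behind lifting the frame cap is that a sincos positional embedding table can be built on the fly for whatever sequence length is requested, instead of being precomputed for a fixed 49-frame grid. Below is a rough, illustrative sketch of that idea using the classic sine/cosine formulation; the function name and shapes are hypothetical and not the actual diffusers helper.

```python
import numpy as np

def get_1d_sincos_pos_embed(embed_dim: int, num_positions: int) -> np.ndarray:
    """Classic sine/cosine positional embedding, computed per call so the
    sequence length (e.g. number of video frames) is not fixed in advance.
    Illustrative sketch only -- not the actual diffusers implementation."""
    position = np.arange(num_positions)[:, None]  # (N, 1)
    # Geometric progression of frequencies, one per pair of channels
    div_term = np.exp(np.arange(0, embed_dim, 2) * (-np.log(10000.0) / embed_dim))
    pe = np.zeros((num_positions, embed_dim))
    pe[:, 0::2] = np.sin(position * div_term)  # even channels: sine
    pe[:, 1::2] = np.cos(position * div_term)  # odd channels: cosine
    return pe

# Because the table is generated at call time, nothing caps the frame count at 49:
emb = get_1d_sincos_pos_embed(64, 81)  # 81 frames works just as well as 49
print(emb.shape)  # (81, 64)
```

Placing this computation in the patch embed layer (rather than precomputing a fixed-size buffer) is what lets both checkpoints accept arbitrary frame counts, even though only the 5B model generalizes well beyond 49 frames.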
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@yiyixuxu