Fix LTX-2 image-to-video generation failure in two-stage generation (#13187)
* Fix LTX-2 image-to-video generation failure in two-stage generation
In LTX-2's two-stage image-to-video generation task, a shape mismatch
occurs between the `latents` and the `conditioning_mask` after the
upsampling step, which causes an error in the function
`_create_noised_state`.
Fix it by creating the `conditioning_mask` from the shape of the
`latents`.
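The shape dependency behind this fix can be illustrated with a minimal
pure-Python sketch. It assumes a flattened (frames, height, width) latent
token layout, an upsampler that doubles the latent height and width, and
image conditioning on the first latent frame. `make_conditioning_mask` and
`create_noised_state` are hypothetical stand-ins, not the pipeline's actual
`_create_noised_state`.

```python
# Hypothetical sketch: names, sizes and token layout are illustrative
# assumptions, not the actual diffusers LTX-2 pipeline code.

def make_conditioning_mask(latent_frames, latent_height, latent_width, num_cond_frames=1):
    """Build a flat per-token mask for a (frames, height, width) latent grid.

    Tokens of the first `num_cond_frames` latent frames are marked 1.0
    (conditioned on the input image); all other tokens are 0.0.
    """
    tokens_per_frame = latent_height * latent_width
    mask = []
    for frame in range(latent_frames):
        value = 1.0 if frame < num_cond_frames else 0.0
        mask.extend([value] * tokens_per_frame)
    return mask


def create_noised_state(latents, mask, noise):
    """Keep conditioned tokens and replace the rest with noise.

    Requires exactly one mask entry and one noise value per latent token.
    """
    if not (len(latents) == len(mask) == len(noise)):
        raise ValueError(
            f"shape mismatch: {len(latents)} latent tokens, "
            f"{len(mask)} mask entries, {len(noise)} noise values"
        )
    return [m * x + (1.0 - m) * n for x, m, n in zip(latents, mask, noise)]


# Stage-1 latent grid (illustrative sizes); the upsampler is assumed to
# double the latent height and width.
stage1_shape = (3, 4, 6)
upsampled_shape = (stage1_shape[0], stage1_shape[1] * 2, stage1_shape[2] * 2)
num_tokens = upsampled_shape[0] * upsampled_shape[1] * upsampled_shape[2]
latents = [0.5] * num_tokens
noise = [1.0] * num_tokens

# Buggy: mask sized from the pre-upsampling grid (72 entries vs 288 tokens).
stale_mask = make_conditioning_mask(*stage1_shape)
# Fixed: mask sized from the shape of the latents actually being noised.
mask = make_conditioning_mask(*upsampled_shape)
state = create_noised_state(latents, mask, noise)
```

Passing `stale_mask` to `create_noised_state` raises the mismatch error,
while the mask derived from the upsampled latent shape lines up token for
token.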
* Add unit test for LTX-2 i2v two-stage inference with upsampler
* Downscale the upsampler in the LTX-2 image-to-video unit test
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>