Add AudioLDM (#2232)
* Add AudioLDM
* up
* add vocoder
* start unet
* unconditional unet
* clap, vocoder and vae
* clean-up: conversion scripts
* fix: conversion script token_type_ids
* clean-up: pipeline docstring
* tests: from SD
* clean-up: cpu offload vocoder instead of safety checker
* feat: adapt tests to audioldm
* feat: add docs
* clean-up: amend pipeline docstrings
* clean-up: make style
* clean-up: make fix-copies
* fix: add doc path to toctree
* clean-up: args for conversion script
* clean-up: paths to checkpoints
* fix: use conditional unet
* clean-up: make style
* fix: type hints for UNet
* clean-up: docstring for UNet
* clean-up: make style
* clean-up: remove duplicate in docstring
* clean-up: make style
* clean-up: make fix-copies
* clean-up: move imports to start in code snippet
* fix: pass cross_attention_dim as a list/tuple to unet
* clean-up: make fix-copies
* fix: update checkpoint path
* fix: unet cross_attention_dim in tests
* film embeddings -> class embeddings
* Apply suggestions from code review
Co-authored-by: Will Berman <wlbberman@gmail.com>
* fix: unet film embed to use existing args
* fix: unet tests to use existing args
* fix: make style
* fix: transformers import and version in init
* clean-up: make style
* Revert "clean-up: make style"
This reverts commit 5d6d1f8b324f5583e7805dc01e2c86e493660d66.
* clean-up: make style
* clean-up: use pipeline tester mixin tests where poss
* clean-up: skip attn slicing test
* fix: add torch dtype to docs
* fix: remove conversion script out of src
* fix: remove .detach from 1d waveform
* fix: reduce default num inf steps
* fix: swap height/width -> audio_length_in_s
* clean-up: make style
* fix: remove nightly tests
* fix: imports in conversion script
* clean-up: slim-down to two slow tests
* clean-up: slim-down fast tests
* fix: batch consistent tests
* clean-up: make style
* clean-up: remove vae slicing fast test
* clean-up: propagate changes to doc
* fix: increase test tol to 1e-2
* clean-up: finish docs
* clean-up: make style
* feat: vocoder / VAE compatibility check
* feat: possibly expand / cut audio waveform
* fix: pipeline call signature test
* fix: slow tests output len
* clean-up: make style
* make style
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: William Berman <WLBberman@gmail.com>