Abstract: Generating images that align with textual input using text-to-image (TTI) generation models is a challenging task. Generative adversarial network (GAN) based TTI models can produce realistic ...
Manzano combines visual understanding and text-to-image generation, while significantly reducing performance or quality trade-offs.