Abstract: Generating images that align with textual input using text-to-image (TTI) generation models is a challenging task. Generative adversarial network (GAN) based TTI models can produce realistic ...
Manzano combines visual understanding and text-to-image generation, while significantly reducing performance or quality trade-offs.
Some results have been hidden because they may be inaccessible to you
Show inaccessible results