Abstract: Generating images that align with textual input using text-to-image (TTI) generation models is a challenging task. Generative adversarial network (GAN) based TTI models can produce realistic ...