Abstract: Automatically generating natural language descriptions of images is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In recent ...