Abstract: This article concentrates on open-vocabulary semantic segmentation, where a well optimized model is able to segment arbitrary categories that appear in an image. To achieve this goal, we ...