Abstract: Multimodal Chain-of-Thought (CoT) reasoning requires models to integrate visual and textual information for step-by-step inference. However, small- and medium-scale models often underutilize ...