Abstract: Multimodal Chain-of-Thought (CoT) reasoning requires models to integrate visual and textual information for step-by-step inference. However, small- and medium-scale models often underutilize ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results