The assessment, which it conducted in December 2025, compared five of the best-known vibe coding tools — Claude Code, OpenAI Codex, Cursor, Replit, and Devin — by using pre-defined prompts to build ...
A maze navigation task generator for training and evaluating video generation models on spatial reasoning tasks. Based on the template-data-generator framework and adapted from VMEvalKit's maze ...
Advanced video models have recently demonstrated remarkable zero-shot capabilities of visual reasoning, solving tasks like maze, symmetry, and analogy completion through a chain-of-frames (CoF) ...