The rapid progress of large language models (LLMs) has catalyzed the emergence of multimodal large language models (MLLMs) that unify visual understanding and image generation within a single ...
Abstract: Enabling robots to perform everyday tasks has become increasingly important. Task planning, which decomposes task instructions into executable action sequences, is crucial for equipping ...