Abstract: We present ForceSight, a system for text-guided mobile manipulation that predicts visual-force goals using a text-conditioned vision transformer. Given a single RGBD image and a text prompt, ...
- checkpoints/
- audio-cond_animation/
- avsync15_audio-cond_cfg/
- landscapes_audio-cond_cfg/
- thegreatesthits_audio-cond_cfg/
- avsync/
- vggss_sync_contrast ...