Abstract: 3D visual grounding involves matching natural language descriptions with their corresponding objects in 3D spaces. Existing methods often face challenges with accuracy in object recognition ...