Abstract: Image-text matching, a fundamental cross-modal understanding task, presents unique challenges in weakly aligned scenarios. Such data typically feature highly abstract textual captions with ...