Abstract: Video-text cross-modal retrieval (VTR) is more natural and challenging than image-text retrieval, which has attracted increasing interest from researchers in recent years. To align VTR more ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results