Abstract: Document Information Extraction aims to extract entities and relationships from visually rich documents. Traditional methods require significant annotation and lack generality. In this paper ...
Abstract: Knowledge-based Visual Question Answering (VQA) is a challenging task that requires models to access external knowledge for reasoning. Large Language Models (LLMs) have recently been ...
Early in the Covid-19 pandemic, the governor of New Jersey made an unusual admission: He’d run out of COBOL developers. The state’s unemployment insurance systems were written in the 60-year-old ...
In vision-language models (VLMs), visual tokens usually consume a significant amount of computational overhead, despite their sparser information density compared to text tokens. To address this, ...