News
Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant
Abhirama S. Penamakuri,
Anand Mishra
EMNLP (Main) , 2024
project page / arXiv / code / poster / slides / short talk
Text-KVQA is revisited in light of advances in large multimodal models (LMMs). The work introduces VisTEL, a visual text entity linking method that leverages both visual and textual cues. It also proposes KaLMA, a knowledge-aware large multimodal assistant that augments LMMs with knowledge about the linked visual text entity to improve answer accuracy.
Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering
Abhirama S. Penamakuri,
Manish Gupta,
Mithun Das Gupta,
Anand Mishra
IJCAI (Main Track), 2023 (Oral Presentation)
project page / arXiv / code / data / slides / short talk
RetVQA is introduced as a more challenging extension of traditional visual question answering (VQA). In RetVQA, a model must retrieve the relevant images from a pool in order to answer a question. The work also introduces the RETVQA dataset and proposes the MI-BART model, which achieves significant improvements in both accuracy and fluency over existing methods.
COFAR: Commonsense and Factual Reasoning in Image Search
Prajwal Gatti,
Abhirama S. Penamakuri,
Revant Teotia,
Anand Mishra,
Shubhashis Sengupta,
Roshni Ramnani
AACL-IJCNLP, 2022
project page / arXiv / code / data / slides / short talk
The COFAR dataset is introduced to evaluate image search that requires commonsense and factual reasoning. To tackle this task, the work proposes KRAMT, which integrates visual entities with encyclopedic knowledge and natural language queries for more accurate image retrieval.
Contrastive Multi-View Textual-Visual Encoding: Towards One Hundred Thousand-Scale One-Shot Logo Identification
Nakul Sharma,
Abhirama S. Penamakuri,
Anand Mishra
ICVGIP, 2022
project page / arXiv / paper / code / data
This work addresses business logo identification in natural scenes using an open-set, one-shot framework with multi-view textual-visual encoding, which outperforms state-of-the-art techniques. It also introduces the Wikidata Reference Logo Dataset (WiRLD) of 100,000 brand logos to study one-shot identification at scale.
System and method for intelligent recruitment management
Subramanian Viswanathan, Janakiraman Pradeep, Inbasekaran Bharath Kumar, Roy Subhadeep, Ragavan Shankarri, S Madhuvani, Abhirama S. Penamakuri, Sirisha Kona.
US Patent (Granted), 2021
The invention is an intelligent recruitment management system that automates the recruitment process through a recruitment intelligence platform. The platform has modules for requisition parsing, resume analysis, candidate submission, and job matching, and it lets recruiters track every step of the recruitment process efficiently.