Abhirama Subramanyam

Email  /  LinkedIn  /  Google Scholar  /  GitHub  /  Twitter /  CV

I'm a PMRF Ph.D. scholar in the Vision, Language and Learning Group (VL2G) led by Dr. Anand Mishra at IIT Jodhpur. I'm a deep learning enthusiast working on interesting problems at the intersection of computer vision, natural language processing, and knowledge graphs. Specifically, I work on multimodal deep learning.


News


Research

Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant
Abhirama S. Penamakuri, Anand Mishra
EMNLP (Main), 2024
project page / arXiv / code / poster / slides / short talk

Text-KVQA is revisited with advancements in large multimodal models, introducing VisTEL, a method for visual text entity linking that leverages visual and textual cues. Additionally, KaLMA, a knowledge-aware assistant, is proposed to enhance LMMs by incorporating knowledge related to the visual text entity for improved accuracy.

Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering
Abhirama S. Penamakuri, Manish Gupta, Mithun Das Gupta, Anand Mishra
IJCAI (Main Track), 2023   (Oral Presentation)
project page / arXiv / code / data / slides / short talk

RetVQA is introduced as a more challenging extension of traditional VQA, where a model retrieves relevant images from a pool to answer questions. The proposed MI-BART model, along with the new RETVQA dataset, achieves significant improvements in both accuracy and fluency over existing methods.

COFAR: Commonsense and Factual Reasoning in Image Search
Prajwal Gatti, Abhirama S. Penamakuri, Revant Teotia, Anand Mishra, Shubhashis Sengupta, Roshni Ramnani
AACL-IJCNLP, 2022
project page / arXiv / code / data / slides / short talk

The COFAR dataset is introduced to evaluate image search involving commonsense and factual reasoning. To address this, KRAMT is proposed, integrating visual entities with encyclopedic knowledge and natural language queries for more accurate image retrieval.

Contrastive Multi-View Textual-Visual Encoding: Towards One Hundred Thousand-Scale One-Shot Logo Identification
Nakul Sharma, Abhirama S. Penamakuri, Anand Mishra
ICVGIP, 2022
project page / arXiv / paper / code / data

Business logo identification in natural scenes is addressed with an open-set, one-shot framework based on multi-view textual-visual encoding, which outperforms state-of-the-art techniques. The Wikidata Reference Logo Dataset (WiRLD), containing 100,000 brand logos, is introduced to study one-shot identification at scale.

System and method for intelligent recruitment management
Subramanian Viswanathan, Janakiraman Pradeep, Inbasekaran Bharath Kumar, Roy Subhadeep, Ragavan Shankarri, S Madhuvani, Abhirama S. Penamakuri, Sirisha Kona
US Patent (Granted), 2021

The invention presents an intelligent recruitment management system that automates the recruitment process through a recruitment intelligence platform, utilizing modules for requisition parsing, resume analysis, candidate submissions, and job matching. This platform allows recruiters to track all steps of the recruitment process efficiently.


Template Source: Jon Barron