Abhirama Subramanyam
Email  / 
LinkedIn  / 
Google Scholar  / 
GitHub  / 
Twitter / 
CV
I'm a PMRF Ph.D. scholar in the Vision, Language and Learning Group (VL2G) led
by Dr. Anand Mishra at IIT Jodhpur. My research focuses on developing open-source, retrieval-augmented
generation (RAG) enabled large multimodal models (LMMs) for knowledge-intensive question
answering over multimodal data. These tasks include variants of visual question answering (VQA)
and audio question answering (AQA) that require reasoning over external knowledge.
News
Audiopedia: Audio QA with Knowledge
Abhirama Subramanyam Penamakuri*,
Kiran Chhatre*,
Akshat Jain. (*Equal contribution)
ICASSP, 2025   (Oral Presentation)
project page
/
arXiv
/
data
Audiopedia, a novel audio question answering (AQA) task that requires both audio comprehension and external knowledge reasoning, is introduced along with three subtasks: s-AQA, m-AQA, and r-AQA. To enhance large audio language models on such knowledge-intensive tasks, a framework combining Audio Entity Linking (AEL) with a Knowledge-Augmented Audio Multimodal Model (KA2LM) is also proposed.
Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant
Abhirama Subramanyam Penamakuri,
Anand Mishra
EMNLP (Main), 2024
project page
/
arXiv
/
code
/
poster
/
slides
/
short talk
Text-KVQA is revisited in light of recent advances in large multimodal models. VisTEL, a visual text entity linking method that leverages both visual and textual cues, is introduced, along with KaLMA, a knowledge-aware assistant that augments LMMs with knowledge about the visual text entity in the image for more accurate answers.
Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering
Abhirama Subramanyam Penamakuri,
Manish Gupta,
Mithun Das Gupta,
Anand Mishra
IJCAI (Main Track), 2023   (Oral Presentation)
project page
/
arXiv
/
code
/
data
/
slides
/
short talk
RetVQA is introduced as a more challenging extension of traditional VQA, in which a model must retrieve relevant images from a pool before answering a question. The proposed MI-BART model achieves significant improvements in both accuracy and fluency over existing methods on the newly introduced RETVQA dataset.
COFAR: Commonsense and Factual Reasoning in Image Search
Prajwal Gatti,
Abhirama Subramanyam Penamakuri,
Revant Teotia,
Anand Mishra,
Shubhashis Sengupta,
Roshni Ramnani
AACL-IJCNLP, 2022
project page
/
arXiv
/
code
/
data
/
slides
/
short talk
The COFAR dataset is introduced to evaluate image search requiring commonsense and factual reasoning. To address this task, KRAMT is proposed, which integrates visual entities with encyclopedic knowledge and natural language queries for more accurate image retrieval.
Contrastive Multi-View Textual-Visual Encoding: Towards One Hundred Thousand-Scale One-Shot Logo Identification
Nakul Sharma,
Abhirama Subramanyam Penamakuri,
Anand Mishra
ICVGIP, 2022
project page /
arXiv /
paper /
code /
data
Business logo identification in natural scenes is addressed with an open-set, one-shot framework built on multi-view textual-visual encoding, outperforming state-of-the-art techniques. The Wikidata Reference Logo Dataset (WiRLD), covering 100,000 brand logos, is introduced to study one-shot identification at scale.
System and method for intelligent recruitment management
Subramanian Viswanathan, Janakiraman Pradeep, Inbasekaran Bharath Kumar, Roy Subhadeep, Ragavan Shankarri, S Madhuvani, Abhirama Subramanyam Penamakuri, Sirisha Kona.
US Patent (Granted), 2021
The invention presents an intelligent recruitment management system that automates the hiring process through a recruitment intelligence platform, with modules for requisition parsing, resume analysis, candidate submissions, and job matching. The platform allows recruiters to track every step of the recruitment process efficiently.