Hi! I am an incoming PhD student in Computer Science at The Ohio State University, working with Prof. Srinivasan Parthasarathy and Prof. Eric Fosler-Lussier. I completed my Master's in Computer Science at the University of Pennsylvania, where I was grateful to be guided by Prof. Mark Yatskar. In the past, I have also had the opportunity to work with Prof. Ranjay Krishna and Prof. Maneesh Agrawala. I completed my B.Tech in Computer Science at Veermata Jijabai Technological Institute (VJTI). My research interests lie at the intersection of vision and language, which has been the focus of my past and ongoing research.
As increasingly powerful deep learning models in vision and language emerge, fundamental questions like "What do these models actually learn?" and "What do they base their predictions on?" still persist. My long-term research goal is to build robust and reliable models that can answer these questions. I wish to explore two directions toward this goal: (i) leveraging human-like intuitions, such as compositional learning, to boost the robustness of models, and (ii) building interpretable models so that correctability methods can ensure they do not rely on biases, thereby facilitating reliability and robustness.
Yue Yang, Mona Gandhi, Yufei Wang, Yifan Wu, Michael S. Yao, Chris Callison-Burch, James C. Gee, Mark Yatskar
arXiv
TL;DR: We introduce Knowledge Bottlenecks (KnoBo), which incorporate priors from medical documents, such as PubMed articles, through inherently interpretable models. KnoBo is robust to domain shifts in medical images, e.g., data sampled from different hospitals or data confounded by demographic variables such as sex and race. Overall, our work demonstrates that a key missing ingredient for robustness to distribution shift in medical imaging models is a prior rooted in knowledge.
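A minimal PyTorch sketch of the bottleneck idea (illustrative only, not the official KnoBo code; the concept list and random scores below are placeholders for the document-derived priors and per-concept image-text similarities):

```python
# Illustrative concept-bottleneck sketch, not the official KnoBo code.
import torch
import torch.nn as nn

# Placeholder concepts; KnoBo mines its priors from medical documents.
concepts = ["enlarged cardiac silhouette", "pleural effusion", "rib fracture"]

class KnowledgeBottleneck(nn.Module):
    def __init__(self, num_concepts: int, num_classes: int):
        super().__init__()
        # A single linear layer keeps the model inherently interpretable:
        # each weight ties a class directly to a human-readable concept.
        self.head = nn.Linear(num_concepts, num_classes)

    def forward(self, concept_scores: torch.Tensor) -> torch.Tensor:
        return self.head(concept_scores)

model = KnowledgeBottleneck(num_concepts=len(concepts), num_classes=2)
# Stand-in for per-concept image-text similarity scores (e.g., from CLIP).
scores = torch.rand(4, len(concepts))
logits = model(scores)
```

Because the predictor is a single linear layer over human-readable concepts, what the model relies on can be read off directly and corrected.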
Zixian Ma*, Jerry Hong*, Mustafa Omer Gul*, Mona Gandhi, Irena Gao, Ranjay Krishna
CVPR 2023 [Highlight: 10% of accepted papers / 2.5% of submissions]
Technologies: LLMs, Deep Learning
The Reversal Curse paper (linked below) highlights a simple task that large language models fail at: if a model has seen "A is B" during training, it is not guaranteed to generalize to "B is A". The paper coins this failure the Reversal Curse. In addition to replicating the paper's results, we investigate the model on a verification task, where it is asked a yes-no question. The model struggles to respond to these questions and even contradicts itself within the same response.
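A minimal sketch of the verification probe (illustrative; `query_model` is a hypothetical stand-in for whichever LLM is under test):

```python
# Sketch of the yes-no verification probe; `query_model` is a placeholder.
def query_model(prompt: str) -> str:
    # Stand-in: replace with a real model call (API or local checkpoint).
    return "yes"

def verify_pair(a: str, b: str) -> dict:
    """Probe both directions of a fact the model saw as 'A is B'."""
    forward = query_model(f"Is {a} {b}? Answer yes or no.")
    reverse = query_model(f"Is {b} {a}? Answer yes or no.")
    return {
        "forward": forward,
        "reverse": reverse,
        # The failure mode we look for: the model affirms one direction
        # while denying the logically equivalent one.
        "consistent": forward.strip().lower() == reverse.strip().lower(),
    }

print(verify_pair("Olaf Scholz", "the ninth Chancellor of Germany"))
```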
[Report] [Presentation] [Original Paper]
Technologies: Computer Vision, Deep Learning
We proposed and implemented two novel methods to improve our neural style transfer outputs: (i) fine-tuning the model as a classifier for a particular style, and (ii) flattening the network's layers for the convenience of adding style and content losses inside the blocks. Through network dissection, we compare the best models and positions for the style and content losses. We found that a fine-tuned MobileNetV2 with flattening gave the best visual results.
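For context, a minimal PyTorch sketch of the standard style and content losses attached inside the flattened blocks (Gram matrices for style, feature distance for content); the random tensors stand in for intermediate activations:

```python
# Minimal sketch of Gatys-style losses; tensors stand in for activations.
import torch
import torch.nn.functional as F

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    # features: (batch, channels, height, width)
    b, c, h, w = features.shape
    flat = features.view(b, c, h * w)
    # Channel-by-channel correlations capture style, independent of layout.
    return flat @ flat.transpose(1, 2) / (c * h * w)

def style_loss(gen: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
    return F.mse_loss(gram_matrix(gen), gram_matrix(style))

def content_loss(gen: torch.Tensor, content: torch.Tensor) -> torch.Tensor:
    return F.mse_loss(gen, content)

# Stand-in activations from some intermediate layer of the network:
gen = torch.rand(1, 64, 32, 32, requires_grad=True)
style, content = torch.rand(1, 64, 32, 32), torch.rand(1, 64, 32, 32)
loss = style_loss(gen, style) + 1e-2 * content_loss(gen, content)
loss.backward()
```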
[Report] [Presentation] [Drive]
Technologies: ReactJS, MySQL
We created a social cataloging application specifically for poetry lovers. Our aim was to build a platform that lets users explore the world of poetry through an extensive collection of books, series, authors, and reviews. By signing in, users can create and maintain their own virtual library of poetry books, rate them, and receive custom recommendations for new poetry books based on their past behavior.
[Report] [Video] [GitHub]
Domain: Computer Vision, Natural Language Processing
We implemented a multi-modal sarcasm detector using video, audio, and text features from the MUStARD dataset, which contains data from TV sitcoms such as Friends and The Big Bang Theory. Training and analyzing LSTMs with different types of attention mechanisms, we found that the best-performing model learned a bias towards labeling data as sarcastic, yet it does very well at detecting non-sarcastic data.
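A sketch of one unimodal branch, an LSTM with additive attention over timesteps (illustrative; the dimensions and fusion strategy here are simplified placeholders):

```python
# Sketch of one modality branch: LSTM + additive attention over timesteps.
import torch
import torch.nn as nn

class AttentiveLSTM(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)                       # (batch, time, hidden)
        weights = torch.softmax(self.attn(out), 1)  # attention over timesteps
        return (weights * out).sum(dim=1)           # pooled sequence features

# One branch per modality; the pooled vectors are concatenated and fed to
# the final sarcastic / non-sarcastic classifier.
text_branch = AttentiveLSTM(input_dim=300, hidden_dim=128)
pooled = text_branch(torch.rand(8, 20, 300))  # e.g., 20 word embeddings
```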
[Report] [Presentation] [Video]
Domain: Natural Language Processing
We implemented a sentiment analysis system for Amazon food reviews. We performed text cleaning and feature extraction on the review data, namely removing special characters and numbers, removing stop words, and tokenizing the text. We also examined other forms of text vectorization, including Word2Vec, GloVe, and Bag-of-Words. We developed baseline models using Naive Bayes, Logistic Regression, and XGBoost before moving to LSTM models, and finally evaluated BERT embeddings with an LSTM.
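A minimal sketch of one baseline path through the pipeline (cleaning, Bag-of-Words, Logistic Regression), assuming scikit-learn; the two toy reviews stand in for the real dataset:

```python
# Baseline sketch: cleaned reviews -> Bag-of-Words -> Logistic Regression.
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def clean(text: str) -> str:
    # Drop special characters and numbers; lowercase for consistency.
    return re.sub(r"[^a-zA-Z\s]", "", text).lower()

reviews = ["Great taste, will buy again!", "Arrived stale :( 0/10"]
labels = [1, 0]  # 1 = positive, 0 = negative

model = make_pipeline(
    CountVectorizer(stop_words="english"),  # tokenization + stop-word removal
    LogisticRegression(),
)
model.fit([clean(r) for r in reviews], labels)
print(model.predict([clean("tastes great")]))
```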
[Report] [GitHub]
Domain: Natural Language Processing
We developed a fake-news detector using a Transformer-based model, BERT, trained on the LIAR dataset. On analysis, we infer that the model classifies false statements well but does a poor job classifying true statements.
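A minimal sketch of the BERT setup, assuming the Hugging Face transformers library (the LIAR preprocessing, label scheme, and fine-tuning loop are omitted; the sample statement is a placeholder):

```python
# Sketch of the BERT classifier setup; fine-tuning details omitted.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # placeholder binarized true/false
)

statement = "The unemployment rate has doubled in the last year."
inputs = tokenizer(statement, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # scores from the untrained head
```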
[Report] [GitHub]
Technologies: HTML, CSS, ReactJS
Many algorithms in computer science become easier to understand when visualized. I developed a tool to visualize sorting algorithms like bubble sort, merge sort, and insertion sort, where every comparison the algorithm makes can be seen. There is also an interactive page showing every node that pathfinding algorithms like BFS, DFS, and Dijkstra's visit, where one can add weights and walls to the grid, making it easy to compare the algorithms against each other.
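The tool itself is built in ReactJS, but the core idea can be sketched in a few lines of Python: the sort yields every comparison and swap as an event, which the UI animates one step at a time.

```python
# Sketch of the visualizer's core idea: sorting as a stream of UI events.
def bubble_sort_steps(values):
    arr = list(values)
    for end in range(len(arr) - 1, 0, -1):
        for i in range(end):
            yield ("compare", i, i + 1)        # UI highlights both bars
            if arr[i] > arr[i + 1]:
                arr[i], arr[i + 1] = arr[i + 1], arr[i]
                yield ("swap", i, i + 1)       # UI swaps the bar heights
    yield ("done", arr)

for event in bubble_sort_steps([5, 1, 4, 2]):
    print(event)
```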
[Report] [GitHub]
Technologies: Flask, HTML, CSS, JS
Managing information about students, staff, and the admission process for a hostel is tedious. We created a website that does not require the person handling the system to be especially efficient or good at calculations. Key features include managing data about students, staff, and student representatives, handling the admission process and the mess, and maintaining entry-exit records of resident students, their visitors, and couriers delivered to them.
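A minimal Flask sketch of one such feature, the entry-exit log (illustrative; the route names and in-memory list are placeholders for the actual app, which persists records in a database):

```python
# Illustrative Flask sketch of an entry-exit log; names are placeholders.
from datetime import datetime
from flask import Flask, jsonify, request

app = Flask(__name__)
entry_exit_log = []  # stand-in for a database table

@app.route("/log", methods=["POST"])
def log_movement():
    record = {
        "student_id": request.json["student_id"],
        "direction": request.json["direction"],  # "entry" or "exit"
        "time": datetime.now().isoformat(),
    }
    entry_exit_log.append(record)
    return jsonify(record), 201

@app.route("/log", methods=["GET"])
def list_movements():
    return jsonify(entry_exit_log)

if __name__ == "__main__":
    app.run(debug=True)
```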
[GitHub]
Domain: Network Security, Machine Learning
Given the huge amount of traffic on a network, detecting malicious activity with machine learning is difficult because it can easily hide in normal traffic. We create an ensemble of various imbalance-reduction techniques while comparing them against each other. We infer that the results vary depending on the combination of techniques used in the ensemble, and ensembles of certain techniques did show better results.
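A sketch of the ensemble idea, assuming scikit-learn and imbalanced-learn, with a toy skewed dataset standing in for real network traffic: each imbalance-reduction technique gets its own pipeline, and a soft vote combines them.

```python
# Sketch: one pipeline per imbalance-reduction technique, combined by vote.
from imblearn.over_sampling import SMOTE, RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import make_pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression

# Toy skewed dataset: ~95% "normal" traffic, 5% "malicious".
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=0)

samplers = {
    "smote": SMOTE(random_state=0),
    "oversample": RandomOverSampler(random_state=0),
    "undersample": RandomUnderSampler(random_state=0),
}
ensemble = VotingClassifier(
    estimators=[
        (name, make_pipeline(sampler, LogisticRegression(max_iter=1000)))
        for name, sampler in samplers.items()
    ],
    voting="soft",  # average predicted probabilities across techniques
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```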
[Slides] [Report] [GitHub]