Research
My interests revolve around the convergence of natural language processing and
computer vision, with a focus on drawing insights from human cognition.
I am enthusiastic about exploring language grounding in multimodal contexts
and investigating the linguistic and cognitive characteristics of models.
Publications & Preprints
* denotes equal contribution
Do Vision and Language Models Share Concepts? A Vector Space Alignment Study
Jiaang Li, Yova Kementchedjhieva, Constanza Fierro, Anders Søgaard
TACL
code & data
TL;DR
Our experiments show that LMs partially converge towards representations
isomorphic to those of vision models, subject to dispersion, polysemy, and
frequency. This finding has important implications for both multimodal
processing and the debate over LM understanding.
FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture
Wenyan Li, Xinyu Zhang, Jiaang Li, Qiwei Peng, Raphael Tang, Li Zhou, Weijia Zhang, Guimin Hu, Yifei Yuan, Anders Søgaard, Daniel Hershcovich, Desmond Elliott
EMNLP 2024
code / data
TL;DR
We introduce FoodieQA, a manually curated, fine-grained image-text dataset
capturing the intricate features of food cultures across various regions of
China, and evaluate vision-language models (VLMs) and large language models
(LLMs) on newly collected, unseen food images and corresponding questions.
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning
Wenyan Li, Jiaang Li, Rita Ramos, Raphael Tang, Desmond Elliott
ACL 2024
code
TL;DR
We analyze the robustness of the retrieval-augmented captioning model SmallCap
and propose training the model on retrieved captions sampled from more diverse
sets, which reduces the chance that the model learns to copy majority tokens
and improves both in-domain and cross-domain performance.
Exploring Visual Culture Awareness in GPT-4V: A Comprehensive Probing
Yong Cao, Wenyan Li, Jiaang Li, Yifei Yuan, Daniel Hershcovich
Preprint 2024
TL;DR
We empirically show that GPT-4V excels at identifying cultural concepts but
still performs worse in low-resource languages such as Tamil and Swahili,
suggesting its promise as a tool for future visual cultural benchmark
construction.
Structural Similarities Between Language Models and Neural Response Measurements
Jiaang Li*, Antonia Karamolegkou*, Yova Kementchedjhieva, Mostafa Abdou, Sune Lehmann, Anders Søgaard
NeurReps @ NeurIPS 2023
code
TL;DR
This work shows that the larger neural language models get, the more
structurally similar their representations become to neural response
measurements from brain imaging.
Copyright Violations and Large Language Models
Antonia Karamolegkou*, Jiaang Li*, Li Zhou, Anders Søgaard
EMNLP 2023
code
TL;DR
We explore the issue of copyright violations and large language models through
the lens of verbatim memorization,
focusing on possible redistribution of copyrighted text.
PokemonChat: Auditing ChatGPT for Pokemon Universe Knowledge
Laura Cabello, Jiaang Li, Ilias Chalkidis
Preprint 2023
TL;DR
We probe ChatGPT for its conversational understanding and introduces a
conversational framework (protocol)
that can be adopted in future studies to assess ChatGPT's ability to generalize,
combine features, and to acquire
and reason over newly introduced knowledge from human feedback.
Services
- Reviewer: ACL 2024, NLLP Workshop 2023