Interests
Blogs
A running list of topics I’m currently exploring and thinking about.
Where's the Chicken? Unpacking Spatial Awareness in Vision-Language Models
Modern vision-language models (VLMs) recognize and describe visual content impressively well, yet they still struggle to understand spatial relationships. This limitation persists even with massive data and model scaling, which suggests that the root of the problem lies in the architecture and training objective rather than in data alone. This post examines the underlying causes and discusses why recently proposed fixes, while promising, remain insufficient for robust spatial reasoning.
April 27, 2026
Tags: Vision Language Models, Spatial Reasoning
View Blog Post
[In Progress] SAEs on VLMs
April 27, 2026
Tags: Vision Language Models, Sparse Autoencoders
View Blog Post