Interests

Blogs

A running list of topics I’m currently exploring and thinking about.

Where's the Chicken? Unpacking Spatial Awareness in Vision-Language Models

Modern vision-language models (VLMs) have become very good at recognizing and describing visual content, yet they still struggle to understand spatial relationships. This limitation persists even with massive data and model scaling, which suggests that the root of the problem lies in the architecture and training objective rather than in data alone. This post examines the underlying causes and discusses why recently proposed fixes, while promising, remain insufficient for robust spatial reasoning.

April 27, 2026

Tags: Vision Language Models, Spatial Reasoning