Visual Data Science Lab

Abstract

This research investigates whether vision-language models can automatically rate street view images from the Place Pulse 2.0 dataset, and compares the resulting AI-generated ratings with human evaluations. The study introduces a context-sensitive rating system that scores each image on a 0-10 scale in six urban perception categories: safety, liveliness, wealth, beauty, boredom, and depression. By comparing these ratings with those of human volunteers, the research explores how effectively vision-language models can replicate human judgment of urban environments. The findings offer insight into the potential of vision-language models to scale urban perception analysis with an objective methodology that complements human evaluation. The approach supports more efficient, data-driven urban planning and enriches the Place Pulse 2.0 dataset with machine-generated ratings, laying groundwork for future urban perception studies.
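To make the comparison described in the abstract concrete, here is a minimal sketch of how per-category agreement between model and human ratings could be measured. It is not the authors' pipeline. The function names (`pearson`, `compare_ratings`), the dictionary layout, and the choice of Pearson correlation and mean absolute error are assumptions made for illustration, as is the premise that human scores are already expressed on the same 0-10 scale. The toy values are invented and are not results from the paper.

```python
from math import sqrt

# The six perception categories named in the abstract.
CATEGORIES = ["safety", "liveliness", "wealth", "beauty", "boredom", "depression"]


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def compare_ratings(model, human):
    """Compare ratings per category over images present in both inputs.

    model, human: dict mapping image_id -> {category: score on a 0-10 scale}.
    Returns a dict mapping category -> {"pearson": ..., "mae": ...}.
    """
    shared = sorted(set(model) & set(human))
    report = {}
    for cat in CATEGORIES:
        m = [model[i][cat] for i in shared]
        h = [human[i][cat] for i in shared]
        mae = sum(abs(a - b) for a, b in zip(m, h)) / len(shared)
        report[cat] = {"pearson": pearson(m, h), "mae": mae}
    return report


if __name__ == "__main__":
    # Invented toy scores for three hypothetical images (not real data).
    toy_model = {
        "img_a": dict.fromkeys(CATEGORIES, 2.0),
        "img_b": dict.fromkeys(CATEGORIES, 5.0),
        "img_c": dict.fromkeys(CATEGORIES, 8.0),
    }
    toy_human = {
        "img_a": dict.fromkeys(CATEGORIES, 3.5),
        "img_b": dict.fromkeys(CATEGORIES, 4.0),
        "img_c": dict.fromkeys(CATEGORIES, 9.0),
    }
    for category, stats in compare_ratings(toy_model, toy_human).items():
        print(category, stats)
```

A rank-based measure such as Spearman correlation may be a better fit if only the ordering of images matters; Pearson is used here only to keep the sketch dependency-free.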

@article{2025-UrbanVLM,
  title={Assessing Urban Environments with Vision-Language Models: A Comparative Analysis of AI-Generated Ratings and Human Volunteer Evaluations},
  author={Felipe A. Moreno and Jorge Poco},
  journal={IEEE International Joint Conference on Neural Networks},
  year={2025},
  url={https://ieeexplore.ieee.org/},
  date={2025-07-05}
}

Assessing Urban Environments with Vision-Language Models: A Comparative Analysis of AI-Generated Ratings and Human Volunteer Evaluations
