
Where urban life meets embodied intelligence
Tsinghua University EmbodiedCity Team
Latest Projects
Check out our latest work.
Benchmark
Embodied City
We release Embodied City, a new benchmark platform for embodied intelligence in urban environments.
See Project
Benchmark
ACL 2025 UrbanVideo-Bench
We introduce a benchmark to evaluate whether video-large language models (Video-LLMs) can naturally process continuous first-person visual observations like humans, enabling recall, perception, reasoning, and navigation.
See Project
Benchmark
MM 2025 Open3DVQA
Open3DVQA is a benchmark to comprehensively evaluate the spatial reasoning capacities of current SOTA foundation models in open 3D space. It consists of 9k VQA samples, collected using an efficient semi-automated tool in a high-fidelity urban simulator.
See Project
Task
EMNLP 2025 CityEQA
We introduce CityEQA, a new task where an embodied agent answers open-vocabulary questions through active exploration in dynamic city spaces.
See Project
Framework
MM 2025 Embodied-R
A comprehensive framework for embodied reasoning in urban environments, enabling agents to perform complex spatial reasoning and decision-making tasks through multi-modal perception and interaction.
See Project
World model
MM 2025 AirScape
We introduce AirScape, an aerial generative world model with motion controllability that enables 6DoF aerial agents to predict future observations based on egocentric views and natural language motion intentions.
See Project
Agent
ACL 2025 CityNavAgent
We propose CityNavAgent, an LLM-empowered agent for aerial vision-and-language navigation in urban environments. It leverages a hierarchical semantic planner and a global memory module to significantly reduce navigation complexity and enhance long-term stability.
See Project
Survey
IJCAI 2025 3D-LLM-Survey
We present a comprehensive survey reviewing how Large Language Models (LLMs) are being equipped with 3D spatial reasoning. This work introduces a structured taxonomy to categorize current research, offering a systematic overview of image-based, point cloud-based, and hybrid methods that bridge language and 3D data.
See Project
Workshop
Conference
ICLR 2025 Workshop
This workshop is motivated by a simple fact: humans have strong embodied intelligence in open environments, yet this remains challenging for large language models and LLM agents. Despite recent progress in embodied AI for static, indoor environments, LLM agents still struggle with tasks in large-scale outdoor environments, such as navigation, search, spatial reasoning, and task planning. We therefore propose this workshop to discuss recent advances in this research area and to look ahead to future developments. Specifically, it delves into topics in outdoor embodied intelligence, including spatial intelligence and embodied perception, reasoning and planning, decision-making and action, multi-agent and human-agent collaboration, and the development of simulators, testbeds, datasets, and benchmarks. This comprehensive exploration of embodied LLM agents in open city environments holds the potential to advance the field of artificial intelligence and open up new applications in various domains.

We also host a special poster/short paper session for the solutions that perform best in the Open Urban Environment Embodied Intelligence Competition.
See Project