
Where urban life meets embodied intelligence
Tsinghua University EmbodiedCity Team
Latest Projects
Check out our latest work.
Benchmark
Embodied City
We release Embodied City, a new benchmark platform for embodied intelligence in urban environments.
See Project
Benchmark
ACL 2025 UrbanVideo-Bench
We introduce a benchmark to evaluate whether video-large language models (Video-LLMs) can naturally process continuous first-person visual observations like humans, enabling recall, perception, reasoning, and navigation.
See Project
Benchmark
MM 2025 Open3DVQA
Open3DVQA is a benchmark to comprehensively evaluate the spatial reasoning capacities of current SOTA foundation models in open 3D space. It consists of 9k VQA samples, collected using an efficient semi-automated tool in a high-fidelity urban simulator.
See Project
Task
EMNLP 2025 CityEQA
We introduce CityEQA, a new task where an embodied agent answers open-vocabulary questions through active exploration in dynamic city spaces.
See Project
Framework
MM 2025 Embodied-R
A comprehensive framework for embodied reasoning in urban environments, enabling agents to perform complex spatial reasoning and decision-making tasks through multi-modal perception and interaction.
See Project
World model
MM 2025 AirScape
We introduce AirScape, an aerial generative world model with motion controllability that enables 6DoF aerial agents to predict future observations based on egocentric views and natural language motion intentions.
See Project
Agent
ACL 2025 CityNavAgent
We propose CityNavAgent, an LLM-empowered agent for aerial vision-and-language navigation in urban environments. It leverages a hierarchical semantic planner and a global memory module to significantly reduce navigation complexity and enhance long-term stability.
See Project
Survey
IJCAI 2025 3D-LLM-Survey
We present a comprehensive survey reviewing how Large Language Models (LLMs) are being equipped with 3D spatial reasoning. This work introduces a structured taxonomy to categorize current research, offering a systematic overview of image-based, point cloud-based, and hybrid methods that bridge language and 3D data.
See Project
Workshop
Conference
ICLR 2025 Workshop
This workshop is motivated by a simple fact: humans have strong embodied intelligence in open environments, yet this remains challenging for large language models and LLM agents. Despite recent progress in embodied AI for static, indoor environments, LLM agents still struggle with tasks in large-scale outdoor environments, such as navigation, search, spatial reasoning, and task planning. We therefore propose this workshop to discuss recent advances in this research area and to look ahead to future developments. Specifically, it delves into topics in outdoor embodied intelligence, including spatial intelligence and embodied perception, reasoning and planning, decision-making and action, multi-agent and human-agent collaboration, and the development of simulators, testbeds, datasets, and benchmarks. This comprehensive exploration of embodied LLM agents in open city environments holds the potential to advance the field of artificial intelligence and open up new applications in various domains.

We also host a special poster/short paper session for the solutions that perform best in the Open Urban Environment Embodied Intelligence Competition.
See Project