Computer Vision for Spatial Understanding in Egocentric Data

Summer Semester Group Project 2026

Helix 02 Demo for Full Body Autonomy From Figure.

HOT3D Egocentric Dataset from Meta.


Recent breakthroughs in Foundation Models (LLMs and VLMs) have revolutionized semantic understanding. Artificial Intelligence can now process natural language and recognize objects with unprecedented accuracy. However, a critical gap remains: these models operate as "System 2" thinkers that are slow, computational, and detached from physical reality. They lack spatial intelligence, the instinctive "System 1" ability to model physics, anticipate motion, and react to dynamic human actions in real time.
Spatial understanding is essential to developing better embodied intelligence for downstream tasks such as manipulation or navigation in humanoid robots, and for smarter AI assistants in mixed reality glasses. True collaboration between man and machine requires state prediction and activity anticipation using foundational world models.

In this joint research project between the University of Rostock and UBB Cluj, we will experiment with latent world models that enable better understanding and prediction of human actions from egocentric (first-person) video.
Students will have the opportunity for fully funded travel to Cluj, Romania, for a week of intensive collaboration with students from UBB (Universitatea Babeș-Bolyai).

Requirements

  • Be registered as part of the Project module in Master’s CS International (Mandatory)

  • Familiarity with Python, Deep Learning, PyTorch, NumPy, Hugging Face

  • First Principles Thinking - You are comfortable reading and implementing concepts from the literature

  • Collaborative Spirit - You will work in a Git-based environment and contribute to shared codebases aimed at a scientific publication

What We Offer

  • Access to compute resources for training

  • Direct supervision by PhD researchers in Spatial Intelligence and Robotics

  • The opportunity to work with cutting-edge deep learning architectures (Video-LLaMA, VL-JEPA, JEPA, DINO)

  • Funded trip to Cluj, Romania for interdisciplinary collaboration

Open HiWi Positions

Not eligible for the semester project, but want to work on exciting topics anyway? We are currently hiring students for thesis & HiWi positions at the lab. Send us an email if interested.