About Us

The CMU Computer Vision Group focuses on the breadth of computer vision research, including object recognition and scene understanding (Gupta, Ramanan, Bagnell, Hebert, Kanade); 3D geometry and reconstruction (Kaess, Narasimhan, Lucey, Sheikh, Kanade); motion analysis and tracking (Ramanan, Gupta, Lucey, De la Torre, Kitani, Sheikh); physics-based vision (Narasimhan); and analysis of 3D data (Hebert, Huber).

The pace of progress in object recognition and scene analysis has increased over the past five years in the computer vision community, owing in part to more sophisticated use of machine learning tools and the availability of large data sets. The CMU group has greatly expanded its activities in these areas through close collaborations with the CMU Machine Learning Department. In particular, the vision group is a leading contributor to the field's transformation over this period from pattern classification of image pixels to deep understanding of scenes, including geometry, context, and other physical-world constraints. This research area has considerable potential given the unique collaborations with researchers in Machine Learning and AI. Another important development over the same period is access to huge amounts of visual data (images and videos), for example from web repositories, which raises new research questions and offers new opportunities for computer vision algorithms. We are at the forefront of this transformation through major contributions to defining new ways of addressing computer vision tasks that exploit the richness of these data sets.

At the other end of the spectrum, recent years have seen a resurgence of interest in recognition tasks in which the goal is to recognize instances of a specific object rather than broad categories of objects. This resurgence is driven by renewed attention to classic robotics problems, such as bin-picking, and by human-centric vision tasks in which one goal is to understand a person’s environment. New opportunities are developing in these two directions through industry funding and through the QoLT center, respectively, and we expect this area to grow in the future.

Over the past three decades, the RI vision group has made major contributions to the problem of reconstructing the 3D geometry of a scene from multiple views. While this problem is now well addressed in the case of static environments, 3D understanding of dynamic environments poses difficult challenges, which are being addressed through an ambitious research program. This program includes investigating the use of a large number of imaging sensors, based on the concept that complex sensing tasks, such as dynamic 3D scene understanding, should be solved by a large number of parallel but simple perceptual processes. As cameras proliferate in society with hand-held devices, this work will make it possible to tackle large-scale sensing challenges.

All aspects of human sensing, such as detecting, tracking, and understanding people’s faces, bodies, and activities, are addressed by the vision group. This research has led to important theoretical and algorithmic developments, including facial expression analysis and body posture recovery and tracking. Work on recognizing actions and activities is expanding through new approaches for recognition in videos. In addition to its research products, the vision group has had a major impact on the field through several databases and benchmarks, e.g., the face databases and the Grand Challenge database of human activities generated in the QoLT ERC.

The area of physics-based vision has been led since Fall 2004 by Narasimhan, who has developed it along three key lines: the mathematical modeling of the interactions of light with materials and the atmosphere; the design of novel cameras with higher resolution in space, color, and intensity; and the development of algorithms for rendering and interpreting scene appearance. This work has contributed fundamental tools for modeling and understanding light transport and reflection, and it has generated new applications in a number of fields, including robotics, digital entertainment, remote sensing, and underwater imaging. The most recent work includes the development of new solutions for structured light sensing and new display technology. This activity is central to the unique strength of RI in research combining computer vision and computer graphics.

As computer vision research matures, opportunities for applications and collaborations have increased. Accordingly, in addition to its basic research lines, the computer vision group collaborates with, and contributes to, a large number of other areas and projects, both within the University and across the broader computer vision community at leading universities and industrial research units.

The vision faculty has continued to develop opportunities for computer vision applications in the area of mobile systems, e.g., unmanned ground vehicles (UGVs) and intelligent cars. For example, motion analysis and tracking techniques are instrumental in systems for intelligent driving, and 3D scene interpretation is central to the development of UGVs. Existing and recent activities include industry-funded projects and large DoD efforts in the area of UGVs, in collaboration with NREC researchers.

The vision faculty has built on new opportunities in the general area of using computer vision to enhance communication and interaction. For example, owing to progress in face analysis, new modes of interaction combining face analysis and synthesis are being investigated at the boundary between computer vision and computer graphics. In the HRI area, research involves the use of sensor fusion and activity recognition to optimize the efficiency of industrial workcells, allowing people and intelligent, dexterous machines to work together safely.

The QoLT ERC provided a new set of challenging vision tasks in recognition and behavior understanding for assistive technologies. This area provided new opportunities for collaboration with practitioners in rehabilitation science, aging, nursing, and related fields at the University of Pittsburgh and at CMU, giving the vision researchers a better understanding of real-world applications and access to data and user studies. In the last two years in particular, the impact of this work was evidenced by a range of programs addressing human-centric applications, e.g., participation in an NSF Expeditions project for research on autism, computer vision for sensory substitution, and behavior and face analysis.

Exciting new opportunities are being pursued in biological engineering, where the expertise gained in motion analysis and tracking led to the development of fully automated, computer vision-based cell tracking algorithms that can track whole populations of cells on a test chip in real time. This approach has the potential to transform key aspects of biological engineering by reducing the high cost and long timelines of gathering and interpreting experimental data.


  • Semantic Component Analysis (ICCV 2015)

    What you see depends not only on what there is to see, but also on what you expect to see. Human vision relies heavily on priors about how things should appear in the world, allowing for efficient vis......


  • Activity Forecasting (ECCV 2012)

    We address the task of inferring the future actions of people from noisy visual input. We denote this task activity forecasting. To achieve accurate activity forecasting, our approach models the effec......


  • The Panoptic Studio: A Massively Multiview System for Social Motion Capture (ICCV 2015)

    We present an approach to capture the 3D structure and motion of a group of people engaged in a social interaction. The core challenges in capturing social interactions are: (1) occlusion is functiona......