Technical session talks from ICRA 2012
The conference registration code needed to access these videos is available at this link: PaperPlaza. Step-by-step instructions for accessing the videos are here: step-by-step process.
Why are some of the videos missing? If you provided your consent form for your video to be published and it is still missing, please contact email@example.com
Vision-Based Attention and Interaction
Computing Object-Based Saliency in Urban Scenes Using Laser Sensing
It is now well established that a robot equipped with laser scanners can generate a low-level map of a complex environment as a cloud of 3D laser points. Given such a point cloud of an urban scene, this paper proposes a method for locating objects of interest, e.g. traffic signs or road lamps, by computing object-based saliency. Our major contributions are: 1) a method for extracting simple geometric features from laser data, in which both range images and 3D laser points are analyzed; 2) a model of each object as a graph that describes the composition of its geometric features; 3) a graph-matching-based method for locating the objects of interest in laser data. Experimental results on real laser data depicting urban scenes are presented, and the efficiency and limitations of the method are discussed.
Where Do I Look Now? Gaze Allocation During Visually Guided Manipulation
In this work we present principled methods for the coordination of a robot's oculomotor system with the rest of its body motor systems. The problem is to decide which physical actions to perform next and where the robot's gaze should be directed in order to gain information that is relevant to the success of its physical actions. Previous work on this problem has shown that a reward-based coordination mechanism provides an efficient solution. However, that approach does not allow the robot to move its gaze to different parts of the scene, it considers the robot to have only one motor system, and assumes that the actions have the same duration. The main contributions of our work are to extend that previous reward-based approach by making decisions about where to fixate the robot's gaze, handling multiple motor systems, and handling actions of variable duration. We compare our approach against two common baselines: random and round-robin gaze allocation. We show how our method provides a more effective strategy to allocate gaze where it is needed the most.
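The two baselines named in this abstract, random and round-robin gaze allocation, can be illustrated with a minimal sketch. The function names `round_robin_gaze` and `random_gaze` and the target labels are hypothetical and not taken from the paper; the reward-based method itself is not reproduced here.

```python
import itertools
import random


def round_robin_gaze(targets):
    """Yield gaze targets in a fixed, endlessly repeating order."""
    return itertools.cycle(targets)


def random_gaze(targets, rng):
    """Yield gaze targets drawn uniformly at random, ignoring task state."""
    while True:
        yield rng.choice(targets)


# Hypothetical fixation targets, one per motor system that needs visual feedback.
targets = ["left_hand", "right_hand", "base"]

round_robin_sequence = list(itertools.islice(round_robin_gaze(targets), 5))
random_sequence = list(itertools.islice(random_gaze(targets, random.Random(0)), 5))
```

Neither baseline looks at how uncertain each motor system currently is, which is the information a reward-based allocator would use.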
3D AAM Based Face Alignment under Wide Angular Variations Using 2D and 3D Data
Active Appearance Models (AAMs) are widely used to estimate the shape of the face together with its orientation, but AAM approaches tend to fail when the face is under wide angular variations. Although it is feasible to capture the overall 3D face structure using 3D data from range cameras, the locations of facial features are often estimated imprecisely or incorrectly due to depth measurement uncertainty. Face alignment using 2D images and face alignment using 3D images suffer from different issues and have varying reliability in different situations. Existing approaches introduce a weighting function to balance 2D and 3D alignments, but the weighting function is tuned manually and the sensor characteristics are not taken into account. In this paper, we propose to balance 3D face alignment using 2D and 3D data based on the observed data and the sensor characteristics. The feasibility of wide-angle face alignment is demonstrated using two different sets of depth and conventional cameras. The experimental results show that a stable alignment is achieved, with a maximum improvement of 26% compared to 3D AAM using 2D images and a 30% improvement over state-of-the-art 3DMM methods in terms of 3D head pose estimation.
Robots That Validate Learned Perceptual Models
Service robots that are to operate autonomously need to perform actions reliably and be able to adapt to their changing environment using learning mechanisms. Ideally, robots should learn continuously, but this approach often suffers from problems such as over-fitting, drift, or incomplete data. In this paper, we propose a method to automatically validate autonomously acquired perception models. These perception models are used to localize objects in the environment with the intention of manipulating them with the robot. Our approach verifies the learned perception models by moving the robot, trying to re-detect an object, and then trying to grasp it. From observable failures of these actions, and from high-level loop closures that validate the eventual success, we can derive certain qualities of our models and our environment. We evaluate our approach using two different detection algorithms, one using 2D RGB data and one using 3D point clouds. We show that our system is able to improve perception performance significantly by learning which of the models is better in a certain situation and a specific context. We show how additional validation allows for successful continuous learning. The strictest precondition for learning such perceptual models is correct segmentation of objects, which is evaluated in a second experiment.
Uncalibrated Visual Servoing for Intuitive Human Guidance of Robots
We propose a novel implementation of visual servoing whereby a human operator can guide a robot relative to the coordinate frame of an eye-in-hand camera. Among other applications, this can allow the operator to work in the image space of the eye-in-hand camera. This is achieved using a gamepad, a time-of-flight camera (an active sensor that creates depth data), and recursive least-squares update with Gauss-Newton control. Contributions of this paper include the use of a person to cause the control action in a visual-servoing system, and the introduction of uncalibrated position-based visual servoing. The system's efficacy is evaluated via trials involving human operators in different scenarios.
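The combination of a recursive least-squares update with Gauss-Newton control mentioned in this abstract can be sketched generically. The sketch below is a textbook-style form of uncalibrated visual servoing, not the authors' implementation: it assumes a 2x2 Jacobian, and the function names `rls_jacobian_update` and `gauss_newton_step`, the forgetting factor, the gain, and the example matrix are all illustrative assumptions.

```python
def rls_jacobian_update(J, P, dq, dy, forgetting=1.0):
    """One recursive least-squares update of a 2x2 Jacobian estimate.

    J  : current estimate mapping actuator changes to feature changes
    P  : 2x2 covariance-like matrix of the estimator
    dq : observed actuator change
    dy : observed feature change
    """
    # P * dq (column) and dq^T * P (row)
    P_dq = [P[i][0] * dq[0] + P[i][1] * dq[1] for i in range(2)]
    dq_P = [dq[0] * P[0][j] + dq[1] * P[1][j] for j in range(2)]
    denom = forgetting + dq[0] * P_dq[0] + dq[1] * P_dq[1]
    # Prediction error of the current estimate for this motion
    residual = [dy[i] - (J[i][0] * dq[0] + J[i][1] * dq[1]) for i in range(2)]
    J_new = [[J[i][j] + residual[i] * dq_P[j] / denom for j in range(2)]
             for i in range(2)]
    P_new = [[(P[i][j] - P_dq[i] * dq_P[j] / denom) / forgetting
              for j in range(2)] for i in range(2)]
    return J_new, P_new


def gauss_newton_step(J, error, gain=1.0):
    """Actuator step dq that solves J * dq = -gain * error (2x2 case)."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    if abs(det) < 1e-12:
        raise ValueError("Jacobian estimate is singular")
    rx, ry = -gain * error[0], -gain * error[1]
    return [(J[1][1] * rx - J[0][1] * ry) / det,
            (-J[1][0] * rx + J[0][0] * ry) / det]


# Hypothetical "true" mapping that the estimator does not know in advance.
A = [[2.0, 0.5], [-0.3, 1.5]]
J = [[1.0, 0.0], [0.0, 1.0]]   # initial guess
P = [[1e6, 0.0], [0.0, 1e6]]   # large P: trust new observations strongly
for dq in ([1.0, 0.0], [0.0, 1.0]):
    dy = [A[i][0] * dq[0] + A[i][1] * dq[1] for i in range(2)]
    J, P = rls_jacobian_update(J, P, dq, dy)
```

In a closed loop, each commanded step from `gauss_newton_step` and the feature change it produces would be fed back into `rls_jacobian_update`, so no camera or kinematic calibration is required up front.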
Leveraging RGB-D Data: Adaptive Fusion and Domain Adaptation for Object Detection
Vision and range sensing are among the richest sensory modalities for perception in robotics and related fields. This paper addresses the problem of how best to combine image and range data for the task of object detection. In particular, we propose a novel adaptive fusion approach, hierarchical Gaussian Process mixtures of experts, that is able to account for missing information and cross-cue data consistency. The hierarchy is a two-tier architecture that, for each modality, each frame, and each detection, computes a weight function using Gaussian Processes that reflects the confidence of the respective information. We further propose a method called cross-cue domain adaptation that makes use of large image data sets to improve the depth-based object detector, for which only a few training samples exist. In experiments that include a comparison with alternative sensor fusion schemes, we demonstrate the viability of the proposed methods and achieve significant improvements in classification accuracy.
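The idea of weighting each modality by its confidence, while tolerating missing information, can be illustrated with a much simpler stand-in. In the sketch below the confidences are supplied directly as numbers rather than computed by Gaussian Processes as in the paper, and the function name `fuse_scores` and the modality labels are hypothetical.

```python
def fuse_scores(scores, confidences):
    """Confidence-weighted average of per-modality detection scores.

    scores      : dict modality -> score, or None if that modality is missing
    confidences : dict modality -> non-negative weight
    Returns None when no modality contributes.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for modality, score in scores.items():
        if score is None:
            continue  # missing information: the modality is skipped
        weight = confidences.get(modality, 0.0)
        weighted_sum += weight * score
        weight_total += weight
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total


both = fuse_scores({"rgb": 0.8, "depth": 0.4}, {"rgb": 3.0, "depth": 1.0})
depth_missing = fuse_scores({"rgb": 0.8, "depth": None}, {"rgb": 3.0, "depth": 1.0})
nothing = fuse_scores({"rgb": None, "depth": None}, {"rgb": 3.0, "depth": 1.0})
```

A fixed weighting like this is the kind of scheme an adaptive approach would replace by recomputing the confidences per frame and per detection.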