Shape and Motion from Image Streams: a Factorization Method

Author: Carlo Tomasi
Title: Shape and Motion from Image Streams: a Factorization Method
Date: Sep 1991
Abstract: We propose a method for estimating the three-dimensional shape of objects and the motion of the camera from a stream of images. The goal is to give a robot the ability to localize itself with respect to the environment, draw a map of its own surroundings, and perceive the shape of objects in order to recognize or grasp them. Solutions proposed in the past were so sensitive to noise as to be of little use in practical applications. This sensitivity is closely related to the viewer-centered representation of scene geometry known as a depth map, and to the use of stereo triangulation to infer depth from the images. In fact, when objects are more than a few focal lengths away from the camera, parallax effects become subtle, and even a small amount of noise in the images produces large errors in the final shape and motion results. In our formulation, we represent shape in object-centered coordinates, and model image formation by orthographic, rather than perspective, projection. In this way, depth, the distance between viewer and scene, plays no role, and the problem's sensitivity to noise is critically reduced. We collect the image coordinates of P feature points tracked through F frames into a 2F × P measurement matrix. If these coordinates are measured with respect to their centroid, we show that the measurement matrix can be written as the product of two matrices that represent the camera rotation and the positions of the feature points in space. The bilinear nature of this model, and its matrix formulation, lead to a factorization method for the computation of shape and motion, based on the Singular Value Decomposition. Previous solutions assumed motion to be smooth, in one form or another, in an attempt to constrain the solution and achieve reliable convergence. The factorization method, on the other hand, makes no assumption about the camera motion, and can deal with the large jumps from frame to frame found, for instance, in sequences taken with a hand-held camera. To make the factorization method into a working system, we solve several corollary problems: how to select image features, how to track them from frame to frame, how to deal with occlusions, and how to cope with the noise and artifacts that corrupt images recorded with ordinary equipment. We test the entire system with a series of experiments on real images taken both in the lab, for an accurate performance evaluation, and outdoors, to demonstrate the applicability of the method in real-life situations.
Category: CMUTR

Institution: Department of Computer Science, Carnegie Mellon University
Number: CMU-CS-91-172
Bibtype: TechReport
Address: Pittsburgh, PA
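
The factorization idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the report's implementation. It assumes a complete, noise-free 2F × P measurement matrix whose first F rows hold horizontal image coordinates and whose last F rows hold vertical ones; that row ordering, the function name `factorize`, and the synthetic test data are all assumptions made for illustration. The sketch registers the coordinates to their centroid and keeps the three largest singular components, so the two factors are recovered only up to an invertible 3 × 3 matrix.

```python
import numpy as np


def factorize(W):
    """Rank-3 factorization of a 2F x P measurement matrix (illustrative sketch).

    Returns (M, S, t) such that W is approximately M @ S + t, where
    M is 2F x 3 (motion factor), S is 3 x P (shape factor), and
    t is the 2F x 1 per-row centroid that was subtracted.
    """
    W = np.asarray(W, dtype=float)
    t = W.mean(axis=1, keepdims=True)      # centroid of the tracked points, per row
    W_reg = W - t                          # coordinates measured w.r.t. the centroid
    U, s, Vt = np.linalg.svd(W_reg, full_matrices=False)
    root = np.sqrt(s[:3])                  # split the top 3 singular values evenly
    M = U[:, :3] * root                    # 2F x 3
    S = root[:, None] * Vt[:3]             # 3 x P
    return M, S, t


# Synthetic orthographic data (hypothetical, for demonstration only).
rng = np.random.default_rng(0)
F, P = 5, 20
points = rng.normal(size=(3, P))           # 3-D feature points

rows_i, rows_j = [], []
for _ in range(F):
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthonormal camera axes
    rows_i.append(Q[0])                    # horizontal image axis of this frame
    rows_j.append(Q[1])                    # vertical image axis of this frame
R = np.vstack(rows_i + rows_j)             # 2F x 3 stack of camera axes

translation = rng.normal(size=(2 * F, 1))  # arbitrary per-row image offset
W = R @ points + translation               # 2F x P measurement matrix

M, S, t = factorize(W)
residual = np.linalg.norm(W - (M @ S + t))
print("reconstruction residual:", residual)
```

Because the registered matrix of noise-free orthographic data has rank at most three, the product `M @ S + t` reproduces `W` to within floating-point error. The abstract does not say how the remaining 3 × 3 ambiguity is resolved or how occlusions and noise are handled, so the sketch stops at the rank-3 factors.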