This work addresses the problem of tracking the 3D pose of a camera in space, using the images it acquires while moving freely in unmodeled, arbitrary environments. A novel feature-based approach for camera tracking is proposed, intended to facilitate tracking in on-line, time-critical applications such as video see-through augmented reality. In contrast to several existing methods which are designed to operate in a batch, off-line mode, assuming that the whole video sequence to be tracked is available before tracking commences, the proposed method operates on images incrementally. At its core lies a feature-based 3D plane tracking technique, which permits the estimation of the homographies induced by a virtual 3D plane between successive image pairs. Knowledge of these homographies allows the corresponding projection matrices encoding camera motion to be expressed in a common projective frame and, therefore, to be recovered directly, without estimating 3D structure. Projective camera matrices are then upgraded to Euclidean and used for recovering structure, which is in turn employed for refining the projection matrices through local bundle adjustment. The proposed approach is causal, is tolerant to erroneous and missing feature matches, does not require modifications of the environment and has realistic computational requirements.
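As a rough illustration of the core idea (ours, not the authors' implementation), the following Python sketch uses OpenCV as a stand-in for the corner detection, matching and robust estimation stages. It relies on a standard two-view result (see Hartley & Zisserman, "Multiple View Geometry"): given the homography H induced by a dominant plane between two views and the epipole e' of the second view, a compatible pair of projective cameras can be written down directly as P1 = [I | 0], P2 = [H | e'], with no 3D structure estimated. All function and variable names below are assumed for illustration.

    import cv2
    import numpy as np

    def projective_camera_pair(img1, img2):
        # Detect and match corners; ORB is a stand-in for the corner
        # detector/matcher actually used by the tracker.
        orb = cv2.ORB_create(2000)
        k1, d1 = orb.detectAndCompute(img1, None)
        k2, d2 = orb.detectAndCompute(img2, None)
        matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
        p1 = np.float32([k1[m.queryIdx].pt for m in matches])
        p2 = np.float32([k2[m.trainIdx].pt for m in matches])

        # Robustly estimate the homography induced by a dominant plane.
        H, _ = cv2.findHomography(p1, p2, cv2.RANSAC, 3.0)

        # Epipole e' of the second view: the left null vector of the
        # fundamental matrix F (i.e. F^T e' = 0).
        F, _ = cv2.findFundamentalMat(p1, p2, cv2.FM_RANSAC)
        e2 = np.linalg.svd(F.T)[2][-1]

        # A compatible projective camera pair in a common frame; the
        # relative scale of H and e' fixes the choice of projective basis.
        P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
        P2 = np.hstack([H, e2.reshape(3, 1)])
        return P1, P2

Successive frames can then be handled pairwise, with the resulting camera matrices expressed in a common projective frame before the Euclidean upgrade and bundle adjustment steps described above.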
A detailed description of the approach can be found in ICS/FORTH Technical Report #324, Sep. 2003. A shorter version, entitled "Vision-based Camera Motion Recovery for Augmented Reality", was published at the 2004 Computer Graphics International Conference (CGI'04). Additionally, a journal version titled "Efficient, Causal Camera Tracking In Unprepared Environments" has been accepted for publication in the Computer Vision and Image Understanding journal, and a demo video titled "Camera Matchmoving in Unprepared, Unknown Environments" will be included in the CVPR'05 video proceedings.
Sample experimental results from applying the proposed camera tracker to a variety of image sequences are shown below. For each sequence, a VRML file illustrating the recovered motion and structure is provided: dots correspond to 3D points, red pyramids to camera locations and green polylines to camera trajectories. Running times were measured on an Intel P4 @ 2.5 GHz laptop; roughly 80% of the reported execution time is spent on detecting and matching image corners.
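For readers unfamiliar with VRML, the following minimal Python sketch (ours, not the exporter actually used here) shows how a reconstruction in the above style might be written to a VRML97 file: a PointSet for the recovered 3D points and a green IndexedLineSet polyline through the camera centers. The red camera pyramids are omitted for brevity.

    def write_vrml(path, points, centers):
        # points, centers: sequences of (x, y, z) triples for the
        # recovered 3D points and the camera centers, respectively.
        def coords(xyz):
            return ", ".join("%g %g %g" % tuple(p) for p in xyz)

        with open(path, "w") as f:
            f.write("#VRML V2.0 utf8\n")
            # Recovered 3D points as a point cloud.
            f.write("Shape { geometry PointSet { coord Coordinate "
                    "{ point [ %s ] } } }\n" % coords(points))
            # Camera trajectory as a green polyline through the centers.
            idx = " ".join(str(i) for i in range(len(centers))) + " -1"
            f.write("Shape {\n"
                    "  appearance Appearance { material Material "
                    "{ emissiveColor 0 1 0 } }\n"
                    "  geometry IndexedLineSet {\n"
                    "    coord Coordinate { point [ %s ] }\n"
                    "    coordIndex [ %s ] } }\n" % (coords(centers), idx))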
We recommend using VRMLview to inspect VRML models.
Images of a model house, acquired by a fixed camera as the house, placed on a turntable, made a full revolution around its vertical axis.
"Frozen time" sequence captured with Digital Air's TimeTrack camera.
Sequence acquired by a camera mounted on a mobile robot as it approached the scene while smoothly turning left.
Sequence shot with a handheld camera, exhibiting relatively large interframe translational motion, with the epipoles located outside the images.
Sequence shot with a camcorder, whose frames are characterized by very small interframe motion. The imaged scene contains two dominant planes, relative to which the camera moves laterally.
Small interframe motion sequence, shot with a camcorder as the operator approached the scene. The forward camera motion results in a small angle between the triangulating 3D lines, making structure recovery challenging (see the sketch following these sequence descriptions).
Sequence shot with a firewire webcam undergoing complex motion, resulting in large changes in the field of view.
Images of a two-face calibration object that were acquired with a consumer digital camera. Corners were determined as the intersections of line segments fitted to the calibration grids.
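The triangulation-angle issue mentioned for the forward-motion sequence above can be made concrete with a toy Python example (ours, not taken from the paper): the angle subtended at a 3D point by the two camera centers shrinks toward zero as the motion aligns with the viewing ray, which is what makes depth recovery ill-conditioned.

    import numpy as np

    def triangulation_angle(X, C1, C2):
        # Angle (degrees) subtended at 3D point X by camera centers C1, C2.
        r1, r2 = C1 - X, C2 - X
        c = r1 @ r2 / (np.linalg.norm(r1) * np.linalg.norm(r2))
        return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

    X = np.array([0.0, 0.0, 10.0])   # a point 10 units in front of the camera
    O = np.zeros(3)
    print(triangulation_angle(X, O, np.array([1.0, 0.0, 0.0])))  # lateral: ~5.7 deg
    print(triangulation_angle(X, O, np.array([0.0, 0.0, 1.0])))  # forward: 0 deg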
In addition to the VRML reconstructions, the tracking results were used to augment the original sequences with artificial 3D objects. To achieve this, the estimated camera trajectories were exported to 3DSMax using MaxScript, and the augmented sequences were then generated with 3DSMax's rendering engine, which used the original sequence as the background. The coordinate systems employed by the camera tracker and 3DSMax were initially aligned interactively, by manually rotating and translating them until they lined up. The placement of the artificial graphical objects in the scene was guided by the structure information also provided by the camera tracker.
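While the actual MaxScript exporter is not reproduced here, the kind of computation such an export requires can be sketched as follows (a standard RQ-based decomposition, assumed rather than the authors' code): a Euclidean 3x4 projection matrix P = K[R | t] is split into intrinsics K, rotation R and camera center C = -R^T t, which are the quantities an animation package needs in order to key-frame a camera.

    import numpy as np
    from scipy.linalg import rq

    def decompose_projection(P):
        # Split P = K [R | t] into intrinsics K, rotation R and camera
        # center C = -R^T t via RQ decomposition of the left 3x3 block.
        K, R = rq(P[:, :3])
        # Make the diagonal of K positive (sign ambiguity of the RQ factors);
        # the overall sign of P is assumed fixed so that det(R) = +1.
        S = np.diag(np.sign(np.diag(K)))
        K, R = K @ S, S @ R
        t = np.linalg.solve(K, P[:, 3])
        C = -R.T @ t
        return K / K[2, 2], R, C        # K normalized so that K[2, 2] = 1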
Click here for a ~16 MB video augmenting (among others) the above sequences. Note that the frame dimensions have been reduced to keep the video's file size down.
For any questions, please contact lourakis@ics.forth.gr