Abstract
We propose a fast approach to 3–D object detection and pose estimation that owes its robustness to a training phase during which the target object slowly moves with respect to the camera. No additional information is provided to the system, save a very rough initialization in the first frame of the training sequence. Once trained, the system can detect the target object in each video frame independently.
Our method relies on Randomized Trees for wide-baseline feature matching. Unlike previous classification-based approaches to 3–D pose estimation, we do not require an a priori 3–D model. Instead, our algorithm learns both geometry and appearance. In the process, it collects, or harvests, a list of features that can be reliably recognized even when large motions and aspect changes cause complex variations in feature appearance. This is made possible by the great flexibility of Randomized Trees, which lets us add feature points to our list and remove them as needed with minimal extra computation.
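To make the last point concrete, the following is a minimal sketch of why adding and removing features can be cheap. It is not the authors' implementation. It assumes that each tree node compares the intensities of two randomly chosen pixels of a patch, and that tree structure is drawn independently of the features, so only the per-leaf counts change when the feature list changes. All names (`RandomizedTree`, `Forest`, `PATCH`, the patch size, tree depth and tree count) are hypothetical.

```python
import random
from collections import defaultdict

PATCH = 8  # hypothetical patch side length; a patch is a flat list of PATCH*PATCH intensities


class RandomizedTree:
    """Fixed-depth binary tree whose node tests are drawn at random (assumption)."""

    def __init__(self, depth, rng):
        self.depth = depth
        # One test per internal node: two distinct pixel positions to compare.
        self.tests = [tuple(rng.sample(range(PATCH * PATCH), 2))
                      for _ in range(2 ** depth - 1)]
        # leaf index -> {feature id -> number of training patches that reached the leaf}
        self.leaves = defaultdict(dict)

    def leaf_of(self, patch):
        node = 0  # heap-style indexing: children of n are 2n+1 and 2n+2
        for _ in range(self.depth):
            a, b = self.tests[node]
            node = 2 * node + (1 if patch[a] < patch[b] else 2)
        return node

    def add(self, feature_id, patch):
        counts = self.leaves[self.leaf_of(patch)]
        counts[feature_id] = counts.get(feature_id, 0) + 1

    def remove(self, feature_id):
        # The tree structure is untouched; only leaf statistics are dropped.
        for counts in self.leaves.values():
            counts.pop(feature_id, None)


class Forest:
    def __init__(self, n_trees=10, depth=6, seed=0):
        rng = random.Random(seed)
        self.trees = [RandomizedTree(depth, rng) for _ in range(n_trees)]

    def add_feature(self, feature_id, patches):
        for tree in self.trees:
            for patch in patches:
                tree.add(feature_id, patch)

    def remove_feature(self, feature_id):
        for tree in self.trees:
            tree.remove(feature_id)

    def classify(self, patch):
        """Sum the per-leaf class frequencies over all trees; None if no votes."""
        votes = defaultdict(float)
        for tree in self.trees:
            counts = tree.leaves.get(tree.leaf_of(patch), {})
            total = sum(counts.values())
            for feature_id, count in counts.items():
                votes[feature_id] += count / total
        return max(votes, key=votes.get) if votes else None


# Usage: two synthetic patches standing in for views of two harvested features.
ramp_up = list(range(PATCH * PATCH))
ramp_down = ramp_up[::-1]
forest = Forest()
forest.add_feature("feature_a", [ramp_up])
forest.add_feature("feature_b", [ramp_down])
label_before = forest.classify(ramp_up)
forest.remove_feature("feature_a")
label_after = forest.classify(ramp_up)
```

In this sketch, adding a feature costs one tree traversal per training patch and removing one costs a pass over the leaves, and in neither case are the trees rebuilt. A real system would train on many views of each feature rather than a single patch.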
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Özuysal, M., Lepetit, V., Fleuret, F., Fua, P. (2006). Feature Harvesting for Tracking-by-Detection. In: Leonardis, A., Bischof, H., Pinz, A. (eds) Computer Vision – ECCV 2006. ECCV 2006. Lecture Notes in Computer Science, vol 3953. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11744078_46
DOI: https://doi.org/10.1007/11744078_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33836-9
Online ISBN: 978-3-540-33837-6
eBook Packages: Computer Science, Computer Science (R0)