Object recognition – technology in the field of computer vision for finding and identifying objects in an image or video sequence. Humans recognize a multitude of objects in images with little effort, even though the image of an object may vary with viewpoint, size, and scale, or when the object is translated or rotated. Objects can even be recognized when they are partially obstructed from view. This task remains a challenge for computer vision systems; many approaches have been implemented over multiple decades.
Changes in lighting and color usually don't have much effect on image edges
Detect edges in template and image
Compare edges images to find the template
Must consider range of possible template positions
Good – count the number of overlapping edges. Not robust to changes in shape
Better – count the number of template edge pixels within some distance of an edge in the search image
Best – determine probability distribution of distance to nearest edge in search image (if template at correct position). Estimate likelihood of each template position generating image
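A minimal sketch of the distance-based edge comparison described above, using hypothetical edge-pixel coordinates and a brute-force nearest-edge search (a real system would precompute a distance transform of the edge image):

```python
# Score a template position by the mean distance from each template
# edge pixel to the nearest edge pixel in the search image.
# All coordinates here are hypothetical toy data.

from math import hypot

def chamfer_score(template_edges, image_edges, offset):
    """Mean nearest-edge distance for the template placed at `offset`."""
    ox, oy = offset
    total = 0.0
    for tx, ty in template_edges:
        px, py = tx + ox, ty + oy
        total += min(hypot(px - ix, py - iy) for ix, iy in image_edges)
    return total / len(template_edges)

# Toy example: an L-shaped edge set, present in the image at offset (5, 2),
# plus two clutter edge pixels.
template = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
image = [(x + 5, y + 2) for x, y in template] + [(0, 9), (9, 0)]

print(chamfer_score(template, image, (5, 2)))  # 0.0 – perfect match
print(chamfer_score(template, image, (0, 0)))  # larger – wrong position
```

Lower scores indicate better positions; searching over all offsets and keeping the minimum recovers the template location.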
Consider all positions as a set (a cell in the space of positions)
Determine lower bound on score at best position in cell
If bound is too large, prune cell
If bound is not too large, divide cell into subcells and try each subcell recursively
Process stops when cell is “small enough”
Unlike multi-resolution search, this technique is guaranteed to find all matches that meet the criterion (assuming that the lower bound is accurate)
Finding the Bound:
To find the lower bound on the best score, look at score for the template position represented by the center of the cell
Subtract maximum change from the “center” position for any other position in cell (occurs at cell corners)
Complexities arise from determining bounds on distance
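The cell-subdivision search with pruning can be sketched in one dimension. This is a simplified illustration, assuming the score is 1-Lipschitz in position so that the score at the cell center minus the cell radius is a valid lower bound for the whole cell:

```python
# Branch-and-bound search over integer positions [lo, hi] for the
# position with the smallest score, pruning cells whose lower bound
# cannot beat the best score found so far.

def branch_and_bound_min(score, lo, hi):
    """Find the integer position in [lo, hi] with the smallest score."""
    best_x, best_v = min(((x, score(x)) for x in (lo, hi)), key=lambda p: p[1])
    cells = [(lo, hi)]
    while cells:
        a, b = cells.pop()
        c = (a + b) // 2
        v = score(c)
        if v < best_v:
            best_x, best_v = c, v
        radius = max(c - a, b - c)      # max distance from center to cell edge
        if v - radius >= best_v:
            continue                    # prune: cell cannot beat current best
        if b - a > 1:                   # cell not yet "small enough": subdivide
            cells += [(a, c), (c, b)]
    return best_x, best_v

# Toy distance-like score, minimized at position 7.
print(branch_and_bound_min(lambda x: abs(x - 7), 0, 20))  # (7, 0)
```

Because the bound is conservative, no cell containing the true best position is ever pruned, matching the guarantee claimed above for this technique.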
Edges are (mostly) robust to illumination changes, however they throw away a lot of information
Must compute pixel distance as a function of both pixel position and pixel intensity
Can be applied to color also
Another way to be robust to illumination changes without throwing away as much information is to compare image gradients
Matching is performed like matching greyscale images
Simple alternative: Use (normalized) correlation
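A minimal sketch of the normalized-correlation alternative, with hypothetical pixel lists, showing why it is robust to global illumination (gain and offset) changes:

```python
# Normalized correlation of two equal-length pixel lists:
# subtract each mean, then divide the inner product by the
# product of the standard deviations.
from math import sqrt

def ncc(a, b):
    """Normalized correlation in [-1, 1] of two pixel lists."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    da = [x - ma for x in a]
    db = [x - mb for x in b]
    num = sum(x * y for x, y in zip(da, db))
    den = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den

patch = [10, 50, 30, 80, 20, 60]
brighter = [2 * p + 15 for p in patch]  # global gain and offset change
print(round(ncc(patch, brighter), 6))   # 1.0 – invariant to affine lighting
```
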
Histograms of receptive field responses
Avoids explicit point correspondences
Relations between different image points implicitly coded in the receptive field responses
Swain and Ballard (1991), Schiele and Crowley (2000), Linde and Lindeberg (2004, 2012)
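Matching without explicit point correspondences can be sketched in the spirit of Swain and Ballard's histogram intersection, here with hypothetical 4-bin histograms standing in for real receptive field response histograms:

```python
# Histogram intersection: sum the bin-wise minima of the model and
# image histograms, normalized by the total model mass.
# The histograms below are hypothetical toy data.

def histogram_intersection(model, image):
    """Match score in [0, 1]: 1 means the image fully contains the model."""
    return sum(min(m, i) for m, i in zip(model, image)) / sum(model)

model_hist = [4, 10, 3, 3]
image_hist = [5, 8, 3, 4]
print(histogram_intersection(model_hist, image_hist))  # 0.9
```

Because the score depends only on the histograms, relations between individual image points never need to be matched explicitly.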
One approach to efficiently searching the database for a specific image is to use eigenvectors of the templates (called eigenfaces)
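A minimal sketch of eigenspace matching, assuming NumPy is available and using tiny 4-pixel "images" as stand-ins for real templates:

```python
# Build an eigenspace from a set of templates via SVD of the
# mean-centered data, then recognize a query image by nearest
# neighbor in the low-dimensional eigenspace.
# The 4-pixel "images" below are hypothetical toy data.
import numpy as np

templates = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 1.0, 4.0, 3.0],
    [4.0, 3.0, 2.0, 1.0],
])
mean = templates.mean(axis=0)
_, _, vt = np.linalg.svd(templates - mean, full_matrices=False)
basis = vt[:2]                       # keep the top-2 eigenvectors

def project(img):
    return basis @ (img - mean)      # coordinates in eigenspace

coords = np.array([project(t) for t in templates])
query = np.array([1.1, 2.0, 3.0, 3.9])   # noisy copy of template 0
nearest = np.argmin(np.linalg.norm(coords - project(query), axis=1))
print(nearest)  # 0
```

Distances are compared in the low-dimensional eigenspace rather than pixel space, which is what makes searching a large template database efficient.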
Modelbases are collections of geometric models of the objects to be recognized
A search is used to find feasible matches between object features and image features
The primary constraint is that a single position of the object must account for all of the feasible matches
Methods of this kind extract features from the objects to be recognized and from the images to be searched
One method for searching for feasible matches is to search through a tree
Each node in the tree represents a set of matches.
Root node represents empty set
Each other node is the union of the matches in the parent node and one additional match.
Wildcard is used for features with no match
Nodes are “pruned” when the set of matches is infeasible.
A pruned node has no children
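The interpretation-tree search above can be sketched with hypothetical 2-D point features: each tree level assigns one model feature to an image feature (or to a wildcard), and a branch is pruned as soon as the pairwise distances between matched features disagree:

```python
# Interpretation-tree search: partial assignments of model features to
# image features are extended one level at a time; infeasible nodes
# (pairwise distances not preserved) are pruned and get no children.
# The point coordinates below are hypothetical toy data.
from itertools import combinations
from math import dist

model = [(0, 0), (3, 0), (0, 4)]
image = [(10, 10), (13, 10), (10, 14), (50, 50)]  # last point is clutter

def consistent(assignment):
    """Rigid-motion check: matched pairs must preserve distances."""
    pairs = [(i, j) for i, j in combinations(range(len(assignment)), 2)
             if assignment[i] is not None and assignment[j] is not None]
    return all(abs(dist(model[i], model[j]) -
                   dist(image[assignment[i]], image[assignment[j]])) < 1e-6
               for i, j in pairs)

def search(assignment=()):
    if len(assignment) == len(model):
        yield assignment
        return
    for cand in list(range(len(image))) + [None]:    # None = wildcard
        if cand is not None and cand in assignment:
            continue
        new = assignment + (cand,)
        if consistent(new):                          # prune infeasible nodes
            yield from search(new)

full = [a for a in search() if None not in a]
print(full)  # [(0, 1, 2)] – the clutter point never survives pruning
```
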
Historically significant and still used, but less commonly
Hypothesize and test
Hypothesize a correspondence between a collection of image features and a collection of object features
Then use this to generate a hypothesis about the projection from the object coordinate frame to the image frame
Use this projection hypothesis to generate a rendering of the object. This step is usually known as backprojection
Compare the rendering to the image, and, if the two are sufficiently similar, accept the hypothesis
There are a variety of different ways of generating hypotheses.
When camera intrinsic parameters are known, the hypothesis is equivalent to a hypothetical position and orientation – pose – for the object.
Utilize geometric constraints
Construct a correspondence for small sets of object features to every correctly sized subset of image points. (These are the hypotheses)
Three basic approaches:
Obtaining Hypotheses by Pose Consistency
Obtaining Hypotheses by Pose Clustering
Obtaining Hypotheses by Using Invariants
Expensive search that is also redundant, but it can be improved using randomization and/or grouping
Examine small sets of image features until the likelihood of missing the object becomes small
For each set of image features, all possible matching sets of model features must be considered.
(1 − W^c)^k = Z
W = the fraction of image points that are “good” (W ≈ m/n)
c = the number of correspondences necessary
k = the number of trials
Z = the probability of every trial using one (or more) incorrect correspondences
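Rearranging the relation gives the number of trials needed for a target failure probability, k = log Z / log(1 − W^c). A worked example with hypothetical values:

```python
# Solve (1 - W^c)^k = Z for k: the number of random trials needed so
# that the probability of every trial containing an incorrect
# correspondence is at most Z. W, c, Z below are hypothetical values.
from math import log, ceil

W = 0.5   # fraction of "good" image points
c = 3     # correspondences per hypothesis (e.g. triples of points)
Z = 0.01  # acceptable probability that every trial is contaminated

k = ceil(log(Z) / log(1 - W ** c))
print(k)  # 35 trials suffice
```
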
If we can determine groups of points that are likely to come from the same object, we can reduce the number of hypotheses that need to be examined
Also called Alignment, since the object is being aligned to the image
Correspondences between image features and model features are not independent – Geometric constraints
A small number of correspondences yields the object position – the others must be consistent with this
If we hypothesize a match between a sufficiently large group of image features and a sufficiently large group of object features, then we can recover the missing camera parameters from this hypothesis (and so render the rest of the object)
Generate hypotheses using small number of correspondences (e.g. triples of points for 3D recognition)
Project other model features into image (backproject) and verify additional correspondences
Use the smallest number of correspondences necessary to achieve discrete object poses
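The alignment steps above can be sketched with a translation-only pose for simplicity, so that a single correspondence already determines the pose; the remaining model features are backprojected and verified (all coordinates are hypothetical):

```python
# Pose consistency / alignment with a 2-D translation-only pose:
# hypothesize a pose from one model-to-image correspondence, backproject
# the remaining model features, and count how many are verified.
# The feature coordinates below are hypothetical toy data.

model = [(0, 0), (2, 1), (4, 0), (1, 3)]
image = [(5, 5), (7, 6), (9, 5), (6, 8), (30, 2)]  # clutter at (30, 2)

def verify(pose, tol=0.5):
    """Count model features whose backprojection lands near an image feature."""
    dx, dy = pose
    hits = 0
    for mx, my in model:
        px, py = mx + dx, my + dy          # backproject into the image
        if any(abs(px - ix) <= tol and abs(py - iy) <= tol
               for ix, iy in image):
            hits += 1
    return hits

# Hypothesize a pose from each single correspondence, keep the best.
best = max(((ix - mx, iy - my)
            for mx, my in model for ix, iy in image),
           key=verify)
print(best, verify(best))  # (5, 5) 4 – all model features verified
```

With a full perspective camera the same scheme needs more correspondences per hypothesis (e.g. point triples for 3D recognition), but the hypothesize–backproject–verify loop is the same.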
Keypoints of objects are first extracted from a set of reference images and stored in a database
An object is recognized in a new image by individually comparing each feature from the new image to this database and finding candidate matching features based on Euclidean distance of their feature vectors.
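The database lookup can be sketched as a nearest-neighbor search by Euclidean distance between feature vectors; the 4-D descriptors and object names below are hypothetical (real SIFT descriptors are 128-dimensional):

```python
# Match a query feature vector against a database of reference
# descriptors by Euclidean distance. Descriptors and labels are
# hypothetical toy data.
from math import dist

database = {                      # descriptors from reference images
    "cup":  [0.9, 0.1, 0.0, 0.2],
    "book": [0.1, 0.8, 0.3, 0.0],
    "shoe": [0.0, 0.2, 0.9, 0.4],
}

def best_match(descriptor):
    """Return the database entry nearest to `descriptor`."""
    return min(database, key=lambda name: dist(database[name], descriptor))

query = [0.85, 0.15, 0.05, 0.25]  # feature extracted from a new image
print(best_match(query))  # cup
```

A full recognizer matches every keypoint in the new image this way and then accepts an object when enough of its features agree on a consistent pose.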
Genetic algorithms can operate without prior knowledge of a given dataset and can develop recognition procedures without human intervention. A recent project achieved 100 percent accuracy on the benchmark motorbike, face, airplane and car image datasets from Caltech and 99.4 percent accuracy on fish species image datasets.
O. Linde and T. Lindeberg, "Composed complex-cue histograms: An investigation of the information content in receptive field based image descriptors for object recognition", Computer Vision and Image Understanding, 116:4, 538–560, 2012.
Lowe, D. G., "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision, 60, 2, pp. 91–110, 2004.
Bay, Herbert; Ess, Andreas; Tuytelaars, Tinne; Van Gool, Luc (2008). "Speeded-Up Robust Features (SURF)". Computer Vision and Image Understanding. 110 (3): 346–359. CiteSeerX 10.1.1.205.738. doi:10.1016/j.cviu.2007.09.014.
"New object recognition algorithm learns on the fly". Gizmag.com. 20 January 2014. Retrieved 2014-01-21.
Lillywhite, K.; Lee, D. J.; Tippetts, B.; Archibald, J. (2013). "A feature construction method for general object recognition". Pattern Recognition. 46 (12): 3300. Bibcode:2013PatRe..46.3300L. doi:10.1016/j.patcog.2013.06.002.
Brown, Matthew, and David G. Lowe. "Unsupervised 3D object recognition and reconstruction in unordered datasets." 3-D Digital Imaging and Modeling, 2005 (3DIM 2005), Fifth International Conference on. IEEE, 2005.
Oliva, Aude, and Antonio Torralba. "The role of context in object recognition." Trends in Cognitive Sciences 11.12 (2007): 520–527.
Niu, Zhenxing, et al. "Context aware topic model for scene recognition." 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012.
Zhu, Song-Chun, and David Mumford. "A stochastic grammar of images." Foundations and Trends in Computer Graphics and Vision 2.4 (2007): 259–362.
Nayar, Shree K., and Ruud M. Bolle. "Reflectance based object recognition." International Journal of Computer Vision 17.3 (1996): 219–240.
Worthington, Philip L., and Edwin R. Hancock. "Object recognition using shape-from-shading." IEEE Transactions on Pattern Analysis and Machine Intelligence 23.5 (2001): 535–542.
Shotton, Jamie, et al. "TextonBoost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context." International Journal of Computer Vision 81.1 (2009): 2–23.
Donahue, Jeffrey, et al. "Long-term recurrent convolutional networks for visual recognition and description." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
Karpathy, Andrej, and Li Fei-Fei. "Deep visual-semantic alignments for generating image descriptions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
P. Duygulu; K. Barnard; N. de Freitas & D. Forsyth (2002). "Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary". Proceedings of the European Conference on Computer Vision. pp. 97–112. Archived from the original on 2005-03-05.
Esteva, Andre, et al. "Dermatologist-level classification of skin cancer with deep neural networks." Nature 542.7639 (2017): 115.
Brown, M., and Lowe, D. G., "Recognising Panoramas," ICCV, p. 1218, Ninth IEEE International Conference on Computer Vision (ICCV'03), Volume 2, Nice, France, 2003.
Li, L., Guo, B., and Shao, K., "Geometrically robust image watermarking using scale-invariant feature transform and Zernike moments," Chinese Optics Letters, Volume 5, Issue 6, pp. 332–335, 2007.
Se, S., Lowe, D. G., and Little, J. J., "Vision-based global localization and mapping for mobile robots", IEEE Transactions on Robotics, 21, 3 (2005), pp. 364–375.
Thomas Serre, Maximilian Riesenhuber, Jennifer Louie, Tomaso Poggio, "On the Role of Object-Specific Features for Real World Object Recognition in Biological Vision." Artificial Intelligence Lab and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Center for Biological and Computational Learning, McGovern Institute for Brain Research, Cambridge, MA, USA.
Anne Permaloff and Carl Grafton, "Optical Character Recognition", Political Science and Politics, Vol. 25, No. 3 (Sep. 1992), pp. 523–531.
Christian Demant, Bernd Streicher-Abel, Peter Waszkewitz, "Industrial Image Processing: Visual Quality Control in Manufacturing" (available at Google Books).
Nuno Vasconcelos, "Image Indexing with Mixture Hierarchies", Compaq Computer Corporation, Proc. IEEE Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, 2001. Archived 2011-01-18 at the Wayback Machine.
Heikkilä, Janne; Silvén, Olli (2004). "A real-time system for monitoring of cyclists and pedestrians". Image and Vision Computing. 22 (7): 563–570. doi:10.1016/j.imavis.2003.09.010.
Ho Gi Jung, Dong Suk Kim, Pal Joo Yoon, Jaihie Kim, "Structure Analysis Based Parking Slot Marking Recognition for Semi-automatic Parking System", Structural, Syntactic, and Statistical Pattern Recognition, Springer Berlin/Heidelberg, 2006.
S. K. Nayar, H. Murase, and S. A. Nene, "Learning, Positioning, and Tracking Visual Appearance", Proc. of IEEE Intl. Conf. on Robotics and Automation, San Diego, May 1994.
Liu, F.; Gleicher, M.; Jin, H.; Agarwala, A. (2009). "Content-preserving warps for 3D video stabilization". ACM Transactions on Graphics. 28 (3): 1. CiteSeerX 10.1.1.678.3088. doi:10.1145/1531326.1531350.
Elgammal, Ahmed, "CS 534: Computer Vision – 3D Model-based Recognition", Dept. of Computer Science, Rutgers University.
Hartley, Richard and Zisserman, Andrew, "Multiple View Geometry in Computer Vision", Cambridge University Press, 2000, ISBN 0-521-62304-9.
Roth, Peter M. and Winter, Martin "Survey of Appearance-Based Methods for Object Recognition", Technical Report ICG-TR-01/08, Inst. for Computer Graphics and Vision, Graz University of Technology, Austria; January 15, 2008.
Collins, Robert "Lecture 31: Object Recognition: SIFT Keys", CSE486, Penn State
IPRG Image Processing - Online Open Research Group
Christian Szegedy, Alexander Toshev and Dumitru Erhan, "Deep Neural Networks for Object Detection", Advances in Neural Information Processing Systems 26, 2013, pp. 2553–2561.