Yann LeCun

New York University, New York

Learning visual feature hierarchies

Intelligent perceptual tasks such as vision and audition require theconstruction of good internal representations. Theoretical andempirical evidence suggest that the perceptual world is bestrepresented by a multi-stage hierarchy in which features in successivestages are increasingly global, invariant, and abstract. An importantchallenge for Machine Learning and Pattern Recognition is to devise"deep learning" methods for multi-stage architecture than can automatically learn good feature hierarchies from labeled andunlabeled data.

We will demonstrate the use of deep learning methods, based onunsupervised sparse coding, to train convolutional network (ConvNets). ConvNets are biologically-inspired architecturesconsisting of multiple stages of filter banks, interspersed withnon-linear operations, and spatial pooling operations.

A number of applications will be shown through videos and live demos, including a category-level object recognition system that can betrained on the fly, a pedestrian detector, a system that recognizeshuman activities in videos, and a trainable vision system for off-roadmobile robot navigation. Specialized hardware architecture thatimplement these algorithms will also be described.