New algorithm improves speed and accuracy of pedestrian detection
09 February 2016
Electrical engineers at the University of California, San Diego have developed a pedestrian detection system that performs accurately and in near real-time.
At two to four frames per second and close to half the error compared with existing systems, the technology, which incorporates deep learning models, could be used in 'smart' vehicles, robotics and image and video search systems.
"We're aiming to build computer vision systems that will help computers better understand the world around them," said UC San Diego's Professor Nuno Vasconcelos who directed the research.
The new pedestrian detection algorithm developed by Vasconcelos and his team combines a traditional computer vision classification architecture, known as cascade detection, with deep learning models.
Pedestrian detection systems typically break down an image into small windows that are processed by a classifier that signals the presence or absence of a pedestrian. This approach is challenging because pedestrians appear in different sizes, depending on distance to the camera, and locations within an image. Typically, millions of windows must be inspected by video frame at speeds ranging from 5-30 frames per second.
In cascade detection, the detector operates throughout a series of stages. In the first stages, the algorithm quickly identifies and discards windows that it can easily recognize as not containing a person (such as the sky).
The next stages process the windows that are harder for the algorithm to classify, such as those containing a tree, which the algorithm could recognize as having person-like features.
In the final stages, the algorithm must distinguish between a pedestrian and very similar objects. However, because the final stages only process a few windows, the overall complexity is low.
Traditional cascade detection relies on 'weak learners', which are simple classifiers, to do the job at each stage. The first stages use a small number of weak learners to reject the easy windows, while the later stages rely on larger numbers of weak learners to process the harder windows.
While this method is fast, it isn't powerful enough when it reaches the final stages. That's because the weak learners used in all stages of the cascade are identical. So even though there are more classifiers in the last stages, they're not necessarily capable of performing highly complex classification.
To address this problem, Vasconcelos and his team developed a novel algorithm that incorporates deep learning models in the final stages of a cascaded detector. Deep learning models are better suited to complex pattern recognition, which they can perform after being trained with hundreds or thousands of examples - in this case, images that either have, or don't have, a person. However, deep learning models are too complex for real-time implementation. While they work well for the final cascade stages, they are too complex to be used in the early ones.
The solution is a new cascade architecture that combines classifiers from different families: simple classifiers (weak learners) in the early stages complex classifiers (deep learning models) in the later stages.
This is not trivial to accomplish, Vasconcelos notes, since the algorithm used to learn the cascade has to find the combination of weak learners that achieves the optimal trade-off between detection accuracy and complexity for each cascade stage.
Accordingly, Vasconcelos and his team introduced a new mathematical formulation for this problem, which resulted in a new algorithm for cascade design.
The algorithm currently only works for binary detection tasks, such as pedestrian detection, but the researchers are aiming to extend the cascade technology to detect many objects simultaneously.