This article is based on the paper ‘Fast Feature Pyramids for Object Detection’ written by Piotr Dollar et al.
The core challenge for an object detection algorithm is to improve accuracy without sacrificing speed. One way to approach this is to modify the existing feature extraction stage so that a small loss in accuracy buys a large gain in computation speed. This is what the Fast Feature Pyramid does. The whole idea rests on the following observation from the paper:
‘Multi-resolution image features may be approximated via extrapolation from nearby scales, rather than being computed explicitly.’
Following this idea, we can avoid computing gradients over a finely sampled image pyramid. The image gradient measures the directional change in intensity at each pixel, so it yields a magnitude and an angle for every pixel of the image. The magnitude and direction are calculated as:
$$M = \sqrt{g_x^2 + g_y^2}, \qquad \theta = \arctan\left(\frac{g_y}{g_x}\right)$$
Here $g_x$ and $g_y$ are the changes in pixel value in the horizontal and vertical directions.
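As a small illustration of the per-pixel magnitude and direction described above, here is a minimal NumPy sketch. The function name `gradient_magnitude_and_direction` and the choice of `np.gradient` (central differences) are assumptions made for this example; the paper's own gradient filter may differ. `np.arctan2` is used instead of a plain arctangent so that pixels with a zero horizontal gradient are handled safely.

```python
import numpy as np

def gradient_magnitude_and_direction(image):
    """Return per-pixel gradient magnitude and direction (in radians)."""
    image = np.asarray(image, dtype=float)
    # np.gradient returns the derivative along axis 0 (rows, vertical) first,
    # then along axis 1 (columns, horizontal).
    gy, gx = np.gradient(image)
    magnitude = np.hypot(gx, gy)      # sqrt(gx**2 + gy**2)
    direction = np.arctan2(gy, gx)    # angle of the gradient vector
    return magnitude, direction

# Horizontal ramp: intensity grows by 1 per column and is constant along rows,
# so the gradient should point along the horizontal axis with magnitude 1.
ramp = np.tile(np.arange(5, dtype=float), (4, 1))
mag, ang = gradient_magnitude_and_direction(ramp)
```

For the ramp image, every pixel has a horizontal change of 1 and no vertical change, which makes it a convenient sanity check.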
For an upsampled image, the sums of gradient magnitudes in the original and upsampled images should be related by about a factor of k, where k is the upsampling factor. Intuitively, upsampling spreads each original pixel over several smaller pixels: the gradient at each pixel shrinks by about 1/k, but there are k² times as many pixels, so the total grows by about k.
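The factor-of-k relation can be checked numerically. The sketch below makes two simplifying assumptions that are not taken from the paper: upsampling is done by plain pixel replication (`upsample_nearest`, a hypothetical helper), and the gradient is measured as the sum of absolute forward differences rather than the Euclidean magnitude. Under these assumptions the ratio comes out as k; with a smoother interpolation it would only be approximately k.

```python
import numpy as np

def gradient_sum(image):
    """Sum of absolute forward differences in both directions."""
    image = np.asarray(image, dtype=float)
    gx = np.abs(np.diff(image, axis=1)).sum()  # horizontal changes
    gy = np.abs(np.diff(image, axis=0)).sum()  # vertical changes
    return gx + gy

def upsample_nearest(image, k):
    """Upsample by an integer factor k by replicating each pixel k x k times."""
    return np.repeat(np.repeat(image, k, axis=0), k, axis=1)

# Random integer-valued test image (a stand-in, not a natural image).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32)).astype(float)

k = 2
ratio = gradient_sum(upsample_nearest(image, k)) / gradient_sum(image)
```

With replication, every intensity step of the original image reappears unchanged along k rows or columns of the upsampled image, which is why the summed gradient scales with k.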
For a downsampled image, information is lost, but the loss turns out to be consistent. As a result, the measured gradients undershoot the extrapolated gradients. In other words, the predicted and actual values differ in a nearly constant, and therefore predictable, way.
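The undershoot can also be seen in a toy experiment. The sketch below assumes downsampling by 2 via 2 x 2 block averaging (`downsample_by_2`, a hypothetical helper) and uses a random-noise image, which is a deliberately extreme case: noise loses far more gradient energy under smoothing than a natural image would. Extrapolating the upsampling rule to k = 1/2 predicts a ratio of 0.5; the measured ratio falls below that.

```python
import numpy as np

def gradient_sum(image):
    """Sum of absolute forward differences in both directions."""
    gx = np.abs(np.diff(image, axis=1)).sum()
    gy = np.abs(np.diff(image, axis=0)).sum()
    return gx + gy

def downsample_by_2(image):
    """Average each non-overlapping 2 x 2 block (assumes even height and width)."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# Random noise as a stand-in image; natural images behave less extremely.
rng = np.random.default_rng(0)
image = rng.random((64, 64))

predicted_ratio = 0.5  # extrapolated from the "factor of k" rule with k = 1/2
measured_ratio = gradient_sum(downsample_by_2(image)) / gradient_sum(image)
```

The point of the sketch is only the direction of the discrepancy: the measured ratio is smaller than the extrapolated one.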
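To work around these limitations, a power law for natural images is applied. For two scales $s_1$ and $s_2$ it reads:
$$\frac{f_\Omega(I_{s_1})}{f_\Omega(I_{s_2})} = \left(\frac{s_1}{s_2}\right)^{-\lambda_\Omega} + \mathcal{E}$$
Here $f_\Omega(I_s)$ is a scalar summary (for example the mean) of channel Ω computed on the image at scale s. $\lambda_\Omega$ can be estimated from training observations and is constant for a given channel type. $\mathcal{E}$ is the error term that captures the deviation from the power law.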
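One plausible way to estimate the exponent from observations is a least-squares line fit in log space, since a power law ratio = a · s^(-λ) becomes a straight line with slope -λ when both axes are logarithmic. The sketch below is an assumption-laden illustration: `fit_power_law` is a hypothetical helper, and the "observations" are synthetic values generated from a made-up exponent rather than measurements on real images.

```python
import numpy as np

def fit_power_law(scales, ratios):
    """Least-squares fit of ratios ~ a * scales**(-lam) in log2 space.

    Returns (lam, a).
    """
    log_s = np.log2(np.asarray(scales, dtype=float))
    log_r = np.log2(np.asarray(ratios, dtype=float))
    # log2(ratio) = log2(a) - lam * log2(scale): a line with slope -lam.
    slope, intercept = np.polyfit(log_s, log_r, 1)
    return -slope, 2.0 ** intercept

# Synthetic observations: 8 scales per octave over 3 octaves of downsampling.
scales = np.array([2.0 ** (-k / 8) for k in range(1, 25)])
true_lam = 0.8                      # made-up value, for illustration only
ratios = scales ** (-true_lam)      # noise-free, i.e. error term set to zero

lam, a = fit_power_law(scales, ratios)
```

On real data the ratios would be averaged over many training images before fitting, and the residual of the fit gives a sense of how large the error term is for that channel.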
Let $I_s$ denote the image I captured at scale s, and let R(I, s) denote I resampled by s. Suppose we have computed C = Ω(I); can we predict the channel image $C_s = \Omega(I_s)$ at a new scale s using only C?
The standard approach is to compute $C_s = \Omega(R(I, s))$, ignoring the information contained in C = Ω(I). Instead, the authors propose the following approximation derived from the power law:
$$C_s \approx R(C, s) \cdot s^{-\lambda_\Omega}$$
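In code, the approximation amounts to resampling the already computed channel and multiplying it by a scale-dependent constant, s raised to the power -λ. The sketch below is a minimal illustration under stated assumptions: `resample_nearest` is a hypothetical nearest-neighbour stand-in for R(C, s) (a real implementation would normally use a smoother resampler), and the value of `lam` is made up.

```python
import numpy as np

def resample_nearest(channel, s):
    """Hypothetical stand-in for R(C, s): nearest-neighbour resampling by scale s."""
    h, w = channel.shape
    new_h = max(1, int(round(h * s)))
    new_w = max(1, int(round(w * s)))
    # Map each output pixel back to a source pixel, clamped to the image bounds.
    rows = np.minimum((np.arange(new_h) / s).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / s).astype(int), w - 1)
    return channel[np.ix_(rows, cols)]

def approximate_channel(channel, s, lam):
    """Approximate the channel at scale s from the channel at scale 1:
    resample the channel itself, then apply the power-law correction."""
    return resample_nearest(channel, s) * s ** (-lam)

# Toy channel of ones; lam = 1.0 is an illustrative value, not a fitted one.
C = np.ones((8, 8))
C_half = approximate_channel(C, 0.5, lam=1.0)
```

The key design point is that the expensive feature computation Ω is never rerun at the new scale; only a cheap resampling and a scalar multiplication are needed.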
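Since the error term grows as the target scale moves further from the computed one, the channels are computed explicitly only at octave-spaced scales (each a factor of 2 apart: 1, 1/2, 1/4, 1/8, and so on), and the scales in between are approximated from the nearest computed octave. This gives a good tradeoff between speed and accuracy. For an n × n image, the explicitly computed levels cost about n² + n²/4 + n²/16 + ... = 4/3 n², only about a third more than computing a single scale.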
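The octave scheme can be sketched as a simple plan that lists every pyramid scale together with the explicitly computed octave it would be approximated from. The helper names `pyramid_plan` and `explicit_cost` are hypothetical, and the cost model is an assumption: it counts only the pixels on which features are computed explicitly (a geometric series over an infinitely deep pyramid) and ignores the cost of resampling the intermediate scales. Under that model, one explicit scale per octave comes to about 1.33 n².

```python
import math

def pyramid_plan(n_octaves, scales_per_octave):
    """Return a list of (scale, source_scale, is_explicit) tuples.

    source_scale is the octave scale whose channels the level is derived
    from; is_explicit marks levels that are computed rather than approximated.
    """
    plan = []
    for i in range(n_octaves * scales_per_octave + 1):
        scale = 2.0 ** (-i / scales_per_octave)
        # Nearest octave in log2 space; exact ties go to the coarser octave.
        nearest_octave = math.floor(i / scales_per_octave + 0.5)
        source_scale = 2.0 ** (-nearest_octave)
        is_explicit = (i % scales_per_octave == 0)
        plan.append((scale, source_scale, is_explicit))
    return plan

def explicit_cost(scales_per_octave):
    """Cost, in units of n**2, of explicitly computing every level of an
    infinitely deep pyramid with the given number of scales per octave."""
    return 1.0 / (1.0 - 4.0 ** (-1.0 / scales_per_octave))

plan = pyramid_plan(n_octaves=2, scales_per_octave=8)
cost_one_per_octave = explicit_cost(1)    # explicit levels of the fast pyramid
cost_eight_per_octave = explicit_cost(8)  # fully explicit fine pyramid
```

Comparing the two cost values shows where the saving comes from: the fine pyramid pays for every level, while the octave scheme pays only for the few explicit ones.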
In the reported timings, the standard computation runs at 100 fps and a finely sampled pyramid with 8 scales per octave runs at only 8 fps, whereas the fast pyramid built with the power law runs at 50 fps.
The technique is easier to understand from the image below.
Improvements in the performance of visual recognition systems in the past decade have in part come from the realization that finely sampled pyramids of image features provide a good front-end for image analysis. It is widely believed that the price to be paid for improved performance is sharply increased computational costs.
The fast pyramid technique is effective for object detection and can be applied to multiple algorithms, such as Aggregated Channel Features (ACF), Integral Channel Features (ICF), and Deformable Part Models (DPM). It is also not restricted to object detection and can be applied to various other visual recognition algorithms.