# FAST FEATURE PYRAMID: A MULTISCALE TECHNIQUE FOR OBJECT DETECTION

Rakesh Baral · Mar 26 2021

The fundamental requirement for an object detection algorithm is to increase accuracy without sacrificing speed. In practice, we can modify the existing feature extraction pipeline so that it gives up a little accuracy in exchange for a large gain in computation speed. This is what the Fast Feature Pyramid does. The complete idea is based on the following observation:

"Multi-resolution image features may be approximated via extrapolation from nearby scales, rather than being computed explicitly."

Following this idea, we can avoid computing gradients over a finely sampled image pyramid. The image gradient measures the directional change in intensity at each pixel, and collecting the gradient angles over all pixels gives their distribution for the whole image. The magnitude and direction are calculated as follows:

$$M = \sqrt{g_x^2 + g_y^2}, \qquad \theta = \arctan\left(\frac{g_y}{g_x}\right)$$

Here $g_x$ and $g_y$ are the changes in pixel values in the horizontal and vertical directions.
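As a minimal sketch of the gradient magnitude and direction described above, the snippet below uses NumPy finite differences. The function name `gradient_magnitude_orientation` and the ramp test image are illustrative assumptions, and `np.arctan2` is used in place of a plain arctangent so that the case of zero horizontal change is handled.

```python
import numpy as np

def gradient_magnitude_orientation(image):
    """Return per-pixel gradient magnitude and direction (radians)."""
    image = np.asarray(image, dtype=float)
    # np.gradient returns derivatives along axis 0 (rows, vertical)
    # and then axis 1 (columns, horizontal).
    gy, gx = np.gradient(image)
    magnitude = np.hypot(gx, gy)       # sqrt(gx**2 + gy**2)
    orientation = np.arctan2(gy, gx)   # angle of the gradient vector
    return magnitude, orientation

# Hypothetical test image: intensity rises by 3 per pixel from left to right.
ramp = np.tile(np.arange(5, dtype=float) * 3.0, (4, 1))
mag, ang = gradient_magnitude_orientation(ramp)
print(mag[0, 0], ang[0, 0])
```

On this ramp the intensity changes only horizontally, so the magnitude is the same at every pixel and the direction points along the horizontal axis.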

For an image upsampled by a factor of k, the sum of gradient magnitudes in the upsampled image should be about k times the sum in the original image. Upsampling spreads each original pixel value over the corresponding k × k block of smaller pixels, so there are k² times as many pixels while each gradient is only about 1/k as large.
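The factor-of-k relationship can be checked numerically. The sketch below is only an illustration on a smooth synthetic image, not on natural images: `resample_bilinear` is a hypothetical helper written with `np.interp`, and the sinusoidal test image is an assumption chosen so that interpolation behaves well.

```python
import numpy as np

def resample_bilinear(image, k):
    """Resample a 2D array by factor k with separable linear interpolation."""
    h, w = image.shape
    new_h, new_w = max(1, int(round(h * k))), max(1, int(round(w * k)))
    rows = np.linspace(0, h - 1, new_h)
    cols = np.linspace(0, w - 1, new_w)
    # Interpolate along each row first, then along each column.
    tmp = np.array([np.interp(cols, np.arange(w), row) for row in image])
    return np.array([np.interp(rows, np.arange(h), col) for col in tmp.T]).T

def gradient_sum(image):
    """Sum of gradient magnitudes over all pixels."""
    gy, gx = np.gradient(image)
    return np.hypot(gx, gy).sum()

# Smooth synthetic image (an assumption; real images are noisier).
y, x = np.mgrid[0:64, 0:64]
image = np.sin(2 * np.pi * x / 32.0) + np.cos(2 * np.pi * y / 48.0)

k = 2
ratio = gradient_sum(resample_bilinear(image, k)) / gradient_sum(image)
print(f"sum of gradient magnitudes grew by a factor of {ratio:.2f} for k = {k}")
```

For this smooth image the ratio comes out close to k, because the upsampled image has k² times as many pixels and each gradient is roughly 1/k as large.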

For a downsampled image, information is lost, but the loss turns out to be consistent. As a result, the measured gradients undershoot the extrapolated gradients. In other words, the predicted and actual values differ by a nearly constant amount. [The upsampled and downsampled images produce histograms of gradients that are approximately scaled copies of the original histogram: each bin of the original is multiplied by about 2 in the upsampled case and by about 0.34 in the downsampled case, the latter reflecting the consistent loss that occurs while downsampling.]

## FAST PYRAMID

To avoid the above limitations, the power law for natural images is applied, which gives:

$$\frac{f_\Omega(I_{s_1})}{f_\Omega(I_{s_2})} = \left(\frac{s_1}{s_2}\right)^{-\lambda_\Omega} + E$$

[The numerator and denominator are the same function $f_\Omega$ of a particular channel $\Omega$, applied at the pixel level to the same image at scales $s_1$ and $s_2$.]

$\lambda_\Omega$ can be calculated from training observations and is constant for a particular channel. $E$ is the error term of the equation; it captures the deviation from the power law.
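One way to picture how λΩ could be obtained from training observations is a straight-line fit in log-log space, since the power law says the log of the ratio is linear in the log of the scale. The sketch below is an assumption about the fitting procedure, not a procedure stated in this article: `estimate_lambda` is a hypothetical helper, and the ratios are synthetic values generated from a made-up λ of 0.11 with no error term.

```python
import numpy as np

def estimate_lambda(scales, ratios):
    """Fit log2(ratio) = -lambda * log2(scale) + c and return lambda."""
    slope, _intercept = np.polyfit(np.log2(scales), np.log2(ratios), 1)
    return -slope

# Synthetic "observations": ratios of a channel statistic at scale s
# relative to scale 1, generated from a hypothetical lambda.
true_lambda = 0.11
scales = 2.0 ** (-np.arange(1, 9) / 8.0)   # 8 scales within one octave
ratios = scales ** (-true_lambda)

lam = estimate_lambda(scales, ratios)
print(f"estimated lambda: {lam:.4f}")
```

With real measurements the ratios would be averaged over many training images first, and the residual of the fit would correspond to the error term E.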

Let $I_s$ denote the image $I$ captured at scale $s$, and let $R(I, s)$ denote $I$ resampled by $s$. Suppose we have computed $C = \Omega(I)$; can we predict the channel image $C_s = \Omega(I_s)$ at a new scale $s$ using only $C$?

The standard approach is to compute $C_s = \Omega(R(I, s))$, ignoring the information contained in $C = \Omega(I)$. Instead, the fast pyramid uses the following approximation derived from the power law:

$$C_s \approx R(C, s) \cdot s^{-\lambda_\Omega}$$
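The approximation amounts to resampling the already computed channel C by s and multiplying the result by s raised to the power -λΩ. A minimal sketch follows; `resample_bilinear` and `approximate_channel` are hypothetical helpers, and the constant channel and the value λ = 1.0 are chosen only to make the effect of the scaling factor easy to see.

```python
import numpy as np

def resample_bilinear(channel, s):
    """Resample a 2D channel by factor s with separable linear interpolation."""
    h, w = channel.shape
    new_h, new_w = max(1, int(round(h * s))), max(1, int(round(w * s)))
    rows = np.linspace(0, h - 1, new_h)
    cols = np.linspace(0, w - 1, new_w)
    tmp = np.array([np.interp(cols, np.arange(w), row) for row in channel])
    return np.array([np.interp(rows, np.arange(h), col) for col in tmp.T]).T

def approximate_channel(channel, s, lam):
    """Approximate the channel at scale s from the channel at scale 1."""
    return resample_bilinear(channel, s) * s ** (-lam)

# Hypothetical channel computed once at the original scale.
C = np.ones((8, 8))
C_half = approximate_channel(C, 0.5, lam=1.0)
print(C_half.shape, C_half[0, 0])
```

The point of the design is that the expensive channel computation Ω runs once, while each additional scale costs only a resampling and a multiplication.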

Since the error term grows as the gap between scales increases, the channels are computed explicitly only at octave-spaced scales (the image resampled by factors of 2, 4, 8, 16, and so on), and the scales in between are approximated from the nearest octave. This gives the best tradeoff between speed and accuracy. With this method, the cost for an n × n image is about 1.5 n²; the explicitly computed octaves alone sum to n²(1 + 1/4 + 1/16 + ...) = 4/3 n².
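The octave layout can be sketched as a simple plan that marks each scale as either computed explicitly or approximated from its nearest octave. The function names, the choice of 4 octaves, and the cost model (cost proportional to the number of pixels, with the cheap approximation step ignored) are all assumptions made for illustration.

```python
def plan_pyramid(n_octaves, scales_per_octave=8):
    """List (scale, kind, source_scale) for every scale in the pyramid."""
    plan = []
    for i in range(n_octaves * scales_per_octave + 1):
        scale = 2.0 ** (-i / scales_per_octave)
        if i % scales_per_octave == 0:
            # Octave-spaced scale: channels are computed explicitly.
            plan.append((scale, "computed", scale))
        else:
            # In-between scale: approximated from the nearest octave.
            nearest_octave = round(i / scales_per_octave)
            plan.append((scale, "approximated", 2.0 ** (-nearest_octave)))
    return plan

def relative_cost(scales):
    """Cost in units of n*n, assuming cost is proportional to pixel count."""
    return sum(s * s for s in scales)

plan = plan_pyramid(n_octaves=4)
fast_cost = relative_cost(s for s, kind, _ in plan if kind == "computed")
full_cost = relative_cost(s for s, _, _ in plan)
print(f"explicit octaves only: {fast_cost:.3f} n^2")
print(f"every scale explicit:  {full_cost:.3f} n^2")
```

Under this cost model the explicitly computed octaves stay close to 4/3 n², while computing all 8 scales per octave explicitly costs several times as much.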

In terms of speed, the standard scale pyramid computation runs at 100 fps, and computing 8 scales per octave explicitly drops this to 8 fps, whereas the fast pyramid built with the power law still runs at 50 fps.

The technique is easier to understand from the image below.