FPN for Object Detection

My notes from reading this paper.

FPN stands for Feature Pyramid Network, an architecture for improving object detection.

One traditional way to do object detection is to use a sliding window with multiple window sizes, which is shown in (a). However, this method is slow and computationally intensive. (b) and (c) are approaches used by others to overcome the disadvantage of (a).

The authors also mention that a deep ConvNet produces feature maps at multiple scales through subsampling. However, feature maps at different spatial resolutions have large semantic gaps, which can harm the result.

(d) is the method this paper proposes.

Resolution and semantics of features:

Bottom-up pathway, top-down pathway, and lateral connections

The bottom-up pathway is a regular ConvNet backbone (ResNet here) whose feature maps shrink by a factor of 2 at each stage. The top-down pathway scales the feature maps back up by a factor of 2 at each step, using nearest-neighbor upsampling.
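
To make the factor-of-2 relationship concrete, here is a minimal pure-Python sketch. The helper names (pyramid_sizes, upsample_nearest_2x) and the 224×224 input are illustrative assumptions, not taken from the paper's code; the strides 4, 8, 16, 32 are the ones the paper gives for ResNet stages C2 to C5.

```python
def pyramid_sizes(height, width, strides=(4, 8, 16, 32)):
    """Spatial size of each bottom-up feature map for a given input size.

    Assumes the input size is divisible by the largest stride.
    """
    return [(height // s, width // s) for s in strides]


def upsample_nearest_2x(fmap):
    """Nearest-neighbor 2x upsampling of a 2-D map given as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


print(pyramid_sizes(224, 224))
# [(56, 56), (28, 28), (14, 14), (7, 7)]
print(upsample_nearest_2x([[1, 2], [3, 4]]))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Each level is half the size of the one before it, so a single 2× upsampling step is enough to bring a coarser map to the resolution of the next finer one.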

When a bottom-up feature map is merged with a top-down feature map, the bottom-up map (on the left) generally has more channels: ResNet doubles the channel count at each stage, while the paper fixes every pyramid level to 256 channels. A 1×1 convolution therefore reduces the channel count of the bottom-up map so that it can be added element-wise to the map on the right. The top-down map on the right is first upsampled (2×) to match the height and width of the left. Upsampling can cause an aliasing effect, so a 3×3 convolution is applied to the merged map to reduce it.
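
The merge step can be sketched in plain Python on nested lists laid out as [channels][height][width]. This is a toy illustration under stated assumptions, not the paper's implementation: the function names and the tiny weights are made up, biases are left out, and the final 3×3 smoothing convolution is omitted to keep the example short.

```python
def conv1x1(fmap, weights):
    """1x1 convolution: mix channels independently at every pixel.

    fmap:    [C_in][H][W] nested lists
    weights: [C_out][C_in] nested lists (bias omitted for brevity)
    """
    c_in, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(wrow[c] * fmap[c][y][x] for c in range(c_in)) for x in range(w)]
         for y in range(h)]
        for wrow in weights
    ]


def upsample_nearest_2x_chw(fmap):
    """Nearest-neighbor 2x upsampling of a [C][H][W] map."""
    return [
        [[ch[y // 2][x // 2] for x in range(2 * len(ch[0]))]
         for y in range(2 * len(ch))]
        for ch in fmap
    ]


def merge(bottom_up, top_down, lateral_weights):
    """Lateral 1x1 conv on the bottom-up map, then element-wise add."""
    lateral = conv1x1(bottom_up, lateral_weights)   # match channel count
    upsampled = upsample_nearest_2x_chw(top_down)   # match height and width
    return [
        [[a + b for a, b in zip(row_l, row_u)]
         for row_l, row_u in zip(ch_l, ch_u)]
        for ch_l, ch_u in zip(lateral, upsampled)
    ]


# Toy shapes: bottom-up map has 2 channels at 2x2, top-down map has 1 channel at 1x1.
bottom_up = [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]
top_down = [[[100]]]
print(merge(bottom_up, top_down, [[1, 1]]))
# [[[111, 122], [133, 144]]]
```

The 1×1 convolution only changes the channel dimension and the upsampling only changes the spatial dimensions, which is why the two maps line up for the addition.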

Code can be found here. Thanks also to this video for a better understanding.

Nested X Server

Use Xephyr to run a nested X server with 8-bit color depth on a Raspberry Pi:
https://nims11.wordpress.com/2012/06/24/nested-x-servers-with-xephyr/

Xephyr -ac -screen 800x600x8 -reset :1 &

This will open a blank window.
In the original window, type:
DISPLAY=:1

Then run the C program. It shows “Colomap_size: 256”.
It worked!
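
The same steps could also be scripted. The sketch below only builds the Xephyr argument list from the command above and an environment with DISPLAY set. The function names are hypothetical, and nothing is launched unless you call subprocess yourself.

```python
import os


def xephyr_command(display=":1", width=800, height=600, depth=8):
    """Build the Xephyr argument list used in the note above."""
    return ["Xephyr", "-ac", "-screen", f"{width}x{height}x{depth}", "-reset", display]


def env_for_display(display=":1", base_env=None):
    """Copy of the environment with DISPLAY pointing at the nested server."""
    env = dict(os.environ if base_env is None else base_env)
    env["DISPLAY"] = display
    return env


print(" ".join(xephyr_command()))
# Xephyr -ac -screen 800x600x8 -reset :1

# Possible use (not run here; "./my_program" is a placeholder for the C program):
#   import subprocess
#   server = subprocess.Popen(xephyr_command())
#   subprocess.run(["./my_program"], env=env_for_display())
```

Passing the environment to the child process has the same effect as typing DISPLAY=:1 in the shell, but only for that one program.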