Object Detection

Published: July 15, 2024

Object detection is a computer vision technique for identifying and locating objects in an image or video: for each object, it predicts a class label and a bounding box.
This is a complex task because the number, size, shape, and position of objects can vary significantly.

Common Object Detection Algorithms

There are many object detection algorithms, but some of the most popular include:

Region Proposal + Deep Learning Classification: This approach uses a region proposal method to generate candidate regions that may contain objects, and then uses a deep learning model to classify each region.
- R-CNN (Selective Search + CNN + SVM)
- Fast R-CNN (Selective Search + CNN + RoI Pooling)
- Faster R-CNN (Region Proposal Network + CNN + RoI Pooling)
Single-Shot Detectors: These algorithms use a single neural network, in one forward pass and without a separate region proposal stage, to predict the bounding boxes and class labels for all objects in an image.
- YOLO (You Only Look Once)
- SSD (Single Shot MultiBox Detector)

Intersection over Union (IoU) measures the overlap between two bounding boxes.
It is the area of the intersection of the two boxes divided by the area of their union: \(IoU = \frac{\text{Area of Overlap}}{\text{Area of Union}}\).
IoU ranges from 0 (no overlap) to 1 (identical boxes), so higher values indicate a higher degree of overlap.
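A minimal Python sketch of the IoU formula, assuming each box is given by its corner coordinates (x1, y1, x2, y2) with x2 > x1 and y2 > y1 (a center-format box would first be converted to corners). `iou` is an illustrative helper name, not a library function.

```python
def iou(box_a, box_b):
    """Return the IoU of two axis-aligned boxes given as (x1, y1, x2, y2) corners."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection rectangle; width/height clamp to 0 when the boxes do not overlap
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union = sum of both areas minus the intersection, which was counted twice
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For two 2×2 boxes offset by one unit in each direction, `iou((0, 0, 2, 2), (1, 1, 3, 3))` returns 1/7 (about 0.143): the overlap area is 1 and the union is 4 + 4 - 1 = 7.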

We use (x, y, w, h) to represent a bounding box, where (x, y) are the coordinates of the box center and w, h are its width and height.
The red box P is the predicted box, and the green box G is the ground-truth box.
We need a transformation that maps the predicted box P to a box G' that is closer to the ground-truth box G, i.e. a mapping function \(f\) such that \(f(P_x, P_y, P_w, P_h) = (G'_x, G'_y, G'_w, G'_h)\), where \((G'_x, G'_y, G'_w, G'_h) \approx (G_x, G_y, G_w, G_h)\).
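One widely used choice for f is the bounding-box regression parameterization introduced with R-CNN and reused by Fast and Faster R-CNN: the center is shifted by an offset scaled by the box width or height, and the width and height are scaled in log space. The first two rows define G' from learned offsets \(d_x(P), d_y(P), d_w(P), d_h(P)\); the last two rows give the targets \(t_x, t_y, t_w, t_h\) that map P exactly onto G, so setting each \(d\) equal to the matching \(t\) recovers G.

```latex
\begin{aligned}
G'_x &= P_w\, d_x(P) + P_x, & G'_y &= P_h\, d_y(P) + P_y,\\
G'_w &= P_w \exp\bigl(d_w(P)\bigr), & G'_h &= P_h \exp\bigl(d_h(P)\bigr),\\[4pt]
t_x &= \frac{G_x - P_x}{P_w}, & t_y &= \frac{G_y - P_y}{P_h},\\
t_w &= \log\frac{G_w}{P_w}, & t_h &= \log\frac{G_h}{P_h}.
\end{aligned}
```

Dividing the center shifts by \(P_w\) and \(P_h\) makes the targets independent of the box's absolute size, and predicting width and height in log space keeps \(G'_w\) and \(G'_h\) positive.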

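So, given a predicted box P and its ground-truth box G, we learn the offsets \((d_x, d_y, d_w, d_h)\) that minimize a regression loss between these predicted offsets and the target offsets that would map P exactly onto G.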
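A minimal Python sketch of this step, assuming the R-CNN-style parameterization \(t_x = (G_x - P_x)/P_w\), \(t_y = (G_y - P_y)/P_h\), \(t_w = \log(G_w/P_w)\), \(t_h = \log(G_h/P_h)\) and boxes in the center format (x, y, w, h). `encode` computes the targets, `decode` applies offsets to P, and `smooth_l1` is one common loss choice (Fast R-CNN uses smooth L1; the original R-CNN fit its box regressors with ridge-regularized least squares). All three names are illustrative, not a library API.

```python
import math


def encode(p, g):
    """Targets (t_x, t_y, t_w, t_h) that map box p onto box g.

    Boxes are (x, y, w, h) with (x, y) the box center.
    """
    px, py, pw, ph = p
    gx, gy, gw, gh = g
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))


def decode(p, d):
    """Apply offsets d = (d_x, d_y, d_w, d_h) to box p, giving the refined box G'."""
    px, py, pw, ph = p
    dx, dy, dw, dh = d
    return (pw * dx + px, ph * dy + py, pw * math.exp(dw), ph * math.exp(dh))


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss summed over the four offsets: quadratic below beta, linear above."""
    total = 0.0
    for a, b in zip(pred, target):
        diff = abs(a - b)
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total
```

For example, with P = (10, 10, 4, 4) and G = (12, 10, 8, 4), `encode(P, G)` gives (0.5, 0.0, log 2, 0.0), and `decode(P, encode(P, G))` recovers G up to floating-point rounding. During training, the network's predicted offsets would be compared with `encode(P, G)` through the loss.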

Convolutional Layers: Extract a feature map from the input image.
Region Proposal Network (RPN): Generates region proposals (candidate bounding boxes) for objects in the image from that feature map.
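In Faster R-CNN, the RPN slides over the convolutional feature map and, at each position, scores and refines a fixed set of reference boxes called anchors; the paper uses 3 scales (box areas of 128², 256², and 512² pixels) and 3 aspect ratios (1:1, 1:2, 2:1), giving 9 anchors per position. The sketch below only generates those anchors. It assumes a feature stride of 16 pixels (the total stride of the VGG-16 backbone in that paper) and places anchors at cell midpoints; the midpoint placement and the helper name `generate_anchors` are illustrative assumptions, not the reference implementation.

```python
import itertools
import math


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x, y, w, h) center-format boxes in image coordinates.

    One anchor per (scale, ratio) pair is placed at every feature-map cell.
    `ratio` is height / width, and each anchor keeps an area of scale**2.
    """
    anchors = []
    for i, j in itertools.product(range(feat_h), range(feat_w)):
        # Assumed: center of feature-map cell (i, j) projected back to the image
        cx = (j + 0.5) * stride
        cy = (i + 0.5) * stride
        for s, r in itertools.product(scales, ratios):
            # w * h == s**2 and h / w == r
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx, cy, w, h))
    return anchors
```

A 2×3 feature map yields 2 × 3 × 9 = 54 anchors. The RPN then predicts, for every anchor, an objectness score and box offsets, and the high-scoring refined anchors become the region proposals.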