TY - JOUR
T1 - MACnet: Mask-augmented counting network for class-agnostic counting
AU - McCarthy, Tadhg
AU - Virtusio, John Jethro
AU - Ople, Jose Jaena Mari
AU - Tan, Daniel Stanley
AU - Amalin, Divina
AU - Hua, Kai Lung
PY - 2023/5
Y1 - 2023/5
N2 - Class-agnostic counting involves counting the instances of any user-defined class. It is usually phrased as a matching problem wherein the model finds all matching objects in a query image given exemplar patches containing the target object. Typically, users define exemplar patches by placing bounding boxes around the target object. However, defining exemplars with bounding boxes inevitably captures both the target object (foreground) and its surrounding background. This can unintentionally match patches similar to the background, leading to an inaccurate count. Moreover, objects poorly represented by a bounding box (e.g., non-axis-aligned, irregular, or non-rectangular shapes) may capture a significantly disproportionate amount of background relative to foreground within the exemplar patch, leading to poor matches. In this paper, we propose to utilize segmentation masks to separate target objects from their background. We derive these segmentation masks from extreme points, which require no additional annotation effort from the user compared to annotating bounding boxes. Moreover, we design a module that learns the mask features as a residual to the object features, allowing the network to learn how to better incorporate the segmentation masks. Our model improves upon state-of-the-art methods by up to 3.7 MAE points on the FSC-147 benchmark dataset, showing the effectiveness of our masking approach.
KW - Class-agnostic counting
KW - Extreme points
KW - Segmentation masks
U2 - 10.1016/j.patrec.2023.03.017
DO - 10.1016/j.patrec.2023.03.017
M3 - Article
AN - SCOPUS:85152117770
SN - 0167-8655
VL - 169
SP - 75
EP - 80
JO - Pattern Recognition Letters
JF - Pattern Recognition Letters
ER -