Analysis of mixture of experts on individual, sparsely-annotated and multi-dataset object detection

Author(s)
Jain, Yash
Abstract
Learning from multiple datasets is one path toward a universal vision system. By combining multiple datasets, each with different properties and label sets, we can build a more robust system. However, datasets from different sources may focus on different object classes due to incompatible label schemas. We imitate this mixing problem by creating synthetic sparsely-annotated splits of the COCO and VOC datasets in which the annotations of a subset of classes are removed. These splits also let us evaluate the missing-annotation noise often found in large-scale annotated datasets. Ensemble learning can address this problem, but it requires substantial compute for the individual models. In this work, we build upon an architectural form of ensemble learning, the Mixture-of-Experts (MoE). An MoE combines a router with multiple feed-forward experts: incoming tokens are routed to the appropriate experts, processed, and consolidated for the next layer. Traditional MoEs use a load-balancing loss to ensure expert utilization; we build on this with a cross-entropy objective conditioned on the dataset source, an approach we call MoE-Xent. We evaluate the MoE at the backbone and decoder positions and study its effect with respect to dataset complexity. With the MoE in the backbone, we achieve an mAP50 of 63.6 (+0.7) on VOC and 66.7 (+0.3) on KITTI over a Swin-v2 baseline. On sparsely-annotated VOC, we achieve an average improvement of 2.5 AP at a 25% sparsity level. Further experiments reveal that the MoE backbone performs poorly on COCO because the features fed to the MoE are too shallow for complex datasets. Hence, we move the MoE to the decoder of the end-to-end DINO object-detection pipeline and achieve 49.7 (+0.8) mAP on COCO and 39.7 (+2.3) mAP on COCO-minitrain. Finally, we discuss the sensitivity of MoE training and suggest hyperparameter changes that can avoid representation collapse.
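
The sketch below illustrates the idea described above: a Mixture-of-Experts layer whose router, in addition to routing tokens to feed-forward experts, is trained with a cross-entropy objective conditioned on the dataset source (the "MoE-Xent" objective mentioned in the abstract). This is not the thesis code; the class and parameter names, the top-1 routing, the assumption that the number of experts matches the number of dataset sources, and the exact form and weighting of the auxiliary loss are illustrative assumptions only.

```python
# Minimal sketch (assumed, not the thesis implementation) of an MoE layer with a
# dataset-source-conditioned cross-entropy auxiliary loss on the router logits.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEXentLayer(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4, hidden: int = 1024):
        super().__init__()
        # Router maps each token to one logit per expert.
        self.router = nn.Linear(dim, num_experts)
        # Feed-forward experts, one small MLP each.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, tokens: torch.Tensor, dataset_id: torch.Tensor):
        """tokens: (batch, seq, dim); dataset_id: (batch,) long tensor of source labels.
        Assumes the number of experts equals the number of dataset sources."""
        logits = self.router(tokens)                     # (B, S, E)
        probs = F.softmax(logits, dim=-1)
        top1 = probs.argmax(dim=-1)                      # hard top-1 routing per token

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = top1 == e
            if mask.any():
                out[mask] = expert(tokens[mask])
        # Scale by the routing probability so the router receives gradients.
        out = out * probs.gather(-1, top1.unsqueeze(-1))

        # MoE-Xent auxiliary loss (assumed form): push each token's routing
        # distribution toward the expert indexed by its dataset source,
        # in place of the usual load-balancing loss.
        target = dataset_id.unsqueeze(1).expand(-1, tokens.size(1))   # (B, S)
        aux_loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                   target.reshape(-1))
        return out, aux_loss
```

In this reading, the auxiliary loss ties each expert to a dataset source, so expert utilization is driven by where a token comes from rather than by a uniform load-balancing penalty; the relative weight of this loss against the detection loss would be a tuning choice.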
Date
2023-05-01
Resource Type
Text
Resource Subtype
Thesis