Forget less, count better: a domain-incremental self-distillation learning benchmark for lifelong crowd counting
Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai 200433, China; Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai 200433, China; Shanghai Center for Brain Science and Brain-inspired Technology, Shanghai 201210, China; School of Information Science and Technology, Xiamen University, Xiamen 361005, China; College of Information Sciences and Technology, the Pennsylvania State University, University Park, PA 16802, USA; State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Crowd counting has important applications in public safety and pandemic control. A robust and practical crowd counting system must be able to learn continuously from newly arriving domain data in real-world scenarios, rather than fit a single domain. Off-the-shelf methods have three drawbacks when handling multiple domains: (1) after training on images from new domains, a model's performance on old domains is limited, or even drops dramatically, because the intrinsic data distributions differ across domains, a phenomenon called catastrophic forgetting; (2) a model trained well on one specific domain performs poorly on unseen domains because of domain shift; (3) storage overhead grows linearly, whether all the data are mixed for joint training or a separate model is trained for each new domain as it becomes available. To overcome these issues, we investigate a new crowd counting task in the incremental domain training setting, called lifelong crowd counting. Its goal is to alleviate catastrophic forgetting and improve generalization ability using a single model that is updated with incrementally arriving domains. Specifically, we propose a self-distillation learning framework as a benchmark (forget less, count better, or FLCB) for lifelong crowd counting, which helps the model reuse previously learned, meaningful knowledge in a sustainable manner, so that it counts better and forgets less when new data arrive. A new quantitative metric, normalized Backward Transfer (nBwT), is developed to evaluate the degree of forgetting during the lifelong learning process. Extensive experimental results demonstrate the superiority of our proposed benchmark in achieving a low degree of catastrophic forgetting and strong generalization ability.
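The abstract names nBwT but does not define it. As a minimal sketch, one plausible form adapts the standard backward-transfer measure from continual learning to an error metric such as MAE: for each earlier domain, take the change in error between the moment that domain was learned and the end of training, then divide by the error at the moment it was learned. The function name, the error-matrix layout, and the normalization below are assumptions for illustration and may differ from the definition used in the paper. The numbers are toy values, not reported results.

```python
def normalized_backward_transfer(errors):
    """Hypothetical nBwT-style score (assumed form, not the paper's definition).

    errors[t][i] is the counting error (e.g. MAE) on domain i, measured
    after the model has been trained sequentially through domain t.
    Only entries with t >= i are read. Lower error is better, so a
    positive score means the model got worse on old domains (forgetting).
    """
    num_domains = len(errors)
    if num_domains < 2:
        raise ValueError("need at least two domains to measure forgetting")

    total = 0.0
    for i in range(num_domains - 1):
        error_when_learned = errors[i][i]            # right after training on domain i
        error_at_end = errors[num_domains - 1][i]    # after training on the last domain
        total += (error_at_end - error_when_learned) / error_when_learned
    return total / (num_domains - 1)


# Toy error matrix for three domains (made-up values for illustration only).
toy_mae = [
    [10.0, None, None],
    [12.0, 20.0, None],
    [11.0, 25.0, 30.0],
]
score = normalized_backward_transfer(toy_mae)
print(f"nBwT-style score: {score:.3f}")
```

Under this assumed form, a score of zero means the final model is as accurate on each earlier domain as it was right after learning it, and dividing by the per-domain error keeps domains with very different crowd densities on a comparable scale.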