Knowledge Distillation for Multi-task Learning

Wei-Hong Li, Hakan Bilen

University of Edinburgh

 
[Pipeline overview figure]
Figure 1. We first train a task-specific model for each task in an offline stage and freeze its parameters (figs. (a) and (c)); we then optimize the parameters of the multi-task network both for the task-specific losses and for producing the same features as the single-task networks (fig. (b)). As each task-specific network computes different features, we introduce small task-specific adaptors that map the multi-task features to the task-specific ones. The adaptors align the single-task and multi-task features, enabling balanced parameter sharing across the tasks.
 

Multi-task learning (MTL) aims to learn a single model that performs multiple tasks, achieving good performance on all of them at a lower computational cost. Learning such a model requires jointly optimizing the losses of a set of tasks with different difficulty levels, magnitudes, and characteristics (e.g. cross-entropy, Euclidean loss), which leads to an imbalance problem in multi-task learning. To address this imbalance, we propose a knowledge distillation based method. We first learn a task-specific model for each task. We then learn the multi-task model to minimize the task-specific losses and to produce the same features as the task-specific models. As each task-specific network encodes different features, we introduce small task-specific adaptors that project the multi-task features onto the task-specific ones. In this way, the adaptors align the task-specific and multi-task features, enabling balanced parameter sharing across tasks.
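To make the two-stage recipe concrete, below is a minimal, self-contained PyTorch sketch of the training objective: frozen single-task networks act as teachers, and a shared multi-task encoder is trained on the supervised task losses plus a feature-matching term computed through small task-specific adaptors. The tiny networks, task set, heads, and the loss weight lam are illustrative assumptions for the sketch, not the released code.

    # Minimal sketch of the distillation objective (illustrative, not the official code).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    tasks = ["seg", "depth"]   # example task set (assumption)
    C = 16                     # assumed feature width

    def make_encoder():
        return nn.Sequential(nn.Conv2d(3, C, 3, padding=1), nn.ReLU(),
                             nn.Conv2d(C, C, 3, padding=1), nn.ReLU())

    # Stage 1 (offline): one single-task network per task, trained separately and
    # then frozen. Here they are stand-ins; in practice they would be pre-trained.
    teachers = nn.ModuleDict({t: make_encoder() for t in tasks})
    for p in teachers.parameters():
        p.requires_grad_(False)

    # Stage 2: a shared multi-task encoder, per-task prediction heads, and small
    # task-specific adaptors that map the shared feature to each teacher's space.
    student = make_encoder()
    heads = nn.ModuleDict({t: nn.Conv2d(C, 1, 1) for t in tasks})
    adaptors = nn.ModuleDict({t: nn.Conv2d(C, C, 1) for t in tasks})

    params = [*student.parameters(), *heads.parameters(), *adaptors.parameters()]
    optimizer = torch.optim.SGD(params, lr=0.01, momentum=0.9)

    def training_step(images, targets, task_losses, lam=1.0):
        shared = student(images)                       # one shared feature map
        loss = 0.0
        for t in tasks:
            # usual supervised loss for this task (cross-entropy, L1, ...)
            loss = loss + task_losses[t](heads[t](shared), targets[t])
            # distillation term: the adaptor-aligned shared feature should match
            # the frozen single-task (teacher) feature; teachers give no gradients
            with torch.no_grad():
                teacher_feat = teachers[t](images)
            loss = loss + lam * F.mse_loss(adaptors[t](shared), teacher_feat)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

Because each adaptor is a lightweight mapping applied on top of the shared feature, the distillation term pulls the shared representation toward every teacher without letting any single task dominate the shared parameters.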

 
 

Paper

 

Code

 
 

Please let us know if you have any questions or suggestions.