Drill the cork of information bottleneck by inputting the most important data

Abstract

Deep learning has become the most powerful machine learning tool in the last decade. However, how to train deep neural networks efficiently remains an open problem, and the widely used minibatch stochastic gradient descent (SGD) still needs to be accelerated. As a promising tool for understanding the learning dynamics of minibatch SGD, the information bottleneck (IB) theory claims that the optimization process consists of an initial fitting phase followed by a compression phase. Based on this principle, we further study typicality sampling, an efficient data selection method, and propose a new explanation of how it accelerates the training of deep networks. We show that, when typicality sampling is appropriately adopted, the fitting phase depicted in the IB theory is boosted by a higher signal-to-noise ratio of the gradient approximation. This finding also implies that the prior information of the training set is critical to the optimization process, and that better use of the most important data helps information flow through the bottleneck faster. Both theoretical analysis and experimental results on synthetic and real-world datasets support our conclusions.
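To make the idea concrete, below is a minimal NumPy sketch of a typicality-sampling-style batch builder. It is an illustration under assumptions, not the exact algorithm from the paper: here a sample's "typicality" is scored by how close its per-example gradient is to the mean gradient, and most of each minibatch is filled with high-typicality samples so that the minibatch gradient estimate has a higher signal-to-noise ratio. The function names (typicality_scores, sample_typical_batch) and the 80/20 split are hypothetical choices for this sketch.

import numpy as np

def typicality_scores(per_example_grads):
    # Score each sample by closeness of its gradient to the mean gradient
    # (higher score = more typical of the dataset).
    mean_grad = per_example_grads.mean(axis=0)
    dists = np.linalg.norm(per_example_grads - mean_grad, axis=1)
    return -dists

def sample_typical_batch(per_example_grads, batch_size,
                         typical_fraction=0.8, rng=None):
    # Fill most of the batch with high-typicality samples and the rest
    # with uniformly drawn samples to retain some stochasticity.
    rng = np.random.default_rng() if rng is None else rng
    n = per_example_grads.shape[0]
    scores = typicality_scores(per_example_grads)
    n_typical = int(typical_fraction * batch_size)
    typical_idx = np.argsort(scores)[-n_typical:]
    remaining = np.setdiff1d(np.arange(n), typical_idx)
    random_idx = rng.choice(remaining, batch_size - n_typical, replace=False)
    return np.concatenate([typical_idx, random_idx])

# Toy usage: 1000 samples with 10-dimensional per-example gradients.
grads = np.random.default_rng(0).normal(size=(1000, 10))
batch_indices = sample_typical_batch(grads, batch_size=64)
print(batch_indices.shape)  # (64,)

In practice, computing exact per-example gradients is expensive, so cheaper proxies (e.g., per-example losses or feature statistics) would typically stand in for the gradient-based score used in this sketch.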

Publication
IEEE Transactions on Neural Networks and Learning Systems
(CAS Tier 1; JCR Q1; Impact Factor = 14.255)

Below is a brief overview of this work; please refer to the original paper for details.


Citation

If you find our work is useful in your research, please consider citing:

@ARTICLE{9439803,
  author={Peng, Xinyu and Zhang, Jiawei and Wang, Fei-Yue and Li, Li},
  journal={IEEE Transactions on Neural Networks and Learning Systems}, 
  title={Drill the Cork of Information Bottleneck by Inputting the Most Important Data}, 
  year={2022},
  volume={33},
  number={11},
  pages={6360-6372},
  doi={10.1109/TNNLS.2021.3079112}
}
Jiawei Zhang
Ph.D. Student, Tsinghua University

My research interests include autonomous driving, intelligent vehicles, intelligent transportation systems, and deep reinforcement learning.