In this paper, we propose a novel information-theoretic approach to obtain a compact and discriminative dictionary of visual data. Using the information bottleneck, the approach squeezes the discriminative information in the dictionary into an efficient representation. The dictionary is optimized starting from an initial sparse dictionary learned from action data. To this end, we formulate a constrained information optimization problem in which the mutual information between the initial and the optimized dictionaries is minimized while the mutual information between the optimized dictionary and the class labels is maximized. To compare the class distributions of dictionary atoms, we use an effective similarity measure: the Jensen-Shannon divergence with adaptive weights, which are obtained from the usage of each dictionary atom across the different classes. The resulting dictionary is discriminative and compact, retaining maximum information with fewer atoms. Using a simple reconstruction-error criterion on popular benchmark datasets, we show that the proposed method improves computational efficiency without compromising classification accuracy. We further demonstrate how efficiently discriminative information is retained by comparing the classification performance of the dictionary before and after the removal of redundant dictionary atoms. 1520-9210 © 2017 IEEE.
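A minimal sketch of a usage-weighted Jensen-Shannon divergence between the class distributions of two dictionary atoms may help make the similarity measure concrete. It is not the authors' implementation. It assumes that an atom's class distribution p(y | atom) is estimated by normalizing how often the atom is used (has a nonzero coefficient) in the sparse codes of each class, and it borrows the merge cost of the agglomerative information bottleneck (Slonim and Tishby), in which the divergence weights are the two atoms' relative usage. The helper names `class_distribution`, `weighted_js`, and `merge_cost`, and the usage counts in the example, are hypothetical.

```python
import math
from typing import List, Sequence


def class_distribution(per_class_usage: Sequence[float]) -> List[float]:
    """Normalize an atom's per-class usage counts into p(y | atom).

    Assumption (hypothetical): "usage" counts how often the atom receives a
    nonzero coefficient in the sparse codes of training samples of each class.
    """
    total = sum(per_class_usage)
    if total <= 0:
        raise ValueError("atom is never used")
    return [c / total for c in per_class_usage]


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def weighted_js(p: Sequence[float], q: Sequence[float], w_p: float, w_q: float) -> float:
    """Generalized Jensen-Shannon divergence: H(w_p*p + w_q*q) - w_p*H(p) - w_q*H(q)."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same classes")
    if not math.isclose(w_p + w_q, 1.0):
        raise ValueError("weights must sum to 1")
    mix = [w_p * a + w_q * b for a, b in zip(p, q)]
    return entropy(mix) - w_p * entropy(p) - w_q * entropy(q)


def merge_cost(prior_i: float, prior_j: float,
               dist_i: Sequence[float], dist_j: Sequence[float]) -> float:
    """Label information (bits) lost by merging atoms i and j, as in agglomerative IB.

    The JS weights adapt to the atoms' relative usage (their priors), so a
    rarely used atom contributes less to the mixture than a frequently used one.
    """
    total = prior_i + prior_j
    return total * weighted_js(dist_i, dist_j, prior_i / total, prior_j / total)


if __name__ == "__main__":
    # Hypothetical usage counts of two atoms over three action classes.
    atom_a = class_distribution([8, 2, 0])
    atom_b = class_distribution([0, 1, 9])
    print(merge_cost(0.25, 0.25, atom_a, atom_b))
```

Under these assumptions, atoms whose class distributions are nearly identical have a merge cost close to zero, so a greedy compaction step would treat them as redundant and merge or drop them first, while atoms that separate classes well are kept.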