Recently, deep neural networks (DNNs) have become widely used in many artificial intelligence (AI) applications, including robotics and self-driving vehicles. Although DNNs deliver state-of-the-art accuracy on many AI tasks, they are both computationally and memory intensive. This hinders the deployment of DNNs on mobile and IoT edge devices with limited hardware resources and power budgets. Traditional DNN compression is applied during training to obtain an efficient inference engine. When the inference engine runs on a hardware platform constrained by battery backup, it brings additional challenges in terms of reducing complexity (e.g., memory requirement and hardware area). To reduce the memory complexity, we propose a new low-complexity methodology named the "Clustering Algorithm" to eliminate the redundancies present within the filter coefficients (i.e., weights). This algorithm is a three-stage pipeline of quantization, coefficient clustering, and code assignment, which together reduce the memory storage of neural networks at the cost of a low run-time memory requirement. To show the efficacy of the proposed "Clustering Algorithm", we executed it on the network obtained after applying the NetAdapt algorithm, a popular inference engine, to a pre-trained AlexNet on the CIFAR-10 dataset. We observe that our "Clustering Algorithm" reduces the storage required by five layers of the already-pruned pre-trained AlexNet by 3x (approximately 65%) on an FPGA and 2x (approximately 51%) on a CPU. This allows the model to fit into on-chip SRAM cache rather than off-chip DRAM memory. © 2022 IEEE.
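The abstract does not specify the exact quantization scheme, clustering rule, or code format, so the following is only a minimal sketch of a three-stage weight-compression pipeline of the kind described (quantization, coefficient clustering, code assignment). The uniform fixed-point quantization, scalar k-means-style clustering, and integer codebook indices below are illustrative assumptions, not the paper's method.

```python
# Hedged sketch: quantize -> cluster -> assign codes, with assumed concrete choices.
import numpy as np

def quantize(weights, num_bits=8):
    """Stage 1: uniform fixed-point quantization of float weights (assumed scheme)."""
    max_abs = float(np.max(np.abs(weights))) or 1.0
    step = max_abs / (2 ** (num_bits - 1) - 1)
    return np.round(weights / step) * step

def cluster_coefficients(weights, num_clusters=16, iters=20):
    """Stage 2: group redundant coefficient values into a small codebook using
    a simple scalar k-means (assumed clustering rule)."""
    flat = weights.ravel()
    centroids = np.linspace(flat.min(), flat.max(), num_clusters)
    for _ in range(iters):
        codes = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(num_clusters):
            members = flat[codes == k]
            if members.size:
                centroids[k] = members.mean()
    return centroids, codes.reshape(weights.shape)

def assign_codes(codes, num_clusters=16):
    """Stage 3: store only a small integer code per weight plus the codebook,
    so each coefficient costs log2(num_clusters) bits instead of a full word."""
    bits_per_code = int(np.ceil(np.log2(num_clusters)))
    return codes.astype(np.uint8), bits_per_code

# Example: compress one convolutional filter bank (shapes are illustrative).
filters = np.random.randn(64, 3, 3, 3).astype(np.float32)
codebook, codes = cluster_coefficients(quantize(filters))
codes_u8, bits = assign_codes(codes)
original_bits = filters.size * 32
compressed_bits = filters.size * bits + codebook.size * 32
print(f"approx. compression ratio: {original_bits / compressed_bits:.1f}x")
```

Under these assumptions, the stored model shrinks to the per-weight code indices plus a small codebook; the codebook and decoding table are the modest run-time memory cost the abstract alludes to.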