INFER: INterFerence-Aware Estimation of Runtime for Concurrent CNN Execution on DPUs
S. Goel, M. Balakrishnan, R. Sen
Published by the Institute of Electrical and Electronics Engineers (IEEE)
2020
Pages: 66–71
Abstract
The Deep Learning Processor Unit (DPU) from Xilinx is among the numerous accelerators that have been proposed to speed up the execution of Convolutional Neural Networks (CNNs) on embedded platforms. DPUs are available in different configurable sizes and can execute any given CNN. Neural network researchers are also rapidly bringing out newer CNN algorithms with improved performance (typically higher prediction accuracy), trading off size or energy consumption for embedded applications. To enable quick evaluation of choices among evolving CNN algorithms and accelerator configurations, we propose INFER (INterFerence-Aware Estimation of Runtime), a framework to estimate the execution time of any CNN on a DPU of a given size without actual implementation. Further, current FPGA platforms can implement multiple DPUs, while many applications consist of multiple sub-tasks, each requiring a separate and/or different CNN. In such scenarios of concurrent use of multiple DPUs on an FPGA, INFER also estimates the additional execution time incurred due to the sharing of memory bandwidth. Our evaluation on various mixes of 16 standard CNNs and eight DPU configurations shows that INFER has an average prediction error of 6.6%, making it useful for design space exploration as well as scheduling on multi-DPU platforms. © 2020 IEEE.
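To make the estimation problem concrete, the sketch below shows one plausible way an interference-aware runtime estimate could be structured. It is purely illustrative: the roofline-style per-layer model, the equal-share bandwidth assumption under contention, and all names (Layer, DPUConfig, estimate_runtime) and numbers are assumptions of this sketch, not the published INFER model, whose internals the abstract does not specify.

```python
# Hypothetical sketch of interference-aware runtime estimation in the
# spirit of the abstract. NOT the authors' published INFER model.

from dataclasses import dataclass

@dataclass
class Layer:
    macs: float        # multiply-accumulate operations in the layer
    mem_bytes: float   # bytes moved to/from external memory

@dataclass
class DPUConfig:
    peak_macs_per_s: float      # peak compute throughput of this DPU size
    mem_bw_bytes_per_s: float   # memory bandwidth available in isolation

def layer_time(layer: Layer, dpu: DPUConfig, bw_share: float) -> float:
    """Roofline-style estimate: a layer is bound by compute or memory."""
    compute_t = layer.macs / dpu.peak_macs_per_s
    memory_t = layer.mem_bytes / (dpu.mem_bw_bytes_per_s * bw_share)
    return max(compute_t, memory_t)

def estimate_runtime(cnn: list[Layer], dpu: DPUConfig,
                     concurrent_dpus: int = 1) -> float:
    """Sum per-layer times; concurrent DPUs split memory bandwidth equally
    (a simplifying stand-in for a measured interference model)."""
    bw_share = 1.0 / concurrent_dpus
    return sum(layer_time(layer, dpu, bw_share) for layer in cnn)

if __name__ == "__main__":
    # Toy two-layer CNN on a hypothetical DPU configuration.
    cnn = [Layer(macs=1.2e9, mem_bytes=8e6),
           Layer(macs=0.4e9, mem_bytes=24e6)]
    dpu = DPUConfig(peak_macs_per_s=1.0e12, mem_bw_bytes_per_s=10e9)
    alone = estimate_runtime(cnn, dpu, concurrent_dpus=1)
    shared = estimate_runtime(cnn, dpu, concurrent_dpus=4)
    print(f"isolated: {alone*1e3:.3f} ms, "
          f"with 3 co-running DPUs: {shared*1e3:.3f} ms")
```

The design point this toy model captures is the one the abstract emphasizes: compute-bound layers are largely unaffected by co-running DPUs, while memory-bound layers slow down as effective bandwidth shrinks, so concurrent execution inflates total runtime in a workload-dependent way.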