Deep Learning Processor Units (DPUs) from Xilinx are design-time configurable CNN accelerators for FPGAs. We propose EXPRESS, which predicts the execution time of any given CNN on a DPU. EXPRESS incorporates the effect of bus connections into its prediction. Because a DPU is invoked by a host CPU to process a CNN layer by layer, EXPRESS accounts for both the CPU and the DPU execution time when predicting the end-to-end processing time. EXPRESS has an average prediction error of 2.2% and significantly outperforms the state of the art.
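To make the layer-by-layer cost model concrete, the following is a minimal sketch of the prediction idea stated above: the total latency is modeled as the sum, over all CNN layers, of a host-CPU invocation cost and a DPU execution cost. All names and numbers (Layer, predict_cpu_time, predict_dpu_time, the throughput and bandwidth constants) are hypothetical placeholders for illustration, not the EXPRESS model itself.

```python
# Sketch: end-to-end time = sum over layers of (CPU invocation time + DPU time).
# Everything here is an illustrative assumption, not the authors' implementation.
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str
    macs: int          # multiply-accumulate operations in the layer
    weight_bytes: int  # parameter traffic moved over the bus


def predict_cpu_time(layer: Layer) -> float:
    """Hypothetical per-layer host-CPU overhead (invocation, scheduling)."""
    return 50e-6  # assume a fixed 50 us invocation cost per layer


def predict_dpu_time(layer: Layer,
                     peak_macs_per_s: float,
                     bus_bytes_per_s: float) -> float:
    """Hypothetical per-layer DPU time: the slower of compute and bus transfer."""
    compute = layer.macs / peak_macs_per_s
    transfer = layer.weight_bytes / bus_bytes_per_s
    return max(compute, transfer)


def predict_end_to_end(layers: List[Layer]) -> float:
    """Sum the CPU and DPU contributions over all layers of the CNN."""
    return sum(predict_cpu_time(l) + predict_dpu_time(l, 1.2e12, 2.0e9)
               for l in layers)


if __name__ == "__main__":
    net = [Layer("conv1", macs=118_013_952, weight_bytes=9_408),
           Layer("fc", macs=4_096_000, weight_bytes=4_096_000)]
    print(f"predicted end-to-end time: {predict_end_to_end(net) * 1e3:.3f} ms")
```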