Implementing m-MIMO detectors and precoders requires a series of matrix-vector multiplications. These multiplications typically have iterative forms that lend themselves to serial implementation. Serial implementations, however, introduce significant delay, increasing the system's latency and making practical implementation problematic. This paper considers a recently proposed linear m-MIMO detector and presents an efficient parallel technique to compute the detector's estimation vector. We implement the computation on an Nvidia Tesla T4 graphics processing unit (GPU) using the compute unified device architecture (CUDA) application programming interface on Google Colab. Numerical and implementation results are presented to quantify the runtime speedup of the proposed parallel algorithm compared with existing parallel algorithms. © 2022 IEEE.