Tile-based CMP (TCMP) has become an essential next-generation scalable multicore architecture for both computer and embedded systems. The cores in a TCMP commonly share a large Last Level Cache (LLC). It has been observed that most TCMP architectures do not utilise the LLC well, and improving LLC utilisation can reduce the miss rate of the system. Non-Uniform Cache Architecture (NUCA) divides the LLC into multiple banks so that each bank can be accessed independently. A Dynamic NUCA (DNUCA) based TCMP can redistribute the load of heavily used banks to lightly used banks. It can also migrate frequently used blocks to the nearest possible bank in order to reduce cache access time. This flexibility in DNUCA improves inter-bank (global) utilisation, but the local utilisation within each bank remains non-uniform, because distributing and migrating blocks between banks cannot improve how evenly the sets inside a bank are used. In this paper we propose a DNUCA based architecture for TCMP, called BL-DNUCA, that enhances both global and local utilisation. The sets in each bank of BL-DNUCA are used uniformly, and a heavily used set can use the idle ways of other sets. Experimental analysis shows that BL-DNUCA gives a 7.6% improvement in cycles per instruction (CPI) compared with an existing DNUCA based TCMP (T-DNUCA). © 2016 ACM.