Increasing Per Thread Work

[Figure: tiles of TILE_WIDTH columns, indexed 0, 1, 2, ..., TILE_WIDTH-1]

Each thread computes two elements of Pd.
- Reduced loads from global memory (Md) to shared memory: one Md tile in shared memory serves both Pd elements
- Reduced instruction overhead
- More work done in each iteration
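A minimal sketch of the idea in CUDA follows. It assumes square, row-major Width x Width matrices with Pd = Md * Nd, TILE_WIDTH = 16, and Width a multiple of 2 * TILE_WIDTH. It also assumes that each thread's two Pd elements sit TILE_WIDTH columns apart, so one Md tile pairs with two Nd tiles. That layout is one plausible reading of the slide, not necessarily the lecture's exact code. The names MatrixMulKernel2, MatrixMulTwoPerThread, Nds0, and Nds1 are hypothetical.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define TILE_WIDTH 16  // assumed tile size

// Hypothetical kernel: each thread computes Pd(Row, Col0) and Pd(Row, Col1),
// where Col1 = Col0 + TILE_WIDTH. Each block covers TILE_WIDTH rows and
// 2 * TILE_WIDTH columns of Pd.
__global__ void MatrixMulKernel2(const float* Md, const float* Nd, float* Pd, int Width)
{
    __shared__ float Mds[TILE_WIDTH][TILE_WIDTH];   // one Md tile, shared by both outputs
    __shared__ float Nds0[TILE_WIDTH][TILE_WIDTH];  // Nd tile for the first Pd element
    __shared__ float Nds1[TILE_WIDTH][TILE_WIDTH];  // Nd tile for the second Pd element

    int tx = threadIdx.x, ty = threadIdx.y;
    int Row  = blockIdx.y * TILE_WIDTH + ty;
    int Col0 = blockIdx.x * (2 * TILE_WIDTH) + tx;
    int Col1 = Col0 + TILE_WIDTH;

    float P0 = 0.0f, P1 = 0.0f;
    for (int m = 0; m < Width / TILE_WIDTH; ++m) {
        // A single global load of Md feeds two Pd elements instead of one.
        Mds[ty][tx]  = Md[Row * Width + m * TILE_WIDTH + tx];
        Nds0[ty][tx] = Nd[(m * TILE_WIDTH + ty) * Width + Col0];
        Nds1[ty][tx] = Nd[(m * TILE_WIDTH + ty) * Width + Col1];
        __syncthreads();  // all tiles loaded before anyone reads them

        // Loop counter and index arithmetic are amortized over two outputs.
        for (int k = 0; k < TILE_WIDTH; ++k) {
            float a = Mds[ty][k];  // read once from shared memory, used twice
            P0 += a * Nds0[k][tx];
            P1 += a * Nds1[k][tx];
        }
        __syncthreads();  // finish reading before the next tiles overwrite them
    }
    Pd[Row * Width + Col0] = P0;
    Pd[Row * Width + Col1] = P1;
}

// Hypothetical host helper: copies M and N to the device, launches the
// kernel, and copies the Width x Width product back into P.
void MatrixMulTwoPerThread(const float* M, const float* N, float* P, int Width)
{
    assert(Width % (2 * TILE_WIDTH) == 0);  // grid math below assumes this
    size_t bytes = (size_t)Width * Width * sizeof(float);
    float *Md, *Nd, *Pd;
    cudaMalloc((void**)&Md, bytes);
    cudaMalloc((void**)&Nd, bytes);
    cudaMalloc((void**)&Pd, bytes);
    cudaMemcpy(Md, M, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(Nd, N, bytes, cudaMemcpyHostToDevice);

    dim3 dimBlock(TILE_WIDTH, TILE_WIDTH);
    // Half as many blocks along x: each block now covers 2 * TILE_WIDTH columns.
    dim3 dimGrid(Width / (2 * TILE_WIDTH), Width / TILE_WIDTH);
    MatrixMulKernel2<<<dimGrid, dimBlock>>>(Md, Nd, Pd, Width);

    cudaMemcpy(P, Pd, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(Md);
    cudaFree(Nd);
    cudaFree(Pd);
}
```

In this sketch each block loads one Md tile for every two Nd tiles, where the one-element-per-thread version loads one Md tile per Nd tile. The trade-off is more registers and shared memory per block and half as many blocks, which can reduce available parallelism for small matrices.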