• GPU tensor cores for fast arithmetic reductions 

    Navarro, Cristóbal A.; Carrasco, Roberto; Barrientos, Ricardo; Riquelme, Javier A.; Vega, Raimundo (2021)
    This article proposes a parallel algorithm for computing the arithmetic reduction of $n$ numbers as a set of matrix-multiply-accumulate (MMA) operations that are executed simultaneously by GPU tensor cores. The analysis, ...
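
The idea of expressing a sum as MMA operations can be illustrated with a small CPU-side sketch. This is not the authors' implementation: the tile side `M`, the helper names `mma` and `mma_reduce`, and the zero-padding scheme are assumptions made for illustration, and plain Python lists stand in for tensor-core fragments. Multiplying an all-ones matrix by a data tile yields the tile's column sums, which are accumulated across tiles; one final multiplication by an all-ones matrix then adds those column sums together.

```python
# Illustrative CPU sketch only; real tensor cores execute MMAs in hardware.
M = 4  # hypothetical tile side, chosen small for readability


def mma(a, b, c):
    """Return D = A x B + C for M x M matrices stored as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(M)) + c[i][j]
             for j in range(M)]
            for i in range(M)]


def mma_reduce(values):
    """Sum `values` using only MMA operations on M x M tiles."""
    ones = [[1.0] * M for _ in range(M)]
    zeros = [[0.0] * M for _ in range(M)]
    tile_size = M * M

    # Zero-pad so the input splits into whole tiles (zeros do not change the sum).
    data = list(values)
    data += [0.0] * (-len(data) % tile_size)

    acc = zeros
    for start in range(0, len(data), tile_size):
        chunk = data[start:start + tile_size]
        tile = [chunk[r * M:(r + 1) * M] for r in range(M)]
        # ones x tile places the tile's column sums in every row;
        # adding acc keeps a running column-wise total across tiles.
        acc = mma(ones, tile, acc)

    # acc x ones adds the column sums of each row, so every entry
    # of the result holds the full reduction.
    total = mma(acc, ones, zeros)
    return total[0][0]
```

For example, `mma_reduce(range(1, 101))` returns `5050.0`. On a GPU the per-tile MMAs would be issued in parallel across warps rather than in a sequential loop.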