(Georgia Institute of Technology, 2009-08-26)
Rajamanickam, Siva
With the success of the Basic Linear Algebra Subprograms (BLAS) in using memory
efficiently, algorithms with vector operations (BLAS2) have given way to
algorithms with matrix operations (BLAS3). In some cases, BLAS3-based algorithms
are successful even at the cost of additional floating-point operations and
additional memory. In this talk, I will discuss two problems where
algorithms with vector operations, when combined with blocking, can perform better
than BLAS3-based algorithms.
Band reduction methods are mainly used in computing the eigenvalue decomposition
and singular value decomposition of band matrices. In the first part of this talk,
I will outline a blocking scheme for plane rotations. The blocked plane rotations,
when coupled with a pipelining scheme, lead to fewer floating-point operations and
less memory usage than the BLAS3-based band reduction methods. The blocked method is
also able to extract the same performance benefits from the cache as the BLAS3-based
methods, leading to a faster band reduction method. I will also show how we
can exploit the zeros while finding the eigenvectors and singular vectors.
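As a point of reference for the building block mentioned above, the following is a minimal sketch of a single plane (Givens) rotation applied to two rows of a small matrix. It is not the blocked or pipelined scheme from the talk; the function names `plane_rotation` and `zero_entry_with_row_rotation` and the example matrix are assumptions made only for illustration.

```python
import numpy as np

def plane_rotation(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0  # nothing to eliminate; use the identity rotation
    return a / r, b / r

def zero_entry_with_row_rotation(A, i, k, j):
    """Rotate rows i and k of A in place so that A[k, j] becomes zero."""
    c, s = plane_rotation(A[i, j], A[k, j])
    row_i = A[i, :].copy()
    row_k = A[k, :].copy()
    # Each update is a vector operation on a pair of rows.
    A[i, :] = c * row_i + s * row_k
    A[k, :] = -s * row_i + c * row_k
    return A

# Hypothetical 3x3 band matrix used only for this illustration.
A = np.array([[4.0, 1.0, 0.0],
              [3.0, 2.0, 1.0],
              [0.0, 1.0, 5.0]])
sv_before = np.linalg.svd(A, compute_uv=False)
zero_entry_with_row_rotation(A, 0, 1, 0)  # eliminate A[1, 0] using row 0
sv_after = np.linalg.svd(A, compute_uv=False)
```

Because the rotation is orthogonal, the singular values of the matrix are unchanged; only the two rotated rows are touched, which is what makes such updates vector operations rather than matrix-matrix operations.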
In the second part of the talk, I will introduce a method for computing the
bidiagonalization of a sparse upper triangular matrix R. In this method, we exploit
the sparsity of R and use plane rotations to reduce it to bidiagonal form. We
choose the rotations to minimize the fill generated in R itself. I will show how to
extend this method to use dynamic blocking and the pipelining scheme to arrive at
an efficient R-bidiagonalization method for computing the sparse SVD.
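For orientation, here is a rough dense sketch of reducing an upper triangular matrix to upper bidiagonal form with plane rotations. It deliberately ignores everything specific to the talk: it does not exploit sparsity, does not choose rotations to minimize fill, and uses no blocking or pipelining. The function names and the test matrix are assumptions for illustration only.

```python
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r

def bidiagonalize_upper_triangular(R):
    """Dense sketch: reduce upper triangular R to upper bidiagonal form."""
    B = np.array(R, dtype=float)
    n = B.shape[0]
    for i in range(n - 2):
        # Zero B[i, j] for j = n-1 down to i+2.
        for j in range(n - 1, i + 1, -1):
            # Column rotation on columns j-1 and j eliminates B[i, j]
            # and creates one fill entry at B[j, j-1].
            c, s = givens(B[i, j - 1], B[i, j])
            col_a = B[:, j - 1].copy()
            col_b = B[:, j].copy()
            B[:, j - 1] = c * col_a + s * col_b
            B[:, j] = -s * col_a + c * col_b
            # Row rotation on rows j-1 and j removes that fill entry,
            # restoring the upper triangular shape.
            c, s = givens(B[j - 1, j - 1], B[j, j - 1])
            row_a = B[j - 1, :].copy()
            row_b = B[j, :].copy()
            B[j - 1, :] = c * row_a + s * row_b
            B[j, :] = -s * row_a + c * row_b
    return B

# Hypothetical 4x4 upper triangular matrix used only for this illustration.
R = np.triu(np.arange(1.0, 17.0).reshape(4, 4))
B = bidiagonalize_upper_triangular(R)
```

Since only orthogonal rotations are applied from the left and right, B has the same singular values as R, so the SVD can be computed from the bidiagonal matrix.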