BLAS is an acronym for Basic Linear Algebra Subprograms. As the name indicates, it contains subprograms for basic operations on vectors and matrices. BLAS was designed to be used as a building block in other codes, for example LAPACK. The source code for BLAS is available through Netlib. However, many computer vendors provide a special version of BLAS tuned for maximal speed and efficiency on their hardware. This is one of the main advantages of BLAS: the calling sequences are standardized, so programs that call BLAS will work on any computer that has BLAS installed. If you have a fast version of BLAS, every program that calls BLAS benefits from it. Hence BLAS provides a simple and portable way to achieve high performance for calculations involving linear algebra.

LAPACK is a higher-level package built on the same ideas.

The BLAS subroutines can be divided into three *levels*:

- Level 1: Vector-vector operations. *O(n)* data and *O(n)* work.
- Level 2: Matrix-vector operations. *O(n^2)* data and *O(n^2)* work.
- Level 3: Matrix-matrix operations. *O(n^2)* data and *O(n^3)* work.

Each BLAS and LAPACK routine comes in several versions, one for each precision (data type). The first letter of the subprogram name indicates the precision used:

- S - real single precision
- D - real double precision
- C - complex single precision
- Z - complex double precision

Complex double precision is not strictly defined in FORTRAN 77, but most compilers will accept one of the following declarations:

      double complex list-of-variables
      complex*16 list-of-variables

Some of the BLAS 1 subprograms are:

- xCOPY - copy one vector to another
- xSWAP - swap two vectors
- xSCAL - scale a vector by a constant
- xAXPY - add a multiple of one vector to another
- xDOT - inner product
- xASUM - 1-norm of a vector
- xNRM2 - 2-norm of a vector
- IxAMAX - find the index of the entry of largest absolute value in a vector

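The first letter (x) can be any of the letters S, D, C, Z depending on the precision. A few names deviate slightly for complex data; for example, the complex inner products are called xDOTU and xDOTC. A quick reference to BLAS 1 can be found at http://www.netlib.org/blas/blasqr.ps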
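To illustrate what a typical BLAS 1 routine computes, here is a minimal sketch of the operation behind SAXPY, `y := a*x + y`, written as a plain Fortran loop. The name REFAXP is hypothetical and the code is not the Netlib source; it only mirrors the argument order of SAXPY and assumes positive increments.

```fortran
C     REFAXP: hypothetical stand-in for SAXPY (illustration only).
C     Computes y := a*x + y for n elements.
C     Assumes incx >= 1 and incy >= 1.
      subroutine REFAXP(n, a, x, incx, y, incy)
      integer n, incx, incy
      real a, x(*), y(*)
      integer i, ix, iy
C     ix and iy step through the arrays by their increments
      ix = 1
      iy = 1
      do 10 i = 1, n
         y(iy) = y(iy) + a * x(ix)
         ix = ix + incx
         iy = iy + incy
   10 continue
      return
      end
```

With BLAS installed you would write `call SAXPY(n, a, x, incx, y, incy)` instead; the stand-in is only meant to show which elements are read and which are overwritten.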

Some of the BLAS 2 subprograms are:

- xGEMV - general matrix-vector multiplication
- xGER - general rank-1 update
- xSYR2 - symmetric rank-2 update
- xTRSV - solve a triangular system of equations

A detailed description of BLAS 2 can be found at http://www.netlib.org/blas/blas2-paper.ps.
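As a rough picture of a BLAS 2 operation, the following sketch computes `y := alpha*A*x + beta*y`, the non-transposed case of xGEMV. REFGMV is a hypothetical name with a simplified argument list: it assumes unit increments and omits the transpose option that the real SGEMV accepts.

```fortran
C     REFGMV: hypothetical, simplified stand-in for SGEMV.
C     Computes y := alpha*A*x + beta*y, where A is m-by-n and
C     stored in an array with leading dimension lda.
C     Assumes unit increments and no transpose.
      subroutine REFGMV(m, n, alpha, A, lda, x, beta, y)
      integer m, n, lda
      real alpha, beta, A(lda,*), x(*), y(*)
      integer i, j
C     First form y := beta*y
      do 10 i = 1, m
         y(i) = beta * y(i)
   10 continue
C     Then add alpha*x(j) times column j of A; the inner loop
C     runs down a column, which is contiguous in memory
      do 30 j = 1, n
         do 20 i = 1, m
            y(i) = y(i) + alpha * x(j) * A(i,j)
   20    continue
   30 continue
      return
      end
```

The corresponding library call would be `call SGEMV('N', m, n, alpha, A, lda, x, 1, beta, y, 1)`. The two nested loops match the *O(n^2)* work listed for Level 2.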

Some of the BLAS 3 subprograms are:

- xGEMM - general matrix-matrix multiplication
- xSYMM - symmetric matrix-matrix multiplication
- xSYRK - symmetric rank-k update
- xSYR2K - symmetric rank-2k update

The more advanced matrix operations, like solving a linear system of equations, are contained in LAPACK. A detailed description of BLAS 3 can be found at http://www.netlib.org/blas/blas3-paper.ps.
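For comparison, a BLAS 3 operation such as xGEMM can be sketched with three nested loops, which is where the *O(n^3)* work comes from. REFGMM below is a hypothetical stand-in that computes `C := alpha*A*B + beta*C` for non-transposed A and B; the real SGEMM takes two additional transpose arguments and is far more carefully optimized.

```fortran
C     REFGMM: hypothetical, simplified stand-in for SGEMM.
C     Computes C := alpha*A*B + beta*C, where A is m-by-k,
C     B is k-by-n and C is m-by-n. No transpose options.
      subroutine REFGMM(m, n, k, alpha, A, lda, B, ldb,
     &                  beta, C, ldc)
      integer m, n, k, lda, ldb, ldc
      real alpha, beta, A(lda,*), B(ldb,*), C(ldc,*)
      integer i, j, l
      do 40 j = 1, n
C        Scale column j of C by beta
         do 10 i = 1, m
            C(i,j) = beta * C(i,j)
   10    continue
C        Add alpha*B(l,j) times column l of A, for each l
         do 30 l = 1, k
            do 20 i = 1, m
               C(i,j) = C(i,j) + alpha * B(l,j) * A(i,l)
   20       continue
   30    continue
   40 continue
      return
      end
```

The corresponding library call would be `call SGEMM('N', 'N', m, n, k, alpha, A, lda, B, ldb, beta, C, ldc)`.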

Let us first look at a very simple BLAS routine, SSCAL. The call sequence is:

      call SSCAL ( n, a, x, incx )

Here *x* is the vector, *n* is the length (number of
elements in *x* we wish to use), and *a* is the scalar
by which we want to multiply *x*.
The last argument *incx* is the *increment*.
Usually, *incx=1* and the vector *x* corresponds directly
to the one-dimensional Fortran array *x*.
For *incx>1* it specifies how many elements in the array we
should "jump" between each element of the vector *x*.
For example, if *incx=2* it means we should only scale every other
element (note: the physical dimension of the array *x* should then be
at least *2n-1*). Consider these examples where *x* has been
declared as `real x(100)`.

      call SSCAL(100, a, x, 1)
      call SSCAL( 50, a, x(51), 1)
      call SSCAL( 50, a, x(2), 2)

The first line will scale all 100 elements of *x* by *a*.
The next line will only scale the last 50 elements of *x* by *a*.
The last line will scale all the even indices of *x* by *a*.

Observe that the array *x* will be overwritten by the new values.
If you need to preserve a copy of the old *x*, you have to
make a copy first, e.g., by using SCOPY.
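To make the role of the increment concrete, here is a minimal sketch of what a call to SSCAL does, written as a plain loop. REFSCL is a hypothetical name, not the Netlib source, and the sketch assumes a positive increment.

```fortran
C     REFSCL: hypothetical stand-in for SSCAL (illustration only).
C     Scales n elements of x by a, visiting every incx-th
C     array element. Assumes incx >= 1.
      subroutine REFSCL(n, a, x, incx)
      integer n, incx
      real a, x(*)
      integer i, ix
C     ix is the array index of the current vector element
      ix = 1
      do 10 i = 1, n
         x(ix) = a * x(ix)
         ix = ix + incx
   10 continue
      return
      end
```

Passing `x(2)` as the third argument makes the loop start at the second array element, so `call REFSCL(50, a, x(2), 2)` touches *x(2)*, *x(4)*, ..., *x(100)* and leaves the odd indices alone. The update is done in place, just as with SSCAL.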

Now consider a more complicated example. Suppose you
have two 2-dimensional arrays A and B, and you are asked to
find the *(i,j)* entry of the product AB. This is
easily done by computing the inner product of row *i* from A and
column *j* of B. We can use the BLAS 1 function SDOT. The only
difficulty is to figure out the correct indices and increments.
The call sequence for SDOT is:

      SDOT ( n, x, incx, y, incy )

Suppose the array declarations were:

      real A(lda,lda)
      real B(ldb,ldb)

But in the program you know that the actual size of A is *m* by *p*
and for B it is *p* by *n*. The *i*'th row of A starts at
the element *A(i,1)*. But since Fortran stores 2-dimensional
arrays down columns, the next row element *A(i,2)* will
be stored *lda* elements later in memory (since *lda* is the
length of a column). Hence we set *incx = lda*.
For the column in B there is no such problem: the elements are
stored consecutively, so *incy = 1*. The length of the
inner product calculation is *p*. Hence the answer is:

      SDOT ( p, A(i,1), lda, B(1,j), 1 )
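The index reasoning above can be checked with a small self-contained sketch. REFDOT is a hypothetical stand-in that follows the argument order of SDOT for positive increments, and ABENT is a hypothetical helper that returns the *(i,j)* entry of the product; neither is part of BLAS.

```fortran
C     REFDOT: hypothetical stand-in with the argument order of
C     SDOT. Assumes incx >= 1 and incy >= 1.
      real function REFDOT(n, x, incx, y, incy)
      integer n, incx, incy
      real x(*), y(*)
      integer i, ix, iy
      REFDOT = 0.0
      ix = 1
      iy = 1
      do 10 i = 1, n
         REFDOT = REFDOT + x(ix) * y(iy)
         ix = ix + incx
         iy = iy + incy
   10 continue
      return
      end

C     ABENT: hypothetical helper returning entry (i,j) of A*B,
C     where A is m-by-p and B is p-by-n, stored in arrays with
C     leading dimensions lda and ldb.
      real function ABENT(A, lda, B, ldb, p, i, j)
      integer lda, ldb, p, i, j
      real A(lda,*), B(ldb,*)
      real REFDOT
      external REFDOT
C     Row i of A starts at A(i,1) and has stride lda.
C     Column j of B starts at B(1,j) and has stride 1.
      ABENT = REFDOT(p, A(i,1), lda, B(1,j), 1)
      return
      end
```

Note that SDOT, like REFDOT here, is a function rather than a subroutine, so the calling program must declare it as `real` before using its value.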

First of all you should check if you already have BLAS on your system. If not, you can find it on Netlib at http://www.netlib.org/blas.

The BLAS routines are almost self-explanatory. Once you know which routine you need, fetch it and read the header section that explains the input and output parameters in detail. We will look at an example in the next section when we address the LAPACK routines.

*Copyright © 1995-7 by Stanford University. All rights reserved.*