Preconditioned Conjugate Gradient Method

AMath 585, Winter Quarter 2020 at the University of Washington. Developed by R.J. LeVeque and distributed under the BSD license. You are free to modify and use as you please, with attribution.

These notebooks are all available on GitHub.


This notebook corrects some errors in Section 4.3.5 of the text, and includes some more discussion of PCG methods.

For an example implementation and application, see DarcyFlow.html.

Correction of errors

The preconditioned conjugate gradient method on p. 95 of the text contains some errors.

The lines defining $\alpha_{k-1}$ and $\beta_{k-1}$ are incorrect and should read:

$$ \alpha_{k-1} = (z_{k-1}^T r_{k-1}) / (p_{k-1}^T w_{k-1}) \quad\text{($z$ instead of $r$ in the numerator)} $$

and $$ \beta_{k-1} = (z_k^T r_k) / (z_{k-1}^T r_{k-1}) \quad\text{($z$ instead of $r$ in two places)} $$

There are also typos earlier on the page in the relations between the variables of the two systems. The correct expressions are:

$$ w_k = C^T \tilde w_k \qquad\text{and}\qquad r_k = C^T(\tilde f - \tilde A \tilde u_k) = f - A u_k. $$
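To make the corrected algorithm concrete, here is a minimal sketch of PCG in Python/NumPy; the function name `pcg` and the callback `Msolve` (which should solve $Mz = r$) are illustrative choices, not notation from the text:

```python
import numpy as np

def pcg(A, f, Msolve, u0=None, tol=1e-8, maxiter=500):
    """Minimal preconditioned conjugate gradient sketch, using the
    corrected formulas for alpha and beta (z in place of r)."""
    u = np.zeros_like(f, dtype=float) if u0 is None else u0.copy()
    r = f - A @ u               # residual of the original system A u = f
    z = Msolve(r)               # preconditioning step: solve M z = r
    p = z.copy()
    zr = z @ r                  # z_{k-1}^T r_{k-1}
    for k in range(maxiter):
        w = A @ p
        alpha = zr / (p @ w)    # corrected: z^T r in the numerator
        u = u + alpha * p
        r = r - alpha * w
        if np.linalg.norm(r) < tol:
            break
        z = Msolve(r)
        zr_new = z @ r          # z_k^T r_k
        beta = zr_new / zr      # corrected: z^T r in both places
        p = z + beta * p
        zr = zr_new
    return u
```

Note that setting `Msolve = lambda r: r` (i.e., $M = I$) reduces this to the unpreconditioned CG algorithm.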

Main idea

The main idea of how the preconditioner is applied is described in more detail in the text, but is summarized again here.

We wish to solve the linear system

$$ Au = f $$

in a case where $A$ is symmetric positive (or negative) definite, so that the conjugate gradient algorithm can be used, but ill-conditioned, so that CG might not converge as quickly as we would like.

We choose $M$ as some approximation to $A$ for which it is cheap to solve systems of the form

$$ Mz = r, $$

and we solve such a system in each iteration and then use $z$ in place of $r$ in several places.

Then the preconditioned C-G (PCG) algorithm is equivalent to applying CG to a different system that has the same solution as the original problem, but with condition number given by that of $M^{-1}A$ rather than the condition number of $A$. The better $M$ approximates $A$, the smaller this condition number will be (and the faster the algorithm converges).

We must also choose $M$ to be symmetric positive definite. This is because the PCG algorithm is actually derived by applying CG to the modified system

$$ (C^{-T}AC^{-1})(Cu) = C^{-T}f, \qquad\text{ or}\quad \tilde A \tilde u = \tilde f $$

where $C$ is a matrix satisfying $C^TC = M$. Since $C^TC$ is always symmetric positive definite (provided $C$ is nonsingular), we are implicitly assuming that $M$ is.

We apply C-G to this system rather than to $M^{-1}Au = M^{-1}f$ because the matrix $M^{-1}A$ might not be symmetric, even if $A$ and $M$ are, and so C-G would not be applicable.

Moreover, the matrix $\tilde A = C^{-T}AC^{-1}$ is always symmetric positive definite (as long as $C$ is nonsingular), since for any nonzero vector $u$,

$$ u^T\tilde A u = (C^{-1}u)^T A (C^{-1}u) > 0 $$

provided $A$ is SPD.

Also, $\tilde A$ has the same 2-norm condition number as $M^{-1}A$, since they are similar and so have the same eigenvalues. See the text and references for more details.
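This similarity is easy to check numerically. A small sanity check (with an SPD test matrix and a diagonal $M$ built just for illustration, so that $C$ is easy to write down explicitly):

```python
import numpy as np

np.random.seed(1)
n = 6
B = np.random.randn(n, n)
A = B @ B.T + n * np.eye(n)          # an SPD test matrix (illustrative)
M = np.diag(np.diag(A))              # a simple SPD preconditioner
C = np.diag(np.sqrt(np.diag(A)))     # for this diagonal M, C^T C = M
Cinv = np.linalg.inv(C)
Atilde = Cinv.T @ A @ Cinv           # C^{-T} A C^{-1}
lam1 = np.sort(np.linalg.eigvalsh(Atilde))
lam2 = np.sort(np.linalg.eigvals(np.linalg.solve(M, A)).real)
print(np.allclose(lam1, lam2))       # True: same spectrum
```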

Cholesky factorization

The matrix $C$ is never actually computed, but implicitly exists provided $M$ is SPD. One possible $C$ is obtained by computing the Cholesky factorization of $M$, which is like the LU factorization of $M$ obtained with Gaussian elimination, but with a more symmetric form in the SPD case. It can be shown that the diagonal elements of $U$ are then always positive, and by symmetry, if you factor the matrix $D = \text{diag}(U_{ii})$ out of $U$, you obtain:

$$ LU = LDL^T. $$

Now set $C = D^{1/2}L^T = \text{diag}(\sqrt{U_{ii}})L^T$ and you have $M = LU = C^TC$.
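In practice NumPy's `np.linalg.cholesky(M)` returns the lower-triangular factor $G = LD^{1/2}$ directly, with $M = GG^T$, so one can take $C = G^T$. A quick check on an arbitrary SPD matrix:

```python
import numpy as np

np.random.seed(2)
B = np.random.randn(5, 5)
M = B @ B.T + 5 * np.eye(5)     # an SPD matrix for illustration
G = np.linalg.cholesky(M)       # lower triangular, M = G G^T
C = G.T                         # upper triangular, so C^T C = M
print(np.allclose(C.T @ C, M))  # True
```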

Diagonal preconditioning

One simple (but often effective) preconditioner is obtained by taking $M$ to be the diagonal part of $A$. For a constant coefficient Poisson problem this would be a constant multiple of the identity matrix, so all the eigenvalues would be scaled by the same factor and hence $M^{-1}A$ would have the same condition number as $A$.

So the diagonal preconditioner is of no use for the Poisson problem, but for the variable coefficient problem it can help a lot, particularly if the coefficients vary greatly. This is illustrated in the notebook DarcyFlow.html.
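As a small stand-in for that example (an illustrative 1D variable-coefficient diffusion matrix, chosen here just for demonstration, not the DarcyFlow problem itself), the effect on the condition number can be seen directly:

```python
import numpy as np

# 1D variable-coefficient diffusion matrix (illustrative):
# diagonal entries kappa_{i-1/2} + kappa_{i+1/2}, off-diagonals -kappa_{i+1/2}
n = 100
kappa = np.exp(4 * np.sin(np.linspace(0, 10, n + 1)))   # widely varying coefficients
A = (np.diag(kappa[:-1] + kappa[1:])
     - np.diag(kappa[1:-1], 1) - np.diag(kappa[1:-1], -1))
M = np.diag(np.diag(A))
print(np.linalg.cond(A))                      # cond(A)
print(np.linalg.cond(np.linalg.solve(M, A)))  # cond(M^{-1}A), much smaller here
```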

Note that if $A$ is SPD then we should use $M = \text{diag}(A_{ii})$ as the diagonal preconditioner. If, on the other hand, $A$ is SND (symmetric negative definite), then we should instead use $M = \text{diag}(-A_{ii})$, so that $M$ is an SPD matrix. In this case the resulting $\tilde A$ will still be SND. (Note: it is a theorem that any SPD matrix has positive diagonal elements, while any SND matrix has negative diagonal elements.)

In either case we have $M = \text{diag}(|A_{ii}|)$ and the matrix $C$ can be $C = \text{diag}(\sqrt{|A_{ii}|})$.

As a simple example, suppose $A$ is a diagonal matrix with all positive values on the diagonal, and hence is SPD with 2-norm condition number $\max(A_{ii})/\min(A_{ii})$. This could be arbitrarily large, depending on the diagonal elements, and so convergence of CG could be quite slow. In this case diagonal preconditioning works perfectly: $\tilde A = I$, the identity matrix, and the PCG algorithm converges in one iteration (since all the eigenvalues are equal).
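Using the `pcg` sketch from earlier in this notebook (with an arbitrarily chosen positive diagonal, just for illustration), this is easy to confirm:

```python
import numpy as np

d = np.array([1e-3, 1.0, 50.0, 2e4])   # arbitrary positive diagonal entries
A = np.diag(d)                         # SPD, 2-norm condition number 2e7
f = np.ones(4)
Msolve = lambda r: r / d               # M = diag(A), so Mz = r is just division
u = pcg(A, f, Msolve, tol=1e-12)
print(np.allclose(A @ u, f))           # True, after a single iteration
```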
