Conjugate-Gradient Method

AMath 585, Winter Quarter 2020 at the University of Washington. Developed by R.J. LeVeque and distributed under the BSD license. You are free to modify and use as you please, with attribution.

These notebooks are all available on Github.


This notebook illustrates an implmentation of the Conjugate-Gradient method on a simple 1-dimensional boundary value problem $u''(x) = f(x)$ with Dirichlet boundary conditions. This illustrates the basic algorithm, although for this particular problem it would of course be better to solve the tridiagonal system directly.

In [1]:
%matplotlib inline
In [2]:
from pylab import *

Animating the iteration

Here we allow two possible ways to animate figures. The version defined in the module jsanimate_figs.py creates an animation that works well even when the notebook is rendered as html (by embedding javascript into the notebook, hence the js). The code to do this is a bit long so it has been put in a separate file, but it is based on the matplotlib animation tools, in particular FuncAnimation to loop over figures and the to_jshtml function to convert to javascript.

The version using ipywidgets does not render in the html version but you may prefer using it if you are running the notebook live, since it executes more quickly.

To use widgets, set use_widgets = True.

In [3]:
use_widgets = False

if use_widgets:
    from ipywidgets import interact
    import ipywidgets as widgets

    def animate_figs(figs):
        show_frame = lambda frameno: display(figs[frameno])
        interact(show_frame, frameno=widgets.IntSlider(min=0,max=len(figs)-1, value=0))
else:
    from jsanimate_figs import animate_figs

Define the BVP

Again we solve $u''(x) = f(x)$ with Dirichlet boundary conditions and choose a problem where the truncation error is zero so that the exact solution of the linear system is the solution of the ODE evaluated at the grid points.

Discrete matrix

In the notebook IterativeMethods.html we solved the system

\begin{align*} \frac{1}{h^2} \left[ \begin{array}{ccccccccccccccc} 1&0\\ 1&-2&1\\ &1&-2&1\\ &&&\ddots\\ &&&&1&-2&1\\ &&&&&0&1 \end{array} \right] ~ \left[ \begin{array}{ccccccccccccccc} U_0 \\ U_1 \\ U_2 \\ \vdots \\ U_m \\ U_{m+1} \end{array} \right] = \left[ \begin{array}{ccccccccccccccc} \alpha \\ f(x_1) \\ f(x_2) \\ \vdots \\ f(x_m) \\ \beta \end{array} \right]. \end{align*}

But note that this matrix is not symmetric and hence the conjugate gradient method cannot be used, or at least it will not work for general initial data. It turns out if the initial data satisfies the Dirichlet boundary conditions then it does in fact work, because the first and last equations are always satisfied, but to avoid concerns that this is affecting the behavior we will instead solve the tridiagonal system for the interior unknowns,

\begin{align*} \frac{1}{h^2} \left[ \begin{array}{ccccccccccccccc} -2&1\\ 1&-2&1\\ &&&\ddots\\ &&&&1&-2&1\\ &&&&&1&-2 \end{array} \right] ~ \left[ \begin{array}{ccccccccccccccc} U_1 \\ U_2 \\ \vdots \\ U_{m-1} \\ U_m \end{array} \right] = \left[ \begin{array}{ccccccccccccccc} f(x_1) - \alpha/h^2 \\ f(x_2) \\ \vdots \\ f(x_{m-1}) \\ f(x_m) - \beta/h^2 \end{array} \right]. \end{align*}

This complicates things a bit since we generally want to plot the full solution so we will use U_full to be the full set of grid points, including boundary points, and U to be the solution on the interior points, the actual solution to the linear system, with similar notation for other variables.

Note: This matrix $A$ is symmetric negative definite rather than positive definite. We could negate both the matrix and right hand side to make it positive definite, but this isn't necessary: the C-G algorithm as written in the text still works fine.

Also note that the first matrix above, in addition to being non-symmetric, has two positive eigenvalues equal to 1 and the rest of the eigenvalues are negative. So even if it were symmetric, C-G would not work without negating some of the equations.

Define the matrix-vector multiply function

To implement the C-G algorithm we need to be able to multiply $Av$ for various vectors $v$. The following matvec routine computes and returns this product. This is the only place the matrix $A$ is needed, so not that in general we do not need to form or store the large sparse matrix.

In [4]:
def matvec(v):
    """
    Given v of length m 
    Return b = A*v
    """
    
    m = len(v)
    h = 1./(m+1)
    
    # initialize vector for product:
    b = empty(m, dtype=float)
    
    # set each value of b, noting that first and last rows are special:
    for i in range(m):
        if i>0:
            vim = v[i-1]
        else:
            vim = 0.
        if i<m-1:
            vip = v[i+1]
        else:
            vip = 0.
            
        b[i] = (vim - 2*v[i] + vip) / h**2
        
    return b

Different version: extend v to include boundary points

Here's a simpler way to define the matvec function that is more like what is done in the notebook ConjugateGradient2D.html.

In [5]:
def matvec(v):
    """
    Given v of length m 
    Return b = A*v
    """
    
    # pad v with zeros at each boundary, needed for computing centered differences:

    m = len(v)
    h = 1./(m+1)
    v_full = zeros(m+2)
    v_full[1:-1] = v
    
    # compute b of length m at interior points only. 
    # Note that slicing with [1:-1] corresponds to interior points,
    # [:-2] correspond to points to the left and 
    # [2:] are points to the right.
    
    b = (v_full[:-2] - 2*v_full[1:-1] + v_full[2:]) / h**2
    
    return b

The Conjugate-Gradient algorithm

The next function implements the C-G algorithm.

In [6]:
def solve_bvp_CG(f_fcn, utrue_fcn, m, maxiter, kplot, verbose=False):
    
    h = 1./(m+1)
    x_full = linspace(0,1,m+2)
    x = x_full[1:m+1] # interior points

    utrue_full = utrue_fcn(x_full)
    utrue = utrue_full[1:m+1]  # at interior points
    
    # Dirichlet boundary values from true solution:
    alpha = utrue_fcn(0.)
    beta = utrue_fcn(1.)

    f_full = f_fcn(x_full)

    # right-hand side:
    f = f_full[1:m+1]  # at interior points

    # adjust for Dirichlet BCs:
    f[0] = f[0] - alpha / h**2
    f[m-1] = f[m-1] - beta / h**2

    # initial guess:
    U0_full = linspace(alpha, beta, m+2)  # linear
    U0 = U0_full[1:m+1]  # interior points

    U = U0.copy() # current iterate
    r = f - matvec(U)  # initial residual
    p = r.copy()  # initial direction

    tol = 1e-8  # stop if the residual 

    enorm = abs(U-utrue).max()
    errors = [enorm]
    figs = []  # for the list of figures we generate

    rTr_km = dot(r,r)  # r^T * r at iteration k-1

    for k in range(1,maxiter+1):
        w = matvec(p)   # the only matrix-vector multiply
        a = rTr_km / dot(p,w) # alpha_{k-1} in CG algorithm
        U = U + a*p
        r = r - a*w

        enorm = abs(U-utrue).max()
        errors.append(enorm)

        if mod(k,kplot)==0 or k==maxiter:
            # every kplot iterations create a plot:
            fig = figure(figsize=(12,5))
            plot(x_full,U0_full,'r-o', label='initial guess')
            plot(x_full,utrue_full,'k-o', label='true solution')
            U_full = hstack([alpha,U,beta])
            plot(x_full,U_full,'bo-', label= 'iteration k = %i' % k)
            legend()
            grid(True)
            xlim(0,1)
            ylim(0,3)
            title('After %i iterations, norm(error) = %.2e' \
                  % (k, enorm))
            figs.append(fig)
            close(fig)

        rTr_k = dot(r,r)
        rnorm = sqrt(rTr_k)
        
        if verbose:
            print('iteration %3i:   2-norm(r) = %.2e,   max-norm(E) = %.2e' \
                  % (k,rnorm,enorm))
        
        if rnorm < tol:
            if verbose: print('Stopping after %i iterations' % k)
            break

        # determine next search direction:
        b = rTr_k / rTr_km   # beta_{k-1} in CG algorithm
        rTr_km = rTr_k       # for next iteration
        p = r + b*p           # next search direction
         
    return errors, figs

Test this on a cubic function

Again we choose an example for which the exact solution of the linear system is just the ODE solution evaluated at the grid points...

In [7]:
f_fcn = lambda x: 6*x + 2
utrue_fcn = lambda x: x**3 + x**2 - x + 1
In [8]:
errors,figs = solve_bvp_CG(f_fcn, utrue_fcn, m=19, maxiter=25, kplot=1, verbose=True)
iteration   1:   2-norm(r) = 6.14e+01,   max-norm(E) = 5.26e-01
iteration   2:   2-norm(r) = 5.51e+01,   max-norm(E) = 4.34e-01
iteration   3:   2-norm(r) = 4.89e+01,   max-norm(E) = 3.49e-01
iteration   4:   2-norm(r) = 4.27e+01,   max-norm(E) = 2.75e-01
iteration   5:   2-norm(r) = 3.66e+01,   max-norm(E) = 2.11e-01
iteration   6:   2-norm(r) = 3.06e+01,   max-norm(E) = 1.54e-01
iteration   7:   2-norm(r) = 2.46e+01,   max-norm(E) = 1.08e-01
iteration   8:   2-norm(r) = 1.87e+01,   max-norm(E) = 7.84e-02
iteration   9:   2-norm(r) = 1.27e+01,   max-norm(E) = 6.52e-02
iteration  10:   2-norm(r) = 9.73e+00,   max-norm(E) = 5.65e-02
iteration  11:   2-norm(r) = 9.75e+00,   max-norm(E) = 4.12e-02
iteration  12:   2-norm(r) = 7.69e+00,   max-norm(E) = 2.87e-02
iteration  13:   2-norm(r) = 5.87e+00,   max-norm(E) = 1.94e-02
iteration  14:   2-norm(r) = 4.29e+00,   max-norm(E) = 1.19e-02
iteration  15:   2-norm(r) = 2.96e+00,   max-norm(E) = 7.06e-03
iteration  16:   2-norm(r) = 1.86e+00,   max-norm(E) = 3.53e-03
iteration  17:   2-norm(r) = 1.01e+00,   max-norm(E) = 1.47e-03
iteration  18:   2-norm(r) = 4.07e-01,   max-norm(E) = 4.41e-04
iteration  19:   2-norm(r) = 5.21e-15,   max-norm(E) = 6.66e-16
Stopping after 19 iterations

Note that this converged after 19 iterations, as expected. Note that neither the residual nor the error were very small in iteration 18, and dropped dramatically in the final iteration!

In [9]:
semilogy(range(1,len(errors)+1), errors, 'b-x')
grid(True)

Note the behavior of the approximate solution, now plotted every iteration:

In [10]:
animate_figs(figs)
Out[10]:

A different ODE

Here's an even simpler ODE where the true solution is quadratic. In this case we obtain convergence in 10 iterations, again with a big drop in the final iteration.

In [11]:
f_fcn = lambda x: -6*ones(x.shape)
utrue_fcn = lambda x: 2-3*(x-0.2)*(x-0.7)
In [12]:
errors,figs = solve_bvp_CG(f_fcn, utrue_fcn, m=19, maxiter=25, kplot=1, verbose=True)
iteration   1:   2-norm(r) = 7.62e+01,   max-norm(E) = 6.07e-01
iteration   2:   2-norm(r) = 6.77e+01,   max-norm(E) = 4.80e-01
iteration   3:   2-norm(r) = 5.92e+01,   max-norm(E) = 3.67e-01
iteration   4:   2-norm(r) = 5.07e+01,   max-norm(E) = 2.70e-01
iteration   5:   2-norm(r) = 4.22e+01,   max-norm(E) = 1.87e-01
iteration   6:   2-norm(r) = 3.37e+01,   max-norm(E) = 1.20e-01
iteration   7:   2-norm(r) = 2.51e+01,   max-norm(E) = 6.75e-02
iteration   8:   2-norm(r) = 1.64e+01,   max-norm(E) = 3.00e-02
iteration   9:   2-norm(r) = 7.35e+00,   max-norm(E) = 7.50e-03
iteration  10:   2-norm(r) = 2.99e-13,   max-norm(E) = 4.44e-16
Stopping after 10 iterations
In [13]:
semilogy(range(1,len(errors)+1), errors, 'b-x')
grid(True)

The behavior of the iterates is particularly simple in this case:

In [14]:
animate_figs(figs)
Out[14]: