BFGS – Gradient Approximation Methods

The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is the most commonly used update strategy for implementing a quasi-Newton optimization technique. Newton's method was first derived as a numerical technique for finding the roots of a nonlinear equation. It solves for the roots by forming a linear approximation to the nonlinear equation at a given guess x_k: the tangent to the curve at the point x_k is used to obtain a new guess to the solution, and the process repeats. Formally, the iterative method is expressed as
x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}.
In terms of optimization, the ultimate goal is to find a stationary point x^* such that f'(x^*)=0. Therefore, assuming the objective function is twice differentiable, Newton's method uses f'' in order to calculate the roots of f', and the iterative process becomes
x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}.
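As a minimal sketch of this one-dimensional iteration, assume an illustrative objective f(x) = \cos(x) (not from the text), so f'(x) = -\sin(x), f''(x) = -\cos(x), and the minimizer near x_0 = 3 is x^* = \pi:

```python
import math

def newton_minimize(fp, fpp, x0, tol=1e-10, max_iter=50):
    """Iterate x_{k+1} = x_k - f'(x_k) / f''(x_k) until |f'(x_k)| < tol."""
    x = x0
    for _ in range(max_iter):
        if abs(fp(x)) < tol:
            break
        x = x - fp(x) / fpp(x)
    return x

# f(x) = cos(x): f'(x) = -sin(x), f''(x) = -cos(x); minimizer near 3 is pi.
xstar = newton_minimize(lambda x: -math.sin(x), lambda x: -math.cos(x), 3.0)
```

Each step solves for the root of the tangent line to f' at x_k, which is why convergence is quadratic once the iterates are close to x^*.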

This iterative scheme can be generalized to multiple dimensions by replacing f'(x) and f''(x) with the gradient \nabla f(x) and the Hessian \nabla^2 f(x), respectively. The main advantages of Newton's method are its robustness and its rapid local quadratic rate of convergence. Unfortunately, the method requires computing and storing the Hessian matrix \nabla^2 f(x) and solving the resulting system of equations, which can be costly.
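In the multidimensional setting, a single Newton step solves \nabla^2 f(x_k)\, p = -\nabla f(x_k) and sets x_{k+1} = x_k + p. A sketch on a hypothetical convex quadratic f(x) = \tfrac{1}{2} x^T A x - b^T x (whose gradient is Ax - b and whose Hessian is the constant matrix A, so one step reaches the minimizer):

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # SPD Hessian of the quadratic
b = np.array([1.0, 1.0])

def newton_step(x):
    grad = A @ x - b                     # gradient of f at x
    hess = A                             # Hessian of f (constant here)
    p = np.linalg.solve(hess, -grad)     # solve the Newton system
    return x + p

x1 = newton_step(np.zeros(2))            # lands on the minimizer A^{-1} b
```

For a general nonlinear f, this step is repeated, and the cost of forming and factoring the Hessian at every iterate is exactly what quasi-Newton methods try to avoid.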

The most rudimentary quasi-Newton method is the one-dimensional secant method, which approximates f''(x_k) with a finite differencing scheme
f''(x_k) \approx \frac{f'(x_k)-f'(x_{k-1})}{x_k-x_{k-1}}.
The BFGS quasi-Newton approach can therefore be thought of as a generalization of the secant method.
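A short sketch of the secant iteration, reusing the same illustrative objective f(x) = \cos(x) (hypothetical, not from the text):

```python
import math

def secant_minimize(fp, x0, x1, tol=1e-10, max_iter=100):
    """Quasi-Newton iteration with f'' replaced by a finite difference of f'."""
    for _ in range(max_iter):
        d = (fp(x1) - fp(x0)) / (x1 - x0)   # secant approximation to f''(x_1)
        x0, x1 = x1, x1 - fp(x1) / d        # Newton-like step using d
        if abs(fp(x1)) < tol:
            break
    return x1

# f(x) = cos(x): f'(x) = -sin(x); the minimizer bracketed by 2.5 and 3.0 is pi.
xstar = secant_minimize(lambda x: -math.sin(x), 2.5, 3.0)
```

Only values of f' are required, at the price of slightly slower (superlinear rather than quadratic) convergence.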

The general form of a quasi-Newton optimization is: given a point x_k, we approximate the Hessian matrix at x_k by \beta_k and obtain a search direction p_k by solving
\beta_k p_k = -\nabla f(x_k) \ \text{for} \ k=0,1,\ldots
Like Newton's method, we follow along this search direction to obtain a new guess
x_{k+1} = x_k + \alpha_k p_k,
where \alpha_k is the step-size, often determined through a scalar optimization or a line search procedure.

The simplest and most cost-efficient generalized quasi-Newton method is the method of steepest descent. This method takes \beta_k equal to the identity matrix I for all k, and therefore avoids the costly computation and storage of \beta_k. As a result, the search direction simplifies to p_k = -\nabla f(x_k), the direction of steepest descent. This simplification, however, reduces the rate of convergence: the method of steepest descent converges only linearly.
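A sketch of steepest descent on the same hypothetical quadratic f(x) = \tfrac{1}{2} x^T A x - b^T x used earlier; for a quadratic the optimal step-size along p has the closed form \alpha = (p^T p)/(p^T A p), which stands in here for a general line search:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b               # gradient of the illustrative quadratic

x = np.zeros(2)
for _ in range(500):
    p = -grad(x)                         # beta_k = I, so p_k = -grad f(x_k)
    if np.linalg.norm(p) < 1e-10:        # stop once the gradient vanishes
        break
    alpha = (p @ p) / (p @ A @ p)        # exact line search for a quadratic
    x = x + alpha * p
```

Each iterate zig-zags toward the minimizer, and the error contracts by a fixed factor per step, which is the linear convergence noted above.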

For optimization of \mathcal{L_\beta}, we use MATLAB's built-in unconstrained optimization routine fminunc, which implements a quasi-Newton technique and includes multiple options for tackling a wide variety of optimization problems. For convenience, we describe fminunc in terms of the options we have selected for our specific optimization problem.

fminunc uses the BFGS method to update the Hessian approximation \beta_k at each point x_k. The BFGS quasi-Newton algorithm can be summarized by the following steps:

1. Specify an initial \beta_0 and x_0.
2. For k = 0, 1, 2, …
   a) Stop if x_k is optimal.
   b) Solve \beta_k p_k = -\nabla f(x_k) for the search direction p_k.
   c) Use a line search to determine the step-size \alpha_k.
   d) Update x_{k+1} = x_k + \alpha_k p_k.
   e) Compute \beta_{k+1} using the BFGS update.
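The steps above can be sketched in Python as follows. This is a minimal illustration, not fminunc itself: a simple Armijo backtracking line search stands in for the mixed quadratic/cubic search, and the quadratic test function is hypothetical.

```python
import numpy as np

def bfgs(f, grad, x0, tol=1e-6, max_iter=200):
    x = np.asarray(x0, dtype=float)
    B = np.eye(len(x))                 # beta_0 = I (fminunc's default)
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:    # (a) stop if x_k is optimal
            break
        p = np.linalg.solve(B, -g)     # (b) solve beta_k p_k = -grad f(x_k)
        alpha = 1.0                    # (c) Armijo backtracking line search
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p) and alpha > 1e-12:
            alpha *= 0.5
        s = alpha * p
        x_new = x + s                  # (d) x_{k+1} = x_k + alpha_k p_k
        g_new = grad(x_new)
        y = g_new - g
        if y @ s > 1e-12:              # (e) BFGS update (needs y_k^T s_k > 0)
            Bs = B @ s
            B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
        x, g = x_new, g_new
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # SPD, so f below is convex
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
xstar = bfgs(f, grad, np.zeros(2))
```

The curvature check y^T s > 0 before step (e) skips updates that would destroy positive definiteness, a safeguard discussed further below.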

We have chosen to use the medium-scale implementation of fminunc, in which the user must supply an initial value x_0. The user does not, however, need to supply the initial Hessian matrix \beta_0, since by default \beta_0 = I under the medium-scale implementation. The first iteration of fminunc therefore follows the method of steepest descent. We define x_k as optimal if |\nabla f(x_k)| < 10^{-6}, where |\nabla f(x_k)| is the magnitude of the gradient of the likelihood function, |\nabla \mathcal{L_\beta}|. The gradient \nabla f(x_k) is approximated using a forward finite differencing scheme. If x_k is not optimal, we proceed to solve for the search direction p_k. A mixed quadratic/cubic line search procedure (described below) is used to determine an appropriate step-size \alpha_k, which allows us to update to our new position x_{k+1}.
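A forward finite-difference gradient approximates each component as \partial f / \partial x_i \approx (f(x + h e_i) - f(x)) / h. A small sketch, with a hypothetical test function and step size h:

```python
import numpy as np

def fd_gradient(f, x, h=1e-7):
    """Forward finite-difference approximation to the gradient of f at x."""
    g = np.empty_like(x, dtype=float)
    fx = f(x)                            # one base evaluation, reused below
    for i in range(len(x)):
        e = np.zeros_like(x, dtype=float)
        e[i] = h                         # perturb one coordinate at a time
        g[i] = (f(x + e) - fx) / h
    return g

# Hypothetical test function f(x, y) = x^2 + 3 y^2, with exact gradient (2x, 6y).
g = fd_gradient(lambda v: v[0]**2 + 3 * v[1]**2, np.array([1.0, 2.0]))
```

This costs n + 1 function evaluations per gradient, which is the price paid for not supplying an analytic gradient.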

The BFGS update formula falls into a broader class of rank-two update formulas that maintain both symmetry and positive definiteness (SPD) of the Hessian approximation. Preservation of SPD ensures that the search direction is always a descent direction. The general rank-two update formula is given by
\beta_{k+1} = \beta_k - \frac{(\beta_k s_k)(\beta_k s_k)^T}{s_k^T \beta_k s_k} + \frac{y_k y_k^T}{y_k^Ts_k} + \phi(s_k^T\beta_k s_k)v_k v_k^T,
where \phi is a scalar and
s_k = x_{k+1}-x_{k}, \quad  y_k = \nabla f(x_{k+1})- \nabla f(x_k), \quad v_k =\frac{ y_k}{y_k^T s_k}-\frac{\beta_k s_k}{s_k^T \beta_k s_k}.
The BFGS Hessian update is obtained by setting \phi = 0, and therefore simplifies to
\beta_{k+1} = \beta_k - \frac{(\beta_k s_k)(\beta_k s_k)^T}{s_k^T \beta_k s_k} + \frac{y_k y_k^T}{y_k^Ts_k}.
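A numerical check of these properties, using illustrative, hand-picked matrices and vectors (not from the text): the updated \beta_{k+1} satisfies the secant equation \beta_{k+1} s_k = y_k, and it remains SPD whenever y_k^T s_k > 0.

```python
import numpy as np

def bfgs_update(B, s, y):
    """The BFGS Hessian update (the phi = 0 member of the rank-two family)."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

B = np.array([[4.0, 1.0, 0.0],       # an SPD Hessian approximation beta_k
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 2.0]])
s = np.array([1.0, 0.0, 1.0])        # s_k = x_{k+1} - x_k
y = np.array([1.0, 0.5, 0.5])        # y_k, chosen so that y^T s = 1.5 > 0

B_new = bfgs_update(B, s, y)
```

The secant equation holds by construction: applying the update to s cancels the \beta_k s term and leaves exactly y_k, mirroring the one-dimensional secant approximation to f''.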


A mixed quadratic/cubic line search procedure is used to determine a suitable step-size \alpha_k. The line search operates by first perturbing the initial point \lambda_1 along the direction of steepest descent to obtain a second point \lambda_2. These two points, together with the initial gradient approximation, define a quadratic interpolant. A third point, \lambda_3, is obtained by extrapolating the quadratic interpolant so that the minimum is bracketed, with \lambda_1 < \lambda_2 < \lambda_3. The three points \lambda_i, i = 1, 2, 3, together with the first gradient approximation, yield a cubic interpolant whose minimizer, \hat{\lambda}, estimates the true minimum. We then bisect the bracketing interval, using two of the \lambda_i along with the estimate \hat{\lambda} to obtain a new cubic interpolant, and the process repeats.
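As a standalone illustration of the interpolation idea (the bracketing values and the objective are hypothetical, and the real procedure also uses gradient information), one quadratic-interpolation step fits a parabola through three bracketing points and takes its vertex as the new trial step:

```python
def quad_vertex(l1, l2, l3, f1, f2, f3):
    """Vertex of the parabola through (l1, f1), (l2, f2), (l3, f3)."""
    num = (l2 - l1)**2 * (f2 - f3) - (l2 - l3)**2 * (f2 - f1)
    den = (l2 - l1) * (f2 - f3) - (l2 - l3) * (f2 - f1)
    return l2 - 0.5 * num / den

# For phi(l) = (l - 1.5)^2 the true minimizer is recovered exactly
# from any bracket, since the interpolant matches the parabola itself.
phi = lambda l: (l - 1.5)**2
lhat = quad_vertex(0.0, 1.0, 2.0, phi(0.0), phi(1.0), phi(2.0))
```

For a general objective the vertex only approximates the minimizer, which is why the bracket is refined and the fit upgraded to a cubic on subsequent passes.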


I am a recent (2012) Memorial University B.Sc. graduate in Applied Mathematics and Physics. I am currently working in the Mathematics & Statistics Department at MUN, under the supervision of Dr. Ronald Haynes. My current research is in numerical optimization, as it pertains to oil well placement and optimal parameterization. In my spare time I enjoy exercising, spending time with family and friends and playing soft rock and blues guitar.

Posted in Optimization Algorithms
