The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is the most commonly used update strategy for implementing a Quasi-Newtown optimization technique. Newton’s method was first derived as a numerical technique for solving for the roots of a nonlinear equation. Newton’s Method solves for the roots of a nonlinear equation by providing a linear approximation to the nonlinear equation at a given guess . The linear approximation, as measured by the tangent to the curve at the point is then used to obtain a new guess to the solution, and the process repeats. Formally, the iterative method is expressed as
In terms of optimization, the ultimate goal is to find a stationary point such that . Therefore, assuming the objective function is twice differentiable, Newton’s Method uses in order calculate the roots of , and the iterative process becomes
This iterative scheme can be generalized to multiple dimensions by replacing and with the gradient, and the Hessian, , respectively. The main advantages of Newton’s method is its robustness and rapid local quadratic rate of convergence. Unfortunately, this method requires computation and storage of the Hessian matrix and solving the resulting system of equations, which can be costly.
The most rudimentary Quasi-Newton method is the one dimensional secant method, which approximates with a finite differencing scheme
The BFGS quasi-Newton approach can therefore be thought of as a generalization of the secant method.
The general form of a quasi-Newton optimization is, given a point , we attempt to solve for the descent direction by approximating the Hessian matrix at $x_k$. We can then obtain a search direction by solving
Like Newton’s Method, we follow along this search direction to obtain a new guess
where is the step-size, often determined through a scalar optimization or a line search procedure.
The simplest and most cost efficient generalized quasi-Newton method is the method of steepest descent. This method assumes is equal to the identity matrix I for all k, and therefore does not require costly computation and storage of . As a result, the descent direction simplifies to , and is that of “most” descent. This simplification, however, decreases the rate of convergence such that the method of steepest descent converges linearly.
For optimization of , we use Matlab’s built-in unconstrained optimization algorithm fminunc. fminunc uses a quasi-Newton optimization technique and includes multiple options for tackling a wide variety of optimization problems. For convenience, we will describe fminunc in terms of the options that we have selected for our specific optimization problem.
fminunc uses the BFGS method to update the Hessian matrix at each point . The BFGS quasi-Newton algorithm can be summarized by the following steps:
1. Specify an initial and .
2. For k=0,1,2,…
a)Stop if is optimal
b) Solve for search direction .
c) Use a line search to determine the step-size .
d) Update .
e) Compute using the BFGS update.
We have chosen to use the medium-scale implementation of fminunc, in which the user must supply an initial value for x0. The user, however, does not need to supply the initial Hessian matrix as by default under the medium-scale implementation. Therefore, the first iteration of fminunc follows the method of steepest descent. We define as being optimal if , where is the magnitude of the gradient of the likelihood function, . The gradient function, , is derived using a forward finite differencing scheme. If is not optimal, then we proceed to solve for the search direction . A mixed quadratic/cubic line search procedure (described below) is used to determine an appropriate step-size , which allows us to update to our new position .
The BFGS update formula falls into a broader class of rank two update formulas that maintains both symmetry and positive definiteness (SPD) of the Hessian approximation. Preservation of SPD ensures that the search direction is always a descent direction. The general rank two update formula is given by
where is a scalar and
The BFGS Hessian update is obtained by setting , and therefore simplifies to
LINE SEARCH PROCEDURE
A mixed quadratic/cubic line search procedure is used in order to determine a suitable step-size . The line search procedure operates by first perturbing the initial point , along the line of steepest descent in order to obtain a second point . The two points along with the initial gradient approximation are used to obtain a quadratic interpolation. A third point, , is obtained through extrapolation of the quadratic interpolant, and brackets the minimum such that . Using each of , for and the first gradient approximation, allows us to obtain a cubic interpolant, whose minimum, , estimates the true minimum. We then proceed to bisect the bracket interval using two of along with the minimum estimate to obtain a new cubic interpolant, and the process repeats.