# Implicit Filtering – Thank You C.T. Kelley

Implicit Filtering (IF) (designed by C.T. Kelley) is a sophisticated, deterministic pattern search method for bound constrained optimization. Like most pattern search algorithms, the evaluation of the objective function at time step i is used to determine the next cluster of points for the following time step i+1. IF, however, uses a combination of sampling, thus exploring the search space by evaluating the objective function at multiple feasible points, and interpolation, which uses a first-order interpolant constructed via linear least squares, and a quasi-Newton Hessian approximation to infer descent directions.

The sampling method is arranged on a stencil, which, given a current iterate $x_c$, will sample the objective function in all coordinate directions via $x_c \pm hv_i$. Here $v_i = (L_i-U_i)e_i$, where $L_i$ and $U_i$ represent the components of the lower and upper bounds of the feasible region, respectively, and $e_i$ is the unit coordinate vector. The scale, h varies as the optimization progresses, according to $\{2^{-n}\}_{n=1}^7$. Therefore, if a sampling phase is unsuccessful in finding a more optimal position than the current incumbent point, the scale, and thus the stencil, is reduced by a factor of 0.5, and the sampling phase is repeated. By default, the optimization will terminate after the scale has been reduced by a factor of $2^{-7}$. It is, however, the quasi-Newton iteration that provides IF with a clear advantage over a general pattern search algorithm, and has shown to improve overall efficiency and ability of global convergence.

In terms of algorithm structure, IF can be broken into two main loops: the outer loop, which controls the pattern search, and the inner loop, which implements the quasi-Newton iteration at the current best position, $x_{min}$, determined at the end of each sampling step. Therefore, IF must be given two main stopping criterion; one for each loop.

The quasi-Newton iteration will terminate when the norm of the difference gradient $||\nabla f(x,v,h)|| \leq \tau h$. Here, $\tau = \xi *\epsilon$, where $\xi$ is a scaling factor that attempts to eliminate discrepancies in the step size of the quasi-Newton iteration, which arise when the values of the objective function are either very small or very large. The default scale value is $\xi= 1.2|f(x_0)|$, where $x_0$ is the user-supplied initial starting point. $\epsilon$ is the tolerance level for terminating the quasi-Newton iteration. Highly sensitive stopping tolerances are required for optimization of $\mathcal{L_\beta}$ as the likelihood function contains many regions that are relatively flat, which can result in a premature termination of the optimization process if the stopping tolerance is too large.

Kelley states that IF works efficiently for functions which are noisy, non-smooth or random. Therefore in theory, IF should be well suited for optimization of our particular objective function, $\mathcal{L_\beta}$. The general thought is that the sampling phase, with a suitable step size, should step over the local minima, whereas the quasi-Newton iteration will establish efficient convergence in regions near the global optimum. For more information on IF’s options and specifications, please refer to Implicit Filtering by C.T. Kelley.