# Motivation

In order to familiarize the reader with the methods that I will use to apply mathematics (particularly optimization) to everyday problems, I feel it somewhat necessary to provide a few posts that summarize my current research, (methods, techniques etc.) so that future posts don’t appear so peculiar . Here we go:

Computer simulators are useful tools for modelling the complexity of real world systems which are either impractical, too expensive or too time consuming to physically observe. For example, analysis of large physical systems such as the flow of oil in a reservoir, or the tidal energy generated by the tides of large ocean basins, can be achieved through the use of computer simulators. Suppose we wish to optimize the position of injector and production wells in an oil field, or similarly the positions of the tidal turbines in an ocean basin. Strategic planning of such procedures is crucial in order to maximize the net output of energy. Computer simulators, therefore, enable scientists and engineers to experiment with a numerical replica of such complex problems in order to determine the most effective and efficient solution(s).

This being said, realistic computer simulators can be computationally expensive to run, and are often emulated using statistical models, such as Gaussian Process (GP) models. Many mathematical, physical, engineering and geological processes employ the use of GP models as statistical surrogates to replace computationally expensive computer simulators. For example, computer simulations of the elastic deformation of cylinders resulting from high velocity impacts, the manufacturing and integration processes of electrical circuits for quality measures, or the estimation of the magnetic field generated near the Milky Way, can be computationally expensive and overly time consuming. Therefore, in each case a GP model was used as an efficient way to obtain an approximation to the corresponding simulator output.

The GP model requires as input a set of experimental design points and their corresponding simulator outputs. The simulator output is required to estimate the mean and variance of the Gaussian Process z(x), explained further in my next post, whereas the design points are used for calculating the spatial correlation matrix, R. There are many choices for the correlation matrix R, however, we will use the squared exponential correlation function. The components of R depend on both the spatial distribution of the design points, as well as a scaling hyper-parameter beta. In order for the GP model to predict the simulator output with a high level of accuracy we must determine the corresponding optimal beta values.

The maximum likelihood approach (which is the actual optimization problem at hang) determines the hyper-parameter beta that will minimize the deviance in our model estimates, as measured by the mean squared error (MSE). The quality of the GP model is quantified by a likelihood function, $\mathcal{L_\beta}$, which is the objective function that we seek to optimize. Gradient information of the likelihood function cannot be easily obtained and the likelihood surface can often have multiple local optima, making the optimization problem challenging. Derivative-free optimization techniques such as the genetic algorithm (GA) are robust, but can be computationally intensive. Gradient approximation methods, such as BFGS, are generally more efficient, but have the potential to converge to a local optimum if poorly initialized. I common technique when implementing local optimizers is a multi-start algorithm, which essentially initiates a local optimizer at multiple points within our search domain, allowing for a more globalized search to be performed. Nonetheless, this method demands several full executions of BFGS, which is still computationally expensive.

In order to improve the efficiency of the likelihood optimization, we investigate the use of three new derivative-free optimization techniques. The primary technique is a hybridization of two algorithms: Dividing Rectangles (DIRECT), a partitioning algorithm, and Implicit Filtering (IF), a general pattern search algorithm that incorporates the use of a linear interpolant from which descent direction information is derived. The second technique replaces IF with BFGS in the DIRECT hybridization, which has its advantageous as the BFGS algorithm does not require bound constraints, whereas both DIRECT and IF are constrained optimizers. The third technique modifies a former clustering based multi-start BFGS method, by replacing BFGS with IF, and introducing a function evaluation budget for a reduced number of clustered centres (0.5 X dimension of the problem). We will compare the performance of the three techniques with that of the originally proposed genetic algorithm, and the multi-start BFGS method employed using 2 X dimension +1 starting points. Analysis of both the likelihood value and the resulting MSE will determine the algorithm’s ability to provide an accurate model fit. Efficiency will be measured by both the time and number of function evaluations required for convergence.