# Optimization of the GP Model

The negative log-likelihood $\mathcal{L}_\beta$ depends on several components: the mean and variance estimators $\hat{\mu}(\beta)$ and $\hat{\sigma}^2(\beta)$, and the inverse and determinant of $R_\delta$, all of which depend on the hyper-parameter $\beta$. Recall that $\beta = (\beta_1, \dots, \beta_d)$ is a $(1 \times d)$ vector of parameters with components $\beta_i$, $i \in \{1, \dots, d\}$. Additionally, the nugget parameter $\delta$ depends on the condition number $\kappa(R)$, which in turn depends on $\beta$. For this reason, it is difficult, if not impossible, to derive an analytic expression for the gradient of $\mathcal{L}_\beta$. It follows that optimization methods that rely on the user supplying an accurate expression for $\nabla \mathcal{L}_\beta$ are of no benefit. We can, however, approximate $\nabla \mathcal{L}_\beta$ numerically through finite differencing, as is done in the BFGS and Implicit Filtering (IF) algorithms when no analytic gradient is supplied. Used this way, such methods do not require an exact gradient of the objective function, and they are referred to here as derivative-free optimization algorithms.
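To make the dependence on $\beta$ concrete, the following is a minimal sketch of how $\mathcal{L}_\beta$ and a finite-difference approximation to $\nabla \mathcal{L}_\beta$ might be evaluated. Several details are assumptions made for illustration and are not taken from this page: a Gaussian correlation function with $\theta_i = 10^{\beta_i}$, a constant mean, the profile form $\log|R_\delta| + n \log \hat{\sigma}^2(\beta)$ of the objective, and a nugget chosen as the smallest $\delta$ that keeps $\kappa(R_\delta)$ below a threshold. The names `neg_log_likelihood`, `fd_gradient` and `cond_threshold` are hypothetical.

```python
import numpy as np


def neg_log_likelihood(beta, X, y, cond_threshold=1e12):
    """Profile negative log-likelihood (up to constants) for a given beta.

    Assumed form: log|R_delta| + n * log(sigma2_hat), with a Gaussian
    correlation R_ij = exp(-sum_k 10**beta_k * (x_ik - x_jk)**2).
    """
    n = X.shape[0]
    theta = 10.0 ** np.asarray(beta, dtype=float)

    # Correlation matrix R(beta) from pairwise squared differences.
    diff = X[:, None, :] - X[None, :, :]
    R = np.exp(-np.sum(theta * diff**2, axis=2))

    # Assumed nugget rule: smallest delta >= 0 such that
    # cond(R + delta * I) <= cond_threshold. This makes delta depend on
    # kappa(R), and therefore on beta.
    eig = np.linalg.eigvalsh(R)
    lam_min, lam_max = max(eig[0], 0.0), eig[-1]
    delta = max((lam_max - cond_threshold * lam_min) / (cond_threshold - 1.0), 0.0)
    R_delta = R + delta * np.eye(n)

    # Mean and variance estimators, both functions of beta through R_delta.
    ones = np.ones(n)
    mu_hat = (ones @ np.linalg.solve(R_delta, y)) / (ones @ np.linalg.solve(R_delta, ones))
    resid = y - mu_hat
    sigma2_hat = (resid @ np.linalg.solve(R_delta, resid)) / n

    _, logdet = np.linalg.slogdet(R_delta)
    return logdet + n * np.log(sigma2_hat)


def fd_gradient(f, beta, h=1e-5):
    """Central finite-difference approximation to the gradient of f at beta."""
    beta = np.asarray(beta, dtype=float)
    grad = np.zeros_like(beta)
    for i in range(beta.size):
        step = np.zeros_like(beta)
        step[i] = h
        grad[i] = (f(beta + step) - f(beta - step)) / (2.0 * h)
    return grad
```

A call such as `fd_gradient(lambda b: neg_log_likelihood(b, X, y), beta0)` costs $2d$ likelihood evaluations, each of which requires an eigendecomposition and several linear solves with $R_\delta$. The step size `h` is a trade-off: on a rough objective, too small a step amplifies numerical noise, while too large a step smooths over local structure.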

Even with derivative-free optimization techniques, the optimization process remains challenging. The objective function $\mathcal{L}_\beta$ is often very rough around the global optimum and can contain numerous local optima and flat regions. It is not uncommon for the likelihood value at these sub-optimal solutions to be close to that of the global optimum. However, the corresponding values of $\beta$ can differ significantly from the global optimizer, and converging to one of them can result in a poor-quality model fit. To ensure that the GP model is reliable, convergence to a highly precise global optimum is crucial, and thus a highly accurate and robust global optimization technique is required.
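One common way to reduce the risk of stopping at a local optimum is to restart a local optimizer from many starting points and keep the best result. The sketch below illustrates that idea only; it is not presented as the global optimization technique this page has in mind. It uses SciPy's `minimize` with `method="BFGS"`, which falls back to finite-difference gradients when no `jac` argument is given. The function name `multistart_minimize`, the box used to draw starting points, and the toy objective in the usage note are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def multistart_minimize(objective, lower, upper, n_starts=20, seed=0):
    """Run BFGS from random starting points and return the best local result.

    `lower` and `upper` only define the box from which starting points are
    drawn; BFGS itself is unconstrained and may leave that box.
    """
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(lower, upper)
        # No `jac` is passed, so SciPy approximates the gradient numerically.
        result = minimize(objective, x0, method="BFGS")
        if best is None or result.fun < best.fun:
            best = result
    return best
```

As a usage example, the toy objective `lambda x: float((x[0]**2 - 1)**2 + 0.3 * x[0])` has two local minima, near $x = 1$ and $x = -1$, with the lower one on the negative side. Calling `multistart_minimize(objective, [-2.0], [2.0])` should return a point in the lower basin, whereas a single BFGS run started at a positive `x0` would typically stop at the higher minimum. Multi-start offers no guarantee of finding the global optimum; it only raises the chance that at least one run starts in the right basin, and the cost grows linearly with `n_starts`.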

For a list of optimization algorithms please navigate to categories->Optimization Algorithms.