Suppose that you have N data points, yk, for k = 1, 2, ..., N, and the function to be fitted is f(x,p), where p represents the M parameters (p1, p2, ..., pM). Define the likelihood of the parameters, given the data, as the probability of the data, given the parameters. We fit for the parameters, p, by finding the values, pmin, that maximize this likelihood. This form of parameter estimation is known as maximum likelihood estimation.
Consider the likelihood function

    ℒ(p) ≡ P(x1,p)*P(x2,p)*...*P(xN,p),

where P is the probability density, which depends on the random variable x and the parameters p. ℒ(p) is a measure for the probability to observe just the particular sample we have, and is called an a-posteriori probability since it is computed after the sampling is done. The best estimates for p are the values which maximize ℒ(p). But maximizing the logarithm of ℒ(p) also maximizes ℒ(p), and maximizing ln(ℒ) is equivalent to minimizing -ln(ℒ). So, the goal becomes minimizing the log likelihood function

    L(p) = -ln(ℒ(p)) = -[ln P(x1,p) + ln P(x2,p) + ... + ln P(xN,p)].

For independent Gaussian measurement errors with standard deviations σk, L(p) equals χ²(p)/2 plus a constant that does not depend on p, where χ²(p) = Σk [(yk - f(xk,p))/σk]², so minimizing L(p) is the familiar weighted least squares problem.
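As a minimal sketch of this objective, the function below evaluates L(p) = -ln ℒ(p) under the Gaussian-error assumption just mentioned; it returns χ²/2 plus a constant that does not depend on p. The function name and argument list are illustrative and follow the hypothetical example above.

    import numpy as np

    def neg_log_likelihood(p, f, x, y, sigma):
        # L(p) = -ln(likelihood) for independent Gaussian errors
        # yk ~ N(f(xk, p), sigma_k); equals chi-square/2 plus a constant.
        sigma = np.broadcast_to(sigma, np.shape(y))
        r = (y - f(x, p)) / sigma              # standardized residuals
        chi2 = np.sum(r ** 2)
        const = np.sum(np.log(np.sqrt(2.0 * np.pi) * sigma))
        return 0.5 * chi2 + const

For example, neg_log_likelihood(p_true, f, xdata, ydata, sigma) gives the value of the objective at the true parameters of the synthetic data set.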
Let p0 be the initial values given for p. The goal is to find a step Δp so that p1 = p0 + Δp is a better approximation to the data. We use the iterative Gauss-Newton method, and the series p1, p2, p3, ... will hopefully converge to the minimum, pmin.
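The outer loop of this iteration can be sketched independently of how the increment is computed; the driver below is one such sketch, with iterate, step, tol, and max_iter being illustrative names. The Gauss-Newton increment itself, derived in the rest of this section, is plugged in through the step argument.

    import numpy as np

    def iterate(p0, step, tol=1e-8, max_iter=50):
        # Repeatedly apply p -> p + dp until the increment becomes negligible.
        # `step(p)` must return the increment dp for the current parameters.
        p = np.asarray(p0, dtype=float)
        for _ in range(max_iter):
            dp = step(p)
            p = p + dp
            if np.linalg.norm(dp) <= tol * (1.0 + np.linalg.norm(p)):
                break                          # hopefully converged to pmin
        return p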
Generally, the Gauss-Newton method is locally convergent when χ² is zero at the minimum. Serious difficulties arise when f is sufficiently nonlinear and χ² is large at the minimum. The Gauss-Newton method has the advantage that linear least squares problems are solved in one iteration.
Consider the Taylor expansion of L(p) about the current estimate p0:

    L(p0 + Δp) ≈ L(p0) + Σj (∂L/∂pj) Δpj + (1/2) Σj Σk (∂²L/∂pj∂pk) Δpj Δpk.

Define the arrays g (the gradient vector) and H (the Hessian matrix), both evaluated at p0:

    gj = ∂L/∂pj,    Hjk = ∂²L/∂pj∂pk.

If we linearize, i.e., assume that f(x,p) is approximately linear in the parameters near p0, then the terms in H containing second derivatives of f can be neglected, and minimizing the quadratic expansion yields Δp as the solution of the linear system

    H Δp = -g.

In this linearized form, g and H are built only from the residuals yk - f(xk,p) and the first partial derivatives ∂f(xk,p)/∂pj.
Note: The partial derivatives are approximated numerically using a central difference approximation:

    ∂f(x,p)/∂pj ≈ [f(x, p + h*ej) - f(x, p - h*ej)] / (2h),

where ej is the unit vector in the direction of the j-th parameter and h is a small step size.
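Putting the pieces together, the sketch below builds the central-difference Jacobian and one Gauss-Newton increment Δp from H Δp = -g, again under the Gaussian-error assumption; jacobian_cd, gauss_newton_step, and the step-size heuristic are illustrative choices, not prescribed by the text.

    import numpy as np

    def jacobian_cd(f, x, p, h=1e-6):
        # Central-difference approximation of J[k, j] = d f(xk, p) / d pj.
        p = np.asarray(p, dtype=float)
        J = np.empty((len(x), len(p)))
        for j in range(len(p)):
            step = h * max(1.0, abs(p[j]))     # scale the step to the parameter size
            pp, pm = p.copy(), p.copy()
            pp[j] += step
            pm[j] -= step
            J[:, j] = (f(x, pp) - f(x, pm)) / (2.0 * step)
        return J

    def gauss_newton_step(f, x, y, sigma, p):
        # One increment dp from H dp = -g, with the second derivatives of f
        # dropped from H (the linearization described above).
        J = jacobian_cd(f, x, p)
        w = 1.0 / np.broadcast_to(sigma, np.shape(y)) ** 2   # weights 1/sigma^2
        r = y - f(x, p)                                      # residuals yk - f(xk,p)
        g = -(J.T * w) @ r                                   # gradient of L at p
        H = (J.T * w) @ J                                    # linearized Hessian
        return np.linalg.solve(H, -g)

Combined with the driver above, p_fit = iterate([1.0, 1.0], lambda p: gauss_newton_step(f, xdata, ydata, sigma, p)) fits the hypothetical data set. For a model that is linear in p, the Jacobian does not depend on p, so the first increment already solves the normal equations exactly; this is the one-iteration property mentioned above.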