Nonlinear Least Squares Fitting

Given a function f(x) of a variable x tabulated at m values y_1=f(x_1), ..., y_m=f(x_m), assume the function is of known analytic form depending on n parameters f(x;lambda_1,...,lambda_n), and consider the overdetermined set of m equations

 \[ y_i = f(x_i; \lambda_1, \ldots, \lambda_n), \qquad i = 1, \ldots, m. \]

We wish to solve these equations for the values lambda_1, ..., lambda_n that best satisfy the system. Pick an initial guess for the lambda_i and then define

 \[ d\beta_i = y_i - f(x_i; \lambda_1, \ldots, \lambda_n). \]

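Now obtain a linearized estimate for the changes dlambda_i needed to reduce dbeta_i to 0,

 \[ d\beta_i = \sum_{j=1}^{n} \left.\frac{\partial f}{\partial\lambda_j}\right|_{x_i,\lambda} d\lambda_j \]
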
for i=1, ..., m, where lambda=(lambda_1,...,lambda_n). This can be written in component form as

 \[ d\beta_i = \sum_{j=1}^{n} A_{ij}\, d\lambda_j, \]

where A is the m×n matrix

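 \[ \mathbf{A} = \begin{bmatrix}
 \left.\frac{\partial f}{\partial\lambda_1}\right|_{x_1,\lambda} & \cdots & \left.\frac{\partial f}{\partial\lambda_n}\right|_{x_1,\lambda} \\
 \left.\frac{\partial f}{\partial\lambda_1}\right|_{x_2,\lambda} & \cdots & \left.\frac{\partial f}{\partial\lambda_n}\right|_{x_2,\lambda} \\
 \vdots & \ddots & \vdots \\
 \left.\frac{\partial f}{\partial\lambda_1}\right|_{x_m,\lambda} & \cdots & \left.\frac{\partial f}{\partial\lambda_n}\right|_{x_m,\lambda}
 \end{bmatrix}, \qquad A_{ij} = \left.\frac{\partial f}{\partial\lambda_j}\right|_{x_i,\lambda}. \]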

In more concise matrix form,

 \[ d\boldsymbol{\beta} = \mathbf{A}\, d\boldsymbol{\lambda}, \]

where dbeta is an m-vector and dlambda is an n-vector.
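
As a rough illustration of how dbeta and A might be assembled in practice, the following Python sketch builds both for a hypothetical two-parameter exponential model. The model, the data, and the function names `residuals` and `jacobian` are assumptions made for this example only, and the derivatives are approximated by central differences rather than taken analytically.

```python
import math

def residuals(f, xs, ys, lam):
    """Return the m-vector dbeta_i = y_i - f(x_i; lambda)."""
    return [y - f(x, lam) for x, y in zip(xs, ys)]

def jacobian(f, xs, lam, h=1e-6):
    """Return the m-by-n matrix A_ij = df(x_i; lambda)/dlambda_j,
    approximated here by central differences (an assumption of this sketch)."""
    A = []
    for x in xs:
        row = []
        for j in range(len(lam)):
            up = list(lam)
            dn = list(lam)
            up[j] += h
            dn[j] -= h
            row.append((f(x, up) - f(x, dn)) / (2 * h))
        A.append(row)
    return A

# Hypothetical model, not from the article: f(x; a, k) = a * exp(-k * x)
def model(x, lam):
    a, k = lam
    return a * math.exp(-k * x)

xs = [0.0, 1.0, 2.0, 3.0]
ys = [model(x, (2.0, 0.5)) for x in xs]  # noise-free data generated with (a, k) = (2, 0.5)
lam0 = [1.5, 0.4]                        # initial guess

dbeta = residuals(model, xs, ys, lam0)   # m = 4 entries
A = jacobian(model, xs, lam0)            # 4 rows, n = 2 columns
```

Each row of `A` corresponds to one tabulated point x_i and each column to one parameter, matching the layout of the matrix above.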

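Applying the transpose of A to both sides gives

 \[ \mathbf{A}^{\mathrm{T}} d\boldsymbol{\beta} = (\mathbf{A}^{\mathrm{T}} \mathbf{A})\, d\boldsymbol{\lambda}. \]

Defining

 \[ \mathbf{a} \equiv \mathbf{A}^{\mathrm{T}} \mathbf{A}, \qquad \mathbf{b} \equiv \mathbf{A}^{\mathrm{T}} d\boldsymbol{\beta} \]

in terms of the known quantities A and dbeta then gives the matrix equation

 \[ \mathbf{a}\, d\boldsymbol{\lambda} = \mathbf{b}, \]
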
which can be solved for dlambda using standard matrix techniques such as Gaussian elimination. This offset is then applied to lambda and a new dbeta is calculated. Iterating this procedure until every element of dlambda is smaller in magnitude than some prescribed limit yields the solution. Note that the procedure may converge poorly, or not at all, for some functions, and that convergence is often greatly improved by picking initial values close to the best-fit values. The sum of squared residuals is given by R^2=dbeta·dbeta after the final iteration.
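
The iteration just described can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not a robust implementation: the names `solve` and `gauss_newton`, the tolerance, the iteration cap, and the exponential test model are all choices made for this example, and no safeguards against divergence or a singular matrix are included.

```python
import math

def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def gauss_newton(f, partials, xs, ys, lam, tol=1e-10, max_iter=50):
    """Iterate lambda <- lambda + dlambda, where (A^T A) dlambda = A^T dbeta."""
    lam = list(lam)
    n = len(lam)
    for _ in range(max_iter):
        dbeta = [y - f(x, lam) for x, y in zip(xs, ys)]
        A = [partials(x, lam) for x in xs]  # m rows of n partial derivatives
        a = [[sum(row[i] * row[j] for row in A) for j in range(n)]
             for i in range(n)]             # a = A^T A
        b = [sum(row[i] * d for row, d in zip(A, dbeta))
             for i in range(n)]             # b = A^T dbeta
        dlam = solve(a, b)
        lam = [l + d for l, d in zip(lam, dlam)]
        if max(abs(d) for d in dlam) < tol:
            break
    dbeta = [y - f(x, lam) for x, y in zip(xs, ys)]
    return lam, sum(d * d for d in dbeta)   # parameters and R^2 = dbeta . dbeta

# Hypothetical test model, not from the article: f(x; a, k) = a * exp(-k * x)
def model(x, lam):
    return lam[0] * math.exp(-lam[1] * x)

def model_partials(x, lam):
    e = math.exp(-lam[1] * x)
    return [e, -lam[0] * x * e]             # [df/da, df/dk]

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [model(x, [2.0, 0.5]) for x in xs]     # noise-free data
lam, r2 = gauss_newton(model, model_partials, xs, ys, [1.5, 0.4])
```

Because the data here are noise-free and the starting point is close to the generating parameters, the fit should recover values very near (2, 0.5) with R^2 close to zero. With noisy data or a poor initial guess the same loop may need damping or may fail to converge, as noted above.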


An example of a nonlinear least squares fit to a noisy Gaussian function

 \[ f(A, x_0, \sigma; x) = A\, e^{-(x-x_0)^2/(2\sigma^2)} \]

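is shown above, where the thin solid curve is the initial guess, the dotted curves are intermediate iterations, and the heavy solid curve is the fit to which the solution converges. The actual parameters are (A,x_0,sigma)=(1,20,5), the initial guess was (0.8, 15, 4), and the converged values are (1.03105, 20.1369, 4.86022), with R^2=0.148461. The partial derivatives used to construct the matrix A are

 \[ \frac{\partial f}{\partial A} = e^{-(x-x_0)^2/(2\sigma^2)}, \]
 \[ \frac{\partial f}{\partial x_0} = \frac{A(x-x_0)}{\sigma^2}\, e^{-(x-x_0)^2/(2\sigma^2)}, \]
 \[ \frac{\partial f}{\partial\sigma} = \frac{A(x-x_0)^2}{\sigma^3}\, e^{-(x-x_0)^2/(2\sigma^2)}. \]
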
The technique could obviously be generalized to multiple Gaussians, to include slopes, etc., although the convergence properties generally worsen as the number of free parameters is increased.
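
For the single-Gaussian example, the analytic partial derivatives can be checked against finite differences before they are used in a fit. The sketch below assumes the Gaussian has the form A exp(-(x-x_0)^2/(2 sigma^2)), consistent with the parameters (A, x_0, sigma) quoted above. The function names, the step size, and the evaluation point x = 18 are arbitrary choices for this illustration.

```python
import math

def gaussian(x, lam):
    """Assumed model: A * exp(-(x - x0)^2 / (2 sigma^2))."""
    A, x0, sigma = lam
    return A * math.exp(-(x - x0) ** 2 / (2 * sigma ** 2))

def gaussian_partials(x, lam):
    """Analytic partial derivatives with respect to (A, x0, sigma)."""
    A, x0, sigma = lam
    e = math.exp(-(x - x0) ** 2 / (2 * sigma ** 2))
    return [
        e,                                   # df/dA
        A * (x - x0) / sigma ** 2 * e,       # df/dx0
        A * (x - x0) ** 2 / sigma ** 3 * e,  # df/dsigma
    ]

def numeric_partials(f, x, lam, h=1e-6):
    """Central-difference approximation to the same derivatives."""
    out = []
    for j in range(len(lam)):
        up = list(lam)
        dn = list(lam)
        up[j] += h
        dn[j] -= h
        out.append((f(x, up) - f(x, dn)) / (2 * h))
    return out

lam0 = [0.8, 15.0, 4.0]  # the initial guess quoted in the text
x = 18.0                 # arbitrary evaluation point
analytic = gaussian_partials(x, lam0)
numeric = numeric_partials(gaussian, x, lam0)
max_diff = max(abs(p - q) for p, q in zip(analytic, numeric))
```

A small `max_diff` suggests the analytic expressions are consistent with the model. One row of the matrix A is then simply `gaussian_partials(x_i, lam)` for each tabulated x_i.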

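An analogous technique can be used to solve an overdetermined set of equations. This problem might, for example, arise when solving for the best-fit Euler angles corresponding to a noisy rotation matrix, in which case there are three unknown angles, but nine correlated matrix elements. In such a case, write the m different functions as f_i(lambda_1,...,lambda_n) for i=1, ..., m, call their actual values y_i, and define

 \[ \mathbf{A} = \begin{bmatrix}
 \left.\frac{\partial f_1}{\partial\lambda_1}\right|_{\lambda_i} & \left.\frac{\partial f_1}{\partial\lambda_2}\right|_{\lambda_i} & \cdots & \left.\frac{\partial f_1}{\partial\lambda_n}\right|_{\lambda_i} \\
 \vdots & \vdots & \ddots & \vdots \\
 \left.\frac{\partial f_m}{\partial\lambda_1}\right|_{\lambda_i} & \left.\frac{\partial f_m}{\partial\lambda_2}\right|_{\lambda_i} & \cdots & \left.\frac{\partial f_m}{\partial\lambda_n}\right|_{\lambda_i}
 \end{bmatrix}, \]

 \[ d\boldsymbol{\beta} = \begin{bmatrix} y_1 - f_1(\lambda_1, \ldots, \lambda_n) \\ \vdots \\ y_m - f_m(\lambda_1, \ldots, \lambda_n) \end{bmatrix}, \]
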
where lambda_i are the numerical values obtained after the ith iteration. Again, set up the equations as

 \[ \mathbf{A}\, d\boldsymbol{\lambda} = d\boldsymbol{\beta}, \]

and proceed exactly as before.
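
A reduced version of the rotation example may make the overdetermined case concrete. The sketch below assumes a planar rotation, so there is one unknown angle and four matrix elements instead of three angles and nine elements. The noisy element values are made up for illustration. With a single parameter, A^T A is a scalar, so no linear solver is needed; the three-angle case would have the same structure with a 3×3 system.

```python
import math

def f(theta):
    """The m = 4 functions: elements of a planar rotation matrix, row by row."""
    c, s = math.cos(theta), math.sin(theta)
    return [c, -s, s, c]

def partials(theta):
    """The 4-by-1 matrix A: derivative of each element with respect to theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [-s, -c, c, -s]

# Made-up noisy matrix elements (roughly a 30 degree rotation)
y = [0.87, -0.49, 0.51, 0.86]

theta = 0.0  # initial guess
for _ in range(50):
    dbeta = [yi - fi for yi, fi in zip(y, f(theta))]
    A = partials(theta)
    a = sum(Ai * Ai for Ai in A)                  # A^T A, a scalar here
    b = sum(Ai * di for Ai, di in zip(A, dbeta))  # A^T dbeta
    dtheta = b / a
    theta += dtheta
    if abs(dtheta) < 1e-12:
        break
```

For this planar case the least-squares angle also has the closed form atan2(y_3 - y_2, y_1 + y_4), which gives an independent check on the value the iteration settles on.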

See also

Least Squares Fitting, Linear Regression, Moore-Penrose Matrix Inverse

Cite this as:

Weisstein, Eric W. "Nonlinear Least Squares Fitting." From MathWorld--A Wolfram Web Resource.
