Least Squares Fitting
A mathematical procedure for finding the best-fitting curve to a given set of points by minimizing the sum of the squares of the offsets ("the residuals") of the points from the curve. The sum of the squares of the offsets is used instead of the offset absolute values because this allows the residuals to be treated as a continuous differentiable quantity. However, because squares of the offsets are used, outlying points can have a disproportionate effect on the fit, a property which may or may not be desirable depending on the problem at hand.
In practice, the vertical offsets from a line (polynomial, surface, hyperplane, etc.) are almost always minimized instead of the perpendicular offsets. This provides a fitting function for the independent variable x that estimates y for a given x (most often what an experimenter wants), allows uncertainties of the data points along the x- and y-axes to be incorporated simply, and also provides a much simpler analytic form for the fitting parameters than would be obtained using a fit based on perpendicular offsets. In addition, the fitting technique can be easily generalized from a best-fit line to a best-fit polynomial when sums of vertical distances are used. In any case, for a reasonable number of noisy data points, the difference between vertical and perpendicular fits is quite small.
The linear least squares fitting technique is the simplest and most commonly applied form of linear regression and provides a solution to the problem of finding the best fitting straight line through a set of points. In fact, if the functional relationship between the two quantities being graphed is known to within additive or multiplicative constants, it is common practice to transform the data in such a way that the resulting line is a straight line, say by plotting T vs. \sqrt{\ell} instead of T vs. \ell in the case of analyzing the period T of a pendulum as a function of its length \ell. For this reason, standard forms for exponential, logarithmic, and power laws are often explicitly computed. The formulas for linear least squares fitting were independently derived by Gauss and Legendre.
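As an illustration of this kind of transformation, the following sketch (in Python with simulated data; the pendulum constant, lengths, and noise level are assumptions made purely for the example) fits the period T as a linear function of \sqrt{\ell} rather than of \ell.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated pendulum data: T = 2*pi*sqrt(L/g) plus a little noise
# (g, the lengths, and the noise level are assumptions for this sketch).
g = 9.81
L = np.linspace(0.1, 2.0, 20)            # lengths in meters
T = 2 * np.pi * np.sqrt(L / g) + rng.normal(0, 0.01, L.size)

# Fitting T vs. sqrt(L) instead of T vs. L makes the relationship a
# straight line through the origin with slope 2*pi/sqrt(g).
u = np.sqrt(L)
b, a = np.polyfit(u, T, 1)               # slope, intercept of T = a + b*sqrt(L)
print("fitted slope:", b, " expected:", 2 * np.pi / np.sqrt(g))
print("fitted intercept:", a, " expected: 0")
```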
For nonlinear least squares fitting to a number of unknown parameters, linear least squares fitting may be applied iteratively to a linearized form of the function until convergence is achieved. However, it is often also possible to linearize a nonlinear function at the outset and still use linear methods for determining fit parameters without resorting to iterative procedures. This approach commonly violates the implicit assumption that the distribution of errors is normal, but often still gives acceptable results using normal equations, a pseudoinverse, etc. Depending on the type of fit and initial parameters chosen, the nonlinear fit may have good or poor convergence properties. If uncertainties (in the most general case, error ellipses) are given for the points, points can be weighted differently in order to give the high-quality points more weight.
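As a sketch of the "linearize at the outset" approach (the model y = A e^{Bx} and the synthetic data below are assumptions introduced only for illustration), taking logarithms turns the nonlinear model into a straight-line fit of ln y against x; note that, as discussed above, this implicitly reweights the errors.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data from y = A * exp(B * x) with multiplicative noise
# (A, B, and the noise model are assumptions for this sketch).
A_true, B_true = 2.5, -0.7
x = np.linspace(0.0, 4.0, 25)
y = A_true * np.exp(B_true * x) * np.exp(rng.normal(0, 0.05, x.size))

# Linearize: ln y = ln A + B x, then apply ordinary linear least squares.
B_fit, lnA_fit = np.polyfit(x, np.log(y), 1)
print("A ~", np.exp(lnA_fit), " B ~", B_fit)
```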
Vertical least squares fitting proceeds by finding the sum of the squares of the vertical deviations R^2 of a set of n data points

R^2 \equiv \sum [y_i - f(x_i, a_1, a_2, \dots, a_n)]^2    (1)

from a function f. Note that this procedure does not
minimize the actual deviations from the line (which would be measured perpendicular
to the given function). In addition, although the unsquared sum of distances
might seem a more appropriate quantity to minimize, use of the absolute value results
in discontinuous derivatives which cannot be treated analytically. The square deviations
from each point are therefore summed, and the resulting residual is then minimized
to find the best fit line. This procedure results in outlying points being given
disproportionately large weighting.
The condition for R^2 to be a minimum is that

\frac{\partial (R^2)}{\partial a_i} = 0    (2)

for i = 1, ..., n. For a linear fit,

f(a, b) = a + b x    (3)

so

R^2(a, b) \equiv \sum_{i=1}^{n} [y_i - (a + b x_i)]^2    (4)

\frac{\partial (R^2)}{\partial a} = -2 \sum_{i=1}^{n} [y_i - (a + b x_i)] = 0    (5)

\frac{\partial (R^2)}{\partial b} = -2 \sum_{i=1}^{n} [y_i - (a + b x_i)] x_i = 0    (6)
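As a quick check on conditions (5) and (6), the short sketch below (Python with sympy; the four data points are made up solely for illustration) builds R^2 for a small dataset, sets both partial derivatives to zero, and solves for a and b.

```python
import sympy as sp

# A small made-up dataset, used only to illustrate the minimization conditions.
xs = [0, 1, 2, 3]
ys = [1.1, 1.9, 3.2, 3.8]

a, b = sp.symbols('a b', real=True)

# R^2(a, b) as in (4): the sum of squared vertical deviations.
R2 = sum((y - (a + b * x))**2 for x, y in zip(xs, ys))

# Setting the partial derivatives (5) and (6) to zero and solving gives
# the least squares estimates for this dataset.
sol = sp.solve([sp.diff(R2, a), sp.diff(R2, b)], [a, b])
print(sol)
```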
These lead to the equations

n a + b \sum x_i = \sum y_i    (7)

a \sum x_i + b \sum x_i^2 = \sum x_i y_i    (8)

In matrix form,

\begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}    (9)

so

\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}    (10)

The 2x2 matrix inverse is

\begin{bmatrix} a \\ b \end{bmatrix} = \frac{1}{n \sum x_i^2 - \left(\sum x_i\right)^2} \begin{bmatrix} \sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i \\ n \sum x_i y_i - \sum x_i \sum y_i \end{bmatrix}    (11)

so

a = \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}    (12)

  = \frac{\bar{y} \sum x_i^2 - \bar{x} \sum x_i y_i}{\sum x_i^2 - n \bar{x}^2}    (13)

b = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}    (14)

  = \frac{\sum x_i y_i - n \bar{x} \bar{y}}{\sum x_i^2 - n \bar{x}^2}    (15)

where \bar{x} and \bar{y} denote the means of the x_i and y_i (Kenney and Keeping 1962).
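A minimal numerical sketch of (9)-(15) in Python, assuming some arbitrary example data (the values below are not from the text): the 2x2 normal-equation system is solved directly and compared against the closed forms (12) and (14).

```python
import numpy as np

# Arbitrary example data (an assumption for this sketch).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
n = len(x)

# Normal equations in matrix form, as in (9), solved as in (10).
M = np.array([[n,        x.sum()],
              [x.sum(), (x**2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
a, b = np.linalg.solve(M, rhs)

# Closed-form expressions (12) and (14) for comparison.
den = n * (x**2).sum() - x.sum()**2
a_closed = (y.sum() * (x**2).sum() - x.sum() * (x * y).sum()) / den
b_closed = (n * (x * y).sum() - x.sum() * y.sum()) / den
print(a, a_closed, b, b_closed)
```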
These can be rewritten in a simpler form by defining the sums of squares

ss_{xx} \equiv \sum_{i=1}^{n} (x_i - \bar{x})^2    (16)

        = \left(\sum x_i^2\right) - n \bar{x}^2    (17)

ss_{yy} \equiv \sum_{i=1}^{n} (y_i - \bar{y})^2    (18)

        = \left(\sum y_i^2\right) - n \bar{y}^2    (19)

ss_{xy} \equiv \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})    (20)

        = \left(\sum x_i y_i\right) - n \bar{x} \bar{y}    (21)
which are also written as

\sigma_x^2 = \frac{ss_{xx}}{n}    (22)

\sigma_y^2 = \frac{ss_{yy}}{n}    (23)

\mathrm{cov}(x, y) = \frac{ss_{xy}}{n}    (24)

Here, cov(x, y) is the covariance and \sigma_x^2 and \sigma_y^2 are variances. Note that the quantities \sum x_i y_i and \sum x_i^2 can also be interpreted as the dot products

\sum x_i^2 = x \cdot x    (25)

\sum x_i y_i = x \cdot y    (26)
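The equivalent forms in (16)-(24) can be checked numerically; the sketch below reuses the same assumed example data as above.

```python
import numpy as np

# Example data (assumed for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
n = len(x)
xbar, ybar = x.mean(), y.mean()

# Sums of squares in both equivalent forms, (16)-(21).
ss_xx = ((x - xbar)**2).sum()            # == (x**2).sum() - n*xbar**2
ss_yy = ((y - ybar)**2).sum()            # == (y**2).sum() - n*ybar**2
ss_xy = ((x - xbar) * (y - ybar)).sum()  # == (x*y).sum() - n*xbar*ybar

# Relation to the (population) variances and covariance, (22)-(24).
print(np.isclose(ss_xx / n, x.var()), np.isclose(ss_yy / n, y.var()))
print(np.isclose(ss_xy / n, np.cov(x, y, bias=True)[0, 1]))
```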
In terms of the sums of squares, the regression coefficient b is given by

b = \frac{\mathrm{cov}(x, y)}{\sigma_x^2} = \frac{ss_{xy}}{ss_{xx}}    (27)

and a is given in terms of b using (7) as

a = \bar{y} - b \bar{x}    (28)
The overall quality of the fit is then parameterized in terms of a quantity known as the correlation coefficient, defined by
r^2 \equiv \frac{ss_{xy}^2}{ss_{xx}\, ss_{yy}}    (29)

which gives the proportion of ss_{yy} which is accounted for by the regression.
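A few-line sketch of (27)-(29), again with the assumed example data (repeated so the snippet is self-contained):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # assumed example data
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
xbar, ybar = x.mean(), y.mean()

ss_xx = ((x - xbar)**2).sum()
ss_yy = ((y - ybar)**2).sum()
ss_xy = ((x - xbar) * (y - ybar)).sum()

b = ss_xy / ss_xx                  # regression coefficient, (27)
a = ybar - b * xbar                # intercept, (28)
r2 = ss_xy**2 / (ss_xx * ss_yy)    # correlation coefficient, (29)
print(a, b, r2)
```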
Let \hat{y}_i be the vertical coordinate of the best-fit line with x-coordinate x_i, so

\hat{y}_i \equiv a + b x_i    (30)

then the error between the actual vertical point y_i and the fitted point is given by

e_i \equiv y_i - \hat{y}_i    (31)

Now define s^2 as an estimator for the variance in e_i,

s^2 = \sum_{i=1}^{n} \frac{e_i^2}{n - 2}    (32)

Then s can be given by

s = \sqrt{\frac{ss_{yy} - b\, ss_{xy}}{n - 2}} = \sqrt{\frac{ss_{yy} - ss_{xy}^2 / ss_{xx}}{n - 2}}    (33)
(Acton 1966, pp. 32-35; Gonick and Smith 1993, pp. 202-204).
The standard errors for a and b are

SE(a) = s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{ss_{xx}}}    (34)

SE(b) = \frac{s}{\sqrt{ss_{xx}}}    (35)
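Finally, a sketch that ties (30)-(35) together for the same assumed example data, computing the residuals, the estimator s by both routes in (32)-(33), and the standard errors of a and b:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # assumed example data
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
n, xbar, ybar = len(x), x.mean(), y.mean()

ss_xx = ((x - xbar)**2).sum()
ss_yy = ((y - ybar)**2).sum()
ss_xy = ((x - xbar) * (y - ybar)).sum()
b = ss_xy / ss_xx
a = ybar - b * xbar

# Residuals e_i and the estimator s from (31)-(32).
e = y - (a + b * x)
s_from_residuals = np.sqrt((e**2).sum() / (n - 2))

# Closed form (33) in terms of the sums of squares.
s = np.sqrt((ss_yy - b * ss_xy) / (n - 2))

# Standard errors of the intercept and slope, (34)-(35).
se_a = s * np.sqrt(1.0 / n + xbar**2 / ss_xx)
se_b = s / np.sqrt(ss_xx)
print(s_from_residuals, s, se_a, se_b)
```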


linear fit 2, -4,
8, 1, 9, 4, 5, 2, 0




