
An Elementary Introduction to Kalman Filtering

Demystifying the uses of a powerful tool for uncertain information.

Kalman filtering is a state estimation technique used in many application areas such as spacecraft navigation, motion planning in robotics, signal processing, and wireless sensor networks because of its ability to extract useful information from noisy data and its small computational and memory requirements.12,20,27,28,29 Recent work has used Kalman filtering in controllers for computer systems.5,13,14,23


Key Insights

  • This article presents an elementary derivation of Kalman filtering, a classic state estimation technique.
  • Understanding Kalman filtering is useful for more principled control of computer systems.
  • Kalman filtering is used as a black box by many computer scientists.

Although many introductions to Kalman filtering are available in the literature,1,2,3,4,6,7,8,9,10,11,17,21,25,29 they are usually focused on particular applications such as robot motion or state estimation in linear systems, making it difficult to see how to apply Kalman filtering to other problems. Other presentations derive Kalman filtering as an application of Bayesian inference, assuming that noise is Gaussian. This leads to the common misconception that Kalman filtering can be applied only if noise is Gaussian.15

Abstractly, Kalman filtering can be seen as a particular approach to combining approximations of an unknown value to produce a better approximation. Suppose we use two devices of different designs to measure the temperature of a CPU core. Because devices are usually noisy, the measurements are likely to differ from the actual temperature of the core. As the devices are of different designs, let us assume that noise affects the two devices in unrelated ways (this is formalized here using the notion of correlation). Therefore, the measurements x1 and x2 are likely to be different from each other and from the actual core temperature xc. A natural question is the following: is there a way to combine the information in the noisy measurements x1 and x2 to obtain a good approximation of the actual temperature xc?

One ad hoc solution is to use the formula 0.5*x1+0.5*x2 to take the average of the two measurements, giving them equal weight. Formulas of this sort are called linear estimators because they use a weighted sum to fuse values; for our temperature problem, their general form is β*x1+α*x2. In this presentation, we use the term estimate to refer to both a noisy measurement and a value computed by an estimator, as both are approximations of unknown values of interest.

Suppose we have additional information about the two devices, say the second one uses more advanced temperature sensing. Because we would have more confidence in the second measurement, it seems reasonable that we should discard the first one, which is equivalent to using the linear estimator 0.0*x1 + 1.0*x2. Kalman filtering tells us that in general, this intuitively reasonable linear estimator is not “optimal;” paradoxically, there is useful information even in the measurement from the lower quality device, and the optimal estimator is one in which the weight given to each measurement is proportional to the confidence we have in the device producing that measurement. Only if we have no confidence whatever in the first device should we discard its measurement.

The goal of this articlea is to present the abstract concepts behind Kalman filtering in a way that is accessible to most computer scientists while clarifying the key assumptions, and then show how the problem of state estimation in linear systems can be solved as an application of these general concepts. First, the informal ideas discussed here are formalized using the notions of distributions and random samples from distributions. Confidence in estimates is quantified using the variances and covariances of these distributions.b Two algorithms are described next. The first one shows how to fuse estimates (such as core temperature measurements) optimally, given a reasonable definition of optimality. The second algorithm addresses a problem that arises frequently in practice: estimates are vectors (for example, the position and velocity of a robot), but only a part of the vector can be measured directly; in such a situation, how can an estimate of the entire vector be obtained from an estimate of just a part of that vector? The best linear unbiased estimator (BLUE) is used to solve this problem.16,19,26 It is shown that the Kalman filter can be derived in a straightforward way by using these two algorithms to solve the problem of state estimation in linear systems. The extended Kalman filter and unscented Kalman filter, which extend Kalman filtering to nonlinear systems, are described briefly at the end of the article.


Formalizing Estimates

Scalar estimates. To model the behavior of devices producing noisy temperature measurements, we associate each device i with a random variable that has a probability density function (pdf) pi(x) such as the ones shown in Figure 1 (the x-axis in this figure represents temperature). Random variables need not be Gaussian.c Obtaining a measurement from device i corresponds to drawing a random sample from the distribution for that device. We write xi ~ pi(μi, σi²) to denote that xi is a random variable with pdf pi whose mean and variance are μi and σi², respectively; following convention, we use xi to represent a random sample from this distribution as well.

f1.jpg
Figure 1. Using pdfs to model devices with systematic and random errors. Ground truth is 60°C. Dashed lines are means of pdfs.

Means and variances of distributions model different kinds of inaccuracies in measurements. Device i is said to have a systematic error or bias in its measurements if the mean μi of its distribution is not equal to the actual temperature xc (in general, to the value being estimated, which is known as ground truth); otherwise, the instrument is unbiased. Figure 1 shows pdfs for two devices that have different amounts of systematic error. The variance σi², on the other hand, is a measure of the random error in the measurements. The impact of random errors can be mitigated by taking many measurements with a given device and averaging their values, but this approach will not reduce systematic error.

In the formulation of Kalman filtering, it is assumed that measuring devices do not have systematic errors. However, we do not have the luxury of taking many measurements of a given state, so we must take into account the impact of random error on a single measurement. Therefore, confidence in a device is modeled formally by the variance of the distribution associated with that device; the smaller the variance, the higher our confidence in the measurements made by the device. In Figure 1, the fact we have less confidence in the first device has been illustrated by making p1 more spread out than p2, giving it a larger variance.

The informal notion that noise should affect the two devices in “unrelated ways” is formalized by requiring that the corresponding random variables be uncorrelated. This is a weaker condition than requiring them to be independent, as explained in our online appendix (http://dl.acm.org/citation.cfm?doid=3363294&picked=formats). Suppose we are given the measurement made by one of the devices (say x1) and we have to guess what the other measurement (x2) might be. If knowing x1 does not give us any new information about what x2 might be, the random variables are independent. This is expressed formally by the equation p(x2|x1) = p(x2); intuitively, knowing the value of x1 does not change the pdf for the possible values of x2. If the random variables are only uncorrelated, knowing x1 might give us new information about x2, such as restricting its possible values, but the mean of x2|x1 will still be μ2. Using expectations, this can be written as E[x2|x1] = E[x2], which is equivalent to requiring that E[(x1 − μ1)(x2 − μ2)], the covariance between the two variables, be equal to zero. This is obviously a weaker condition than independence.

Although the discussion in this section has focused on measurements, the same formalization can be used for estimates produced by an estimator. Lemma 1 shows how the mean and variance of a linear combination of pairwise uncorrelated random variables can be computed from the means and variances of the random variables.18 The mean and variance can be used to quantify bias and random errors for the estimator as in the case of measurements.

An unbiased estimator is one whose mean is equal to the unknown value being estimated and it is preferable to a biased estimator with the same variance. Only unbiased estimators are considered in this article. Furthermore, an unbiased estimator with a smaller variance is preferable to one with a larger variance as we would have more confidence in the estimates it produces. As a step toward generalizing this discussion to estimators that produce vector estimates, we refer to the variance of an unbiased scalar estimator as the mean square error of that estimator or MSE for short.

Lemma 1(ii) asserts that if a random variable is pairwise uncorrelated with a set of random variables, it is uncorrelated with any linear combination of those variables.

Lemma 1. Let x1 ~ p1(μ1, σ1²), …, xn ~ pn(μn, σn²) be a set of pairwise uncorrelated random variables. Let y = α1x1 + … + αnxn be a random variable that is a linear combination of the xi‘s.

  (i) The mean and variance of y are:

$$\mu_y = \sum_{i=1}^{n} \alpha_i \mu_i \qquad (1)$$

$$\sigma_y^2 = \sum_{i=1}^{n} \alpha_i^2 \sigma_i^2 \qquad (2)$$

  (ii) If random variable xn+1 is pairwise uncorrelated with x1, …, xn, it is uncorrelated with y.
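As a quick sanity check of Lemma 1, the following sketch (our own illustration in Python/numpy; none of it comes from the article) draws samples of three uncorrelated random variables from different families, forms a linear combination, and compares the empirical mean and variance with Equations 1 and 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200_000

# Three uncorrelated (here, independent) random variables with known means and variances.
mus = np.array([2.0, -1.0, 5.0])
sigmas = np.array([1.0, 0.5, 2.0])     # standard deviations
alphas = np.array([0.2, 0.3, 0.5])     # coefficients of the linear combination

# The distributions need not be Gaussian, so mix a few families with the chosen statistics.
x1 = rng.normal(mus[0], sigmas[0], n_samples)
x2 = rng.uniform(mus[1] - np.sqrt(3) * sigmas[1], mus[1] + np.sqrt(3) * sigmas[1], n_samples)
x3 = rng.laplace(mus[2], sigmas[2] / np.sqrt(2), n_samples)

y = alphas[0] * x1 + alphas[1] * x2 + alphas[2] * x3

print("empirical mean", y.mean(), "vs Eq. (1):", alphas @ mus)
print("empirical var ", y.var(),  "vs Eq. (2):", (alphas**2) @ (sigmas**2))
```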

Vector estimates. In some applications, estimates are vectors. For example, the state of a mobile robot might be represented by a vector containing its position and velocity. Similarly, the vital signs of a person might be represented by a vector containing his temperature, pulse rate, and blood pressure. Here, we denote a vector by a boldfaced lowercase letter, and a matrix by an uppercase letter.

The covariance matrix Σxx of a random variable x is the matrix E[(x − μx)(x − μx)T], where μx is the mean of x. Intuitively, entry (i,j) of this matrix is the covariance between the ith and jth components of vector x; in particular, entry (i,i) is the variance of the ith component of x. A random variable x with a pdf p whose mean is μx and covariance matrix is Σxx is written as x~p(μx, Σxx). The inverse of the covariance matrix, Σxx−1, is called the precision or information matrix.

Uncorrelated random variables. The cross-covariance matrix Σvw of two random variables v and w is the matrix E[(v − μv)(w − μw)T]. Intuitively, element (i,j) of this matrix is the covariance between elements v(i) and w(j). If the random variables are uncorrelated, all entries in this matrix are zero, which is equivalent to saying that every component of v is uncorrelated with every component of w. Lemma 2 generalizes Lemma 1.

Lemma 2. Let x1~p1(μ1, Σ1), …, xn~pn(μn, Σn) be a set of pairwise uncorrelated random variables of length m. Let y = A1x1 + … + Anxn.

  (i) The mean and covariance matrix of y are the following:

$$\mu_y = \sum_{i=1}^{n} A_i \mu_i \qquad (3)$$

$$\Sigma_{yy} = \sum_{i=1}^{n} A_i \Sigma_i A_i^{T} \qquad (4)$$

  (ii) If random variable xn+1 is pairwise uncorrelated with x1, …, xn, it is uncorrelated with y.

The MSE of an unbiased estimator y is E[(y − μy)T(y − μy)], which is the sum of the variances of the components of y; if y has length 1, this reduces to variance as expected. The MSE is also the sum of the diagonal elements of Σyy (this is called the trace of Σyy).


Fusing Scalar Estimates

We now consider the problem of choosing the optimal values of the parameters α and β in the linear estimator β*x1 + α*x2 for fusing two estimates x1 and x2 from uncorrelated scalar-valued random variables.




The first reasonable requirement is that if the two estimates x1 and x2 are equal, fusing them should produce the same value. This implies that α+β =1. Therefore, the linear estimators of interest are of the form

$$y_\alpha(x_1, x_2) = (1-\alpha)\,x_1 + \alpha\,x_2 \qquad (5)$$

If x1 and x2 in Equation 5 are considered to be unbiased estimators of some quantity of interest, then yα is an unbiased estimator for any value of α. How should optimality of such an estimator be defined? One reasonable definition is that the optimal value of α minimizes the variance of yα as this will produce the highest-confidence fused estimates.

Theorem 1. Let x1 ~ p1(μ1, σ1²) and x2 ~ p2(μ2, σ2²) be uncorrelated random variables. Consider the linear estimator yα(x1,x2)=(1-α)*x1+α*x2. The variance of the estimator is minimized for α = σ1²/(σ1² + σ2²).

The proof is straightforward and is given in the online appendix. The variance (MSE) of yα can be determined from Lemma 1:

$$\sigma_{y_\alpha}^2 = (1-\alpha)^2\,\sigma_1^2 + \alpha^2\,\sigma_2^2 \qquad (6)$$

Setting the derivative of σ²yα with respect to α to zero and solving the resulting equation yields the required result.
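Spelling out that step (this short derivation is ours, but it follows directly from Equation 6):

$$\frac{d}{d\alpha}\Big[(1-\alpha)^2\sigma_1^2 + \alpha^2\sigma_2^2\Big] = -2(1-\alpha)\,\sigma_1^2 + 2\alpha\,\sigma_2^2 = 0 \;\Longrightarrow\; \alpha = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}$$

The second derivative, 2(σ1² + σ2²), is positive, so this critical point is indeed a minimum.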

In the literature, the optimal value of α is called the Kalman gain K. Substituting K into the linear fusion model, we get the optimal linear estimator y(x1, x2):

$$y(x_1, x_2) = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\,x_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\,x_2 \qquad (7)$$

As a step toward fusion of n>2 estimates, it is useful to rewrite this as follows:

$$y(x_1, x_2) = \frac{1/\sigma_1^2}{1/\sigma_1^2 + 1/\sigma_2^2}\,x_1 + \frac{1/\sigma_2^2}{1/\sigma_1^2 + 1/\sigma_2^2}\,x_2 \qquad (8)$$

Substituting the optimal value of α into Equation 6, we get

$$\sigma_y^2 = \frac{\sigma_1^2\,\sigma_2^2}{\sigma_1^2 + \sigma_2^2} \qquad (9)$$

The expressions for y and σy² are complicated because they contain the reciprocals of variances. If we let ν1 and ν2 denote the precisions of the two distributions, the expressions for y and νy can be written more simply as follows:

$$y(x_1, x_2) = \frac{\nu_1}{\nu_1 + \nu_2}\,x_1 + \frac{\nu_2}{\nu_1 + \nu_2}\,x_2 \qquad (10)$$

$$\nu_y = \nu_1 + \nu_2 \qquad (11)$$

These results say the weight we should give to an estimate is proportional to the confidence we have in that estimate, and that we have more confidence in the fused estimate than in the individual estimates, which is intuitively reasonable. To use these results, we need only the variances of the distributions. In particular, the pdfs pi, which are usually not available in applications, are not needed, and the proof of Theorem 1 does not require these pdfs to have the same mean.
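To make this concrete, here is a minimal Python sketch (our own illustration, with made-up numbers) of the fusion rule in Equations 7 and 9; note that only the variances of the two devices are needed, not their pdfs.

```python
def fuse_scalar(x1, var1, x2, var2):
    """Optimally fuse two uncorrelated scalar estimates (Equations 7 and 9)."""
    K = var1 / (var1 + var2)      # Kalman gain: weight given to x2
    y = x1 + K * (x2 - x1)        # equivalent to (1-K)*x1 + K*x2
    var_y = (1.0 - K) * var1      # same as var1*var2/(var1+var2)
    return y, var_y

# Two temperature readings: a noisy device (variance 4.0) and a better one (variance 1.0).
y, var_y = fuse_scalar(61.0, 4.0, 59.5, 1.0)
print(y, var_y)   # the fused estimate leans toward 59.5 and its variance (0.8) is below 1.0
```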




Fusing multiple scalar estimates. These results can be generalized to optimally fuse multiple pairwise uncorrelated estimates x1, x2, …, xn. Let yn,α(x1, .., xn) denote the linear estimator for fusing the n estimates given parameters α1, .., αn, which we denote by α (the notation yα(x1, x2) introduced previously can be considered to be an abbreviation of y2,α(x1, x2)).

Theorem 2. Let xi ~ pi(μi, σi²) for (1 ≤ i ≤ n) be a set of pairwise uncorrelated random variables. Consider the linear estimator yn,α(x1, …, xn) = α1x1 + … + αnxn, where α1 + … + αn = 1. The variance of the estimator is minimized for

$$\alpha_i = \frac{1/\sigma_i^2}{\sum_{j=1}^{n} 1/\sigma_j^2}$$

The minimal variance is given by the following expression:

$$\sigma_{y_n}^2 = \frac{1}{\sum_{j=1}^{n} 1/\sigma_j^2} \qquad (12)$$

As before, these expressions are more intuitive if the variance is replaced with precision: the contribution of xi to the value of yn(x1, .., xn) is proportional to xi‘s confidence.

$$y_n(x_1, \ldots, x_n) = \frac{\nu_1}{\nu_1 + \cdots + \nu_n}\,x_1 + \cdots + \frac{\nu_n}{\nu_1 + \cdots + \nu_n}\,x_n \qquad (13)$$

$$\nu_{y_n} = \nu_1 + \cdots + \nu_n \qquad (14)$$

Equations 13 and 14 generalize Equations 10 and 11.

Incremental fusing is optimal. In many applications, the estimates x1, x2, …, xn become available successively over a period of time. Although it is possible to store all the estimates and use Equations 13 and 14 to fuse all the estimates from scratch whenever a new estimate becomes available, it is possible to save both time and storage if one can do this fusion incrementally. We show that just as a sequence of numbers can be added by keeping a running sum and adding the numbers to this running sum one at a time, a sequence of n>2 estimates can be fused by keeping a “running estimate” and fusing estimates from the sequence one at a time into this running estimate without any loss in the quality of the final estimate. In short, we want to show that yn(x1,..,xn)=y2(y2(..y2(x1,x2)…),xn). A little bit of algebra shows that if n>2, Equations 13 and 14 for the optimal linear estimator and its precision can be expressed as shown in Equations 15 and 16.

$$y_n(x_1, \ldots, x_n) = \frac{\nu_{y_{n-1}}}{\nu_{y_{n-1}} + \nu_n}\,y_{n-1}(x_1, \ldots, x_{n-1}) + \frac{\nu_n}{\nu_{y_{n-1}} + \nu_n}\,x_n \qquad (15)$$

$$\nu_{y_n} = \nu_{y_{n-1}} + \nu_n \qquad (16)$$

This shows that yn(x1, ..,xn) = y2(yn-1 (x1, .., xn-1), xn). Using this argument recursively gives the required result.d
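The same fact can be checked numerically; the sketch below (our own illustration with invented values) fuses three scalar estimates both from scratch (Equations 13 and 14) and incrementally (Equations 15 and 16) and obtains the same result.

```python
def fuse_batch(xs, variances):
    """Fuse n uncorrelated scalar estimates at once (Equations 13 and 14)."""
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    y = sum(p * x for p, x in zip(precisions, xs)) / total
    return y, 1.0 / total

def fuse_incremental(xs, variances):
    """Fuse the same estimates one at a time (Equations 15 and 16)."""
    y, var_y = xs[0], variances[0]
    for x, v in zip(xs[1:], variances[1:]):
        K = var_y / (var_y + v)
        y, var_y = y + K * (x - y), (1.0 - K) * var_y
    return y, var_y

xs, variances = [10.2, 9.7, 10.05], [1.0, 4.0, 0.25]
print(fuse_batch(xs, variances))        # both calls print the same estimate
print(fuse_incremental(xs, variances))  # and the same variance
```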

To make the connection to Kalman filtering, it is useful to derive the same result using a pictorial argument. Figure 2 shows the process of incrementally fusing the n estimates. In this picture, time progresses from left to right, the precision of each estimate is shown in parentheses next to it, and the weights on the edges are the weights from Equation 10. The contribution of each xi to the final value y2(y2(…), xn) is given by the product of the weights on the path from xi to the final value, and this product is obviously equal to the weight of xi in Equation 13, showing that incremental fusion is optimal.

f2.jpg
Figure 2. Dataflow graph for incremental fusion.

Summary. The results in this section can be summarized informally as follows. When using a linear estimator to fuse uncertain scalar estimates, the weight given to each estimate should be inversely proportional to the variance of the random variable from which that estimate is obtained. Furthermore, when fusing n>2 estimates, estimates can be fused incrementally without any loss in the quality of the final result. These results are often expressed formally in terms of the Kalman gain K, as shown in Figure 3; the equations can be applied recursively to fuse multiple estimates. Note that if ν1 ≫ ν2, K≈0 and y(x1,x2)≈x1; conversely, if ν1 ≪ ν2, K≈1 and y(x1,x2)≈x2.

f3.jpg
Figure 3. Optimal fusion of scalar estimates.


Fusing Vector Estimates

The results for fusing scalar estimates can be extended to vectors by replacing variances with covariance matrices.

For vectors, the linear estimator is yn,A(x1, …, xn) = A1x1 + … + Anxn, where A1 + … + An = I. Here A stands for the matrix parameters (A1, …, An). All the vectors (xi) are assumed to be of the same length. To simplify notation, we omit the subscript n in yn,A in the discussion here as it is obvious from the context.

Optimality. The parameters A1, …, An in the linear data fusion model are chosen to minimize MSE(yA), which is E[(yA − μyA)T(yA − μyA)].

Theorem 3 generalizes Theorem 2 to the vector case. The proof of this theorem is given in the appendix. Comparing Theorems 2 and 3, we see that the expressions are similar, the main difference being that the role of variance in the scalar case is played by the covariance matrix in the vector case.

Theorem 3. Let xi ~ pi(μi, Σi) for (1 ≤ i ≤ n) be a set of pairwise uncorrelated random variables. Consider the linear estimator yn,A(x1, …, xn) = A1x1 + … + Anxn, where A1 + … + An = I. The value of MSE(yA) is minimized for

$$A_i = \left(\Sigma_1^{-1} + \cdots + \Sigma_n^{-1}\right)^{-1} \Sigma_i^{-1}$$

Therefore the optimal estimator is

$$\mathbf{y}_n(\mathbf{x}_1, \ldots, \mathbf{x}_n) = \left(\Sigma_1^{-1} + \cdots + \Sigma_n^{-1}\right)^{-1}\left(\Sigma_1^{-1}\mathbf{x}_1 + \cdots + \Sigma_n^{-1}\mathbf{x}_n\right)$$

The covariance matrix of y can be computed by using Lemma 2.

$$\Sigma_{yy} = \left(\Sigma_1^{-1} + \cdots + \Sigma_n^{-1}\right)^{-1}$$

In the vector case, precision is the inverse of a covariance matrix, denoted by N. Equations 26–27 use precision to express the optimal estimator and its variance and generalize Equations 13–14 to the vector case.

$$\mathbf{y}_n(\mathbf{x}_1, \ldots, \mathbf{x}_n) = N_{y_n}^{-1}\left(N_1\mathbf{x}_1 + \cdots + N_n\mathbf{x}_n\right) \qquad (26)$$

$$N_{y_n} = N_1 + \cdots + N_n \qquad (27)$$

As in the scalar case, fusion of n>2 vector estimates can be done incrementally without loss of precision. The proof is similar to the scalar case and is omitted.

There are several equivalent expressions for the Kalman gain for the fusion of two estimates. The following one, which is easily derived from Equation 23, is the vector analog of Equation 17:

$$K = \Sigma_1\left(\Sigma_1 + \Sigma_2\right)^{-1}$$

The covariance matrix of the optimal estimator y(x1, x2) is the following.

$$\Sigma_{yy} = (I - K)\,\Sigma_1$$

$$\Sigma_{yy} = \left(\Sigma_1^{-1} + \Sigma_2^{-1}\right)^{-1}$$

Summary. The results in this section can be summarized in terms of the Kalman gain K as shown in Figure 4.

f4.jpg
Figure 4. Optimal fusion of vector estimates.
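The following numpy sketch (our own illustration; the covariance matrices are invented for the example) implements the fusion rule summarized in Figure 4: the Kalman gain is now a matrix, and the covariance of the fused estimate equals (Σ1−1 + Σ2−1)−1.

```python
import numpy as np

def fuse_vector(x1, S1, x2, S2):
    """Optimally fuse two uncorrelated vector estimates:
    K = S1 (S1 + S2)^-1, y = x1 + K (x2 - x1), S_y = (I - K) S1."""
    K = S1 @ np.linalg.inv(S1 + S2)
    y = x1 + K @ (x2 - x1)
    S_y = (np.eye(len(x1)) - K) @ S1
    return y, S_y

x1 = np.array([2.0, 1.0]);  S1 = np.array([[4.0, 0.5], [0.5, 2.0]])
x2 = np.array([2.5, 0.6]);  S2 = np.array([[1.0, 0.0], [0.0, 3.0]])
y, S_y = fuse_vector(x1, S1, x2, S2)
print(y)
print(S_y)   # equal to inv(inv(S1) + inv(S2)) up to round-off
```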


Best Linear Unbiased Estimator (BLUE)

In some applications, the state of the system is represented by a vector but only part of the state can be measured directly, so it is necessary to estimate the hidden portion of the state corresponding to a measured value of the visible state. This section describes an estimator called the best linear unbiased estimator (BLUE)16,19,26 for doing this.

Consider the general problem of determining a value for vector y given a value for a vector x. If there is a functional relationship between x and y (say y=F(x) and F is given), it is easy to compute y given a value for x (say x1).

In our context, however, x and y are random variables, so such a precise functional relationship will not hold. Figure 5 shows an example in which x and y are scalar-valued random variables. The gray ellipse in this figure, called a confidence ellipse, is a projection of the joint distribution of x and y onto the (x, y) plane that shows where some large proportion of the (x, y) values are likely to be. Suppose x takes the value x1. Even within the confidence ellipse, there are many points (x1, y), so we cannot associate a single value of y with x1. One possibility is to compute the mean of the y values associated with x1 (that is, the expectation E[y|x=x1]) and return this as the estimate for y if x=x1. This requires knowing the joint distribution of x and y, which may not always be available.

f5.jpg
Figure 5. BLUE line corresponding to Equation (31).

In some problems, we can assume that there is an unknown linear relationship between x and y and that uncertainty comes from noise. Therefore, we can use a technique similar to the ordinary least squares (OLS) method to estimate this linear relationship, and use it to return the best estimate of y for any given value of x. In Figure 5, we see that although there are many points (x1, y), the y values are clustered around the line as shown in the figure so the value ŷ1 is a reasonable estimate for the value of y corresponding to x1. This line, called the best linear unbiased estimator (BLUE), is the analog of ordinary least squares (OLS) for distributions.

Computing BLUE. Consider the estimator ŷA,b(x) = Ax + b. We choose A and b so that this is an unbiased estimator with minimal MSE. The “ˆ” (hat) over the y indicates that we are computing an estimate for y.

Theorem 4. Let

$$\begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix} \sim p\!\left( \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix} \right)$$

The estimator ŷA,b(x) = Ax + b for estimating the value of y for a given value of x is an unbiased estimator with minimal MSE if

$$A = \Sigma_{yx}\,\Sigma_{xx}^{-1}$$

$$b = \mu_y - \Sigma_{yx}\,\Sigma_{xx}^{-1}\,\mu_x$$

The proof of Theorem 4 is straightforward. For an unbiased estimator, E[ŷ] = E[y]. This implies that b = μy − Aμx, so an unbiased estimator is of the form ŷA(x) = μy + A(x − μx). Note this is equivalent to asserting that the BLUE line must pass through the point (μx, μy). Setting the derivative of MSE(ŷA) with respect to A to zero22 and solving for A, we find that the best linear unbiased estimator is

$$\hat{\mathbf{y}}(\mathbf{x}) = \mu_y + \Sigma_{yx}\,\Sigma_{xx}^{-1}\,(\mathbf{x} - \mu_x) \qquad (31)$$

This equation can be understood intuitively as follows. If we have no information about x and y, the best we can do is the estimate (μx, μy), which lies on the BLUE line. However, if we know that x has a particular value x1, we can use the correlation between y and x to estimate a better value for y from the difference (x1 − μx). Note that if Σyx = 0 (that is, x and y are uncorrelated), the best estimate of y is just μy, so knowing the value of x does not give us any additional information about y as one would expect. In Figure 5, this corresponds to the case when the BLUE line is parallel to the x-axis. At the other extreme, suppose that y and x are functionally related so y = Cx. In that case, it is easy to see that Σyx = CΣxx, so ŷ = Cx as expected. In Figure 5, this corresponds to the case when the confidence ellipse shrinks down to the BLUE line.

Equation 31 is a generalization of ordinary least squares in the sense that if we compute the relevant means and variances of a set of discrete data (xi, yi) and substitute into Equation 31, we get the same line that is obtained by using OLS.
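A minimal numpy sketch of Equation 31 (our own illustration; the joint statistics below are invented): given the joint mean and covariance of x and y, it returns the BLUE estimate of y for an observed value of x.

```python
import numpy as np

def blue_estimate(x, mu_x, mu_y, S_xx, S_yx):
    """Best linear unbiased estimate of y given x (Equation 31):
    y_hat = mu_y + S_yx S_xx^-1 (x - mu_x)."""
    return mu_y + S_yx @ np.linalg.solve(S_xx, x - mu_x)

# Joint statistics of (x, y): here x is observed, y is hidden.
mu_x, mu_y = np.array([1.0]), np.array([5.0])
S_xx = np.array([[2.0]])
S_yx = np.array([[1.5]])     # covariance between y and x

print(blue_estimate(np.array([2.0]), mu_x, mu_y, S_xx, S_yx))
# -> [5.75]: seeing x above its mean pulls the estimate of y above mu_y
```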


Kalman Filters for Linear Systems

We now apply the algorithms for optimal fusion of vector estimates (Figure 4) and the BLUE estimator (Theorem 4) to the particular problem of state estimation in linear systems, which is the classical application of Kalman filtering.

Figure 6a shows how the evolution of the state of such a system over time can be computed if the initial state x0 and the model of the system dynamics are known precisely. Time advances in discrete steps. The state of the system at any time step is a function of the state of the system at the previous time step and the control inputs applied to the system during that interval. This is usually expressed by an equation of the form xt = ft(xt–1, ut) where ut is the control input. The function ft is nonlinear in the general case, and can be different for different steps. If the system is linear, the relation for state evolution over time can be written as xt = Ftxt-1 + Btut, where Ft and Bt are time-dependent matrices that can be determined from the physics of the system. Therefore, if the initial state x0 is known exactly and the system dynamics are modeled perfectly by the Ft and Bt matrices, the evolution of the state over time can be computed precisely as shown in Figure 6a.

f6.jpg
Figure 6. State estimation using Kalman filtering.

In general, however, we may not know the initial state exactly, and the system dynamics and control inputs may not be known precisely. These inaccuracies may cause the state computed by the model to diverge unacceptably from the actual state over time. To avoid this, we can make measurements of the state after each time step. If these measurements were exact, there would of course be no need to model the system dynamics. However, in general, the measurements themselves are imprecise.

Kalman filtering was invented to solve the problem of state estimation in such systems. Figure 6b shows the dataflow of the computation, and we use it to introduce standard terminology. An estimate of the initial state, denoted by x̂0|0, is assumed to be available. At each time step t=1, 2, .., the system model is used to provide an estimate of the state at time t using information from time t–1. This step is called prediction, and the estimate that it provides is called the a priori estimate and denoted by x̂t|t−1. The a priori estimate is then fused with zt, the state estimate obtained from the measurement at time t, and the result is the a posteriori state estimate at time t, denoted by x̂t|t. This a posteriori estimate is used by the model to produce the a priori estimate for the next time step, and so on. As described here, the a priori and a posteriori estimates are the means of certain random variables; the covariance matrices of these random variables are shown within parentheses next to each estimate in Figure 6b, and these are used to weight estimates when fusing them.

We first present the state evolution model and a priori state estimation. Then we discuss how state estimates are fused if an estimate of the entire state can be obtained by measurement. Finally, we discuss how to address this problem when only a portion of the state can be measured directly.

State evolution model and prediction. The evolution of the state over time is described by a series of random variables x0, x1, x2,…

  • The random variable x0 captures the likelihood of different initial states.
  • The random variables at successive time steps are related by the following linear model:

$$\mathbf{x}_t = F_t\,\mathbf{x}_{t-1} + B_t\,\mathbf{u}_t + \mathbf{w}_t \qquad (32)$$

Here, ut is the control input, which is assumed to be deterministic, and wt is a zero-mean noise term that models all the uncertainty in the system. The covariance matrix of wt is denoted by Qt, and the noise terms in different time steps are assumed to be uncorrelated with each other (that is, E[wi wjT] = 0 if i ≠ j) and with x0.

For estimation, we have a random variable x0|0 that captures our belief about the likelihood of different states at time t=0, and two random variables xt|t–1 and xt|t at each time step t = 1, 2, … that capture our beliefs about the likelihood of different states at time t before and after fusion with the measurement, respectively. The mean and covariance matrix of a random variable xi|j are denoted by x̂i|j and Σi|j, respectively. We assume E[x0|0] = E[x0] (no bias).

Prediction essentially uses xt–1|t–1 as a proxy for xt–1 in Equation 32 to determine xt|t–1 as shown in Equation 33.

$$\mathbf{x}_{t|t-1} = F_t\,\mathbf{x}_{t-1|t-1} + B_t\,\mathbf{u}_t + \mathbf{w}_t \qquad (33)$$

For state estimation, we need only the mean and covariance matrix of xt|t–1. The predictor box in Figure 6 computes these values; the covariance matrix is obtained from Lemma 2 under the assumption that wt is uncorrelated with xt–1|t–1, which is justified here.

Fusing complete observations of the state. If the entire state can be measured at each time step, the imprecise measurement at time t is modeled as follows:

$$\mathbf{z}_t = \mathbf{x}_t + \mathbf{v}_t \qquad (34)$$

where vt is a zero-mean noise term with covariance matrix Rt. The noise terms in different time steps are assumed to be uncorrelated with each other (that is, E[vi vjT] = 0 if i ≠ j) as well as with x0|0 and all wk. A subtle point here is that xt in this equation is the actual state of the system at time t (that is, a particular realization of the random variable xt), so variability in zt comes only from vt and its covariance matrix Rt.

Therefore, we have two imprecise estimates for the state at each time step t = 1, 2, …: the a priori estimate from the predictor (x̂t|t−1) and the one from the measurement (zt). If zt is uncorrelated with xt|t–1, we can use Equations 20–22 to fuse the estimates as shown in Figure 6c.

The assumptions that (i) xt–1|t–1 is uncorrelated with wt, which is used in prediction, and (ii) xt|t–1 is uncorrelated with zt, which is used in fusion, are easily proved to be correct by induction on t, using Lemma 2(ii). Figure 6b gives the intuition: xt|t–1 for example is an affine function of the random variables x0|0, w1, v1, w2, v2, …, wt, and is therefore uncorrelated with vt (by assumption about vt and Lemma 2(ii)) and hence with zt.

Figure 7 shows the computation pictorially using confidence ellipses to illustrate uncertainty. The dotted arrows at the bottom of the figure show the evolution of the state, and the solid arrows show the computation of the a priori estimates and their fusion with measurements.

f7.jpg
Figure 7. Illustration of Kalman filtering.
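A sketch of one time step of this computation in numpy (our own illustration, not code from the article): predict using the system model, then fuse the a priori estimate with the full-state measurement exactly as in Figure 4.

```python
import numpy as np

def predict(x_prev, S_prev, F, B, u, Q):
    """A priori estimate (Equation 33): propagate the mean and covariance through the model."""
    x_prior = F @ x_prev + B @ u
    S_prior = F @ S_prev @ F.T + Q
    return x_prior, S_prior

def fuse_full_observation(x_prior, S_prior, z, R):
    """Fuse the a priori estimate with a measurement of the entire state (Figure 6c)."""
    K = S_prior @ np.linalg.inv(S_prior + R)
    x_post = x_prior + K @ (z - x_prior)
    S_post = (np.eye(len(x_prior)) - K) @ S_prior
    return x_post, S_post

# One step with made-up model matrices and measurement.
F = np.array([[1.0, 0.1], [0.0, 1.0]]); B = np.zeros((2, 1)); u = np.zeros(1)
Q, R = 0.01 * np.eye(2), 0.5 * np.eye(2)
x, S = np.zeros(2), np.eye(2)
x, S = predict(x, S, F, B, u, Q)
x, S = fuse_full_observation(x, S, np.array([0.2, -0.1]), R)
print(x)
print(S)
```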

Fusing partial observations of the state. In some problems, only a portion of the state can be measured directly. The observable portion of the state is specified formally using a full row-rank matrix Ht called the observation matrix: if the state is x, what is observable is Htx. For example, if the state vector has two components and only the first component is observable, Ht can be [1 0]. In general, the Ht matrix can specify a linear relationship between the state and the observation, and it can be time-dependent. The imprecise measurement model introduced in Equation 34 becomes:

$$\mathbf{z}_t = H_t\,\mathbf{x}_t + \mathbf{v}_t$$

The hidden portion of the state can be specified using a matrix Ct, which is an orthogonal complement of Ht. For example, if Ht = [1 0], one choice for Ct is [0 1].

Figure 6d shows the computation for this case. The fusion phase can be understood intuitively in terms of the following steps.

  1. The observable part of the a priori estimate of the state, Ht x̂t|t−1, is fused with the measurement zt using Equations 20–22. The quantity zt − Ht x̂t|t−1 is called the innovation. The result is the a posteriori estimate of the observable part of the state.
  2. The BLUE of Theorem 4 is used to obtain the a posteriori estimate of the hidden state, Ct x̂t|t, by adding to the a priori estimate of the hidden state, Ct x̂t|t−1, a value obtained from the product of the covariance between Htxt|t–1 and Ctxt|t–1 and the difference between the a posteriori and a priori estimates of the observable part of the state.
  3. The a posteriori estimates of the observable and hidden portions of the state are composed to produce the a posteriori estimate of the entire state, x̂t|t.

The actual implementation produces the final result directly without going through these steps as shown in Figure 6d, but these incremental steps are useful for understanding how all this works, and their implementation is shown in more detail in Figure 8.

f8.jpg
Figure 8. Computation of a posteriori estimate.

Figure 6d puts all this together. In the literature, this dataflow is referred to as Kalman filtering. Unlike in Equations 18 and 21, the Kalman gain is not a dimensionless value here. If Ht = I, the computations in Figure 6d reduce to those of Figure 6c as expected.

Equation 39 shows that the a posteriori state estimate is a linear combination of the a priori state estimate x̂t|t−1 and the measurement (zt). The optimality of this linear unbiased estimator is shown in the Appendix. It was shown earlier that incremental fusion of scalar estimates is optimal. The dataflow of Figures 6(c,d) computes the a posteriori state estimate at time t by incrementally fusing measurements from the previous time steps, and this incremental fusion can be shown to be optimal using a similar argument.
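For readers who want to experiment, the sketch below gives one predict/update cycle in the standard textbook form (our own code; it should agree with the computation of Figure 6d up to algebraic rearrangement).

```python
import numpy as np

def kalman_step(x_post, S_post, u, z, F, B, Q, H, R):
    """One predict/update cycle when only H x can be observed (z = H x + v)."""
    # Prediction: a priori estimate and its covariance.
    x_prior = F @ x_post + B @ u
    S_prior = F @ S_post @ F.T + Q
    # Update: fuse the observable part of the prediction with the measurement.
    innovation = z - H @ x_prior
    S_innov = H @ S_prior @ H.T + R                 # covariance of the innovation
    K = S_prior @ H.T @ np.linalg.inv(S_innov)      # Kalman gain
    x_post = x_prior + K @ innovation
    S_post = (np.eye(len(x_post)) - K @ H) @ S_prior
    return x_post, S_post
```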

Example: falling body. To demonstrate the effectiveness of the Kalman filter, we consider an example in which an object falls from the origin at time t=0 with an initial speed of 0 m/s and an expected constant acceleration of 9.8 m/s² due to gravity. Note that acceleration in reality may not be constant due to factors such as wind and air friction.




The state vector of the object contains two components, one for the distance from the origin s(t) and one for the velocity v(t). We assume that only the velocity state can be measured at each time step. If time is discretized in steps of 0.25 seconds, the difference equation for the dynamics of the system is easily shown to be the following:

$$\begin{bmatrix} s(t) \\ v(t) \end{bmatrix} = \begin{bmatrix} 1 & 0.25 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} s(t-1) \\ v(t-1) \end{bmatrix} + \begin{bmatrix} 0.30625 \\ 2.45 \end{bmatrix}$$

where we assume given values for the initial state estimate x̂0|0 and its covariance matrix Σ0|0.

The gray lines in Figure 9 show the evolution of velocity and distance with time according to this model. Because of uncertainty in modeling the system dynamics, the actual evolution of the velocity and position will be different in practice. The red lines in Figure 9 show one trajectory for this evolution, corresponding to a Gaussian noise term wt with covariance matrix Qt in Equation 32 (because this noise term is random, there are many trajectories for the evolution, and we are just showing one of them). The red lines correspond to “ground truth” in our example.

f9.jpg
Figure 9. Estimates of the object’s state over time.

The green points in Figure 9b show the noisy measurements of velocity at different time steps, assuming the noise is modeled by a Gaussian with variance 8. The blue lines show the a posteriori estimates of the velocity and position. It can be seen that the a posteriori estimates track the ground truth quite well even when the ideal system model (the gray lines) is inaccurate and the measurements are noisy. The cyan bars in the right figure show the variance of the velocity at different time steps. Although the initial variance is quite large, application of Kalman filtering is able to reduce it rapidly in a few time steps.
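The sketch below reproduces the setup of this example end to end (our own code; the process-noise covariance, the initial estimate, and the initial covariance are invented, since the text does not fix them, while the 0.25 s time step, the velocity-only observation, and the measurement variance of 8 come from the example).

```python
import numpy as np

rng = np.random.default_rng(1)

dt, g = 0.25, 9.8
F = np.array([[1.0, dt], [0.0, 1.0]])          # state is [distance, velocity]
b = np.array([0.5 * g * dt**2, g * dt])        # constant contribution of gravity per step
H = np.array([[0.0, 1.0]])                     # only the velocity is observed
Q = 0.1 * np.eye(2)                            # process-noise covariance (invented)
R = np.array([[8.0]])                          # measurement-noise variance (from the text)

x_true = np.zeros(2)                           # ground truth starts at rest at the origin
x_est, S_est = np.zeros(2), 10.0 * np.eye(2)   # initial estimate and covariance (invented)

for t in range(40):                            # 10 seconds of simulated time
    # Ground truth evolves with process noise; only a noisy velocity is measured.
    x_true = F @ x_true + b + rng.multivariate_normal(np.zeros(2), Q)
    z = H @ x_true + rng.normal(0.0, np.sqrt(R[0, 0]), size=1)

    # Predict (a priori estimate), then update (fuse with the measurement).
    x_est = F @ x_est + b
    S_est = F @ S_est @ F.T + Q
    K = S_est @ H.T @ np.linalg.inv(H @ S_est @ H.T + R)
    x_est = x_est + K @ (z - H @ x_est)
    S_est = (np.eye(2) - K @ H) @ S_est

print("true [s, v]:", x_true)
print("est. [s, v]:", x_est, " velocity variance:", S_est[1, 1])
```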

Discussion. We have shown that Kalman filtering for state estimation in linear systems can be derived from two elementary ideas: optimal linear estimators for fusing uncorrelated estimates and best linear unbiased estimators for correlated variables. This is a different approach to the subject than the standard presentations in the literature. One standard approach is to use Bayesian inference. The other approach is to assume that the a posteriori state estimator is a linear combination of the form At x̂t|t−1 + Bt zt, and then find the values of At and Bt that produce an unbiased estimator with minimum MSE. We believe that the advantage of the presentation given here is that it exposes the concepts and assumptions that underlie Kalman filtering.

Most presentations in the literature also begin by assuming that the noise terms wt in the state evolution equation and vt in the measurement are Gaussian. Although some presentations1,10 use properties of Gaussians to derive the results in Figure 3, these results do not depend on distributions being Gaussians. Gaussians however enter the picture in a deeper way if one considers nonlinear estimators. It can be shown that if the noise terms are not Gaussian, there may be nonlinear estimators whose MSE is lower than that of the linear estimator presented in Figure 6d. However, if the noise is Gaussian, this linear estimator is as good as any unbiased nonlinear estimator (that is, the linear estimator is a minimum variance unbiased estimator (MVUE)). This result is proved using the Cramer-Rao lower bound.24


Extension to Nonlinear Systems

The extended Kalman filter (EKF) and unscented Kalman filter (UKF) are heuristic approaches to using Kalman filtering for nonlinear systems. The state evolution and measurement equations for nonlinear systems with additive noise can be written as follows; in these equations, f and h are nonlinear functions.

$$\mathbf{x}_t = f(\mathbf{x}_{t-1}, \mathbf{u}_t) + \mathbf{w}_t \qquad (42)$$

$$\mathbf{z}_t = h(\mathbf{x}_t) + \mathbf{v}_t \qquad (43)$$

Intuitively, the EKF constructs linear approximations to the nonlinear functions f and h and applies the Kalman filter equations, whereas the UKF constructs approximations to probability distributions and propagates these through the nonlinear functions to construct approximations to the posterior distributions.

EKF. Examining Figure 6d, we see that the a priori state estimate in the predictor can be computed using the system model: x̂t|t−1 = f(x̂t−1|t−1, ut). However, as the system dynamics and measurement equations are nonlinear, it is not clear how to compute the covariance matrices for the a priori estimate and the measurement. In the EKF, these matrices are computed by linearizing Equations 42 and 43 using the Taylor series expansions for the nonlinear functions f and h. This requires computing the following Jacobians,e which play the role of Ft and Ht in Figure 6d.

$$F_t = \left.\frac{\partial f}{\partial \mathbf{x}}\right|_{\hat{\mathbf{x}}_{t-1|t-1},\,\mathbf{u}_t} \qquad\qquad H_t = \left.\frac{\partial h}{\partial \mathbf{x}}\right|_{\hat{\mathbf{x}}_{t|t-1}}$$

The EKF performs well in some applications such as navigation systems and GPS.28
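As an illustration only, the sketch below applies the EKF recipe to a toy scalar system that we made up (it is not an example from the article): the predict/update cycle is the same as before, with Ft and Ht replaced by the Jacobians of f and h evaluated at the current estimates.

```python
import numpy as np

# Toy nonlinear system: x_t = f(x_{t-1}) + w_t, z_t = h(x_t) + v_t (scalar state).
f = lambda x: 0.9 * x + 0.2 * np.sin(x)
h = lambda x: x**2
df = lambda x: 0.9 + 0.2 * np.cos(x)     # Jacobian of f (a 1x1 "matrix" here)
dh = lambda x: 2.0 * x                   # Jacobian of h

def ekf_step(x_post, P_post, z, Q, R):
    """One predict/update cycle of the extended Kalman filter for a scalar state."""
    # Predict: push the estimate through f, propagate the variance with the Jacobian.
    x_prior = f(x_post)
    F = df(x_post)
    P_prior = F * P_post * F + Q
    # Update: linearize h around the a priori estimate.
    Hj = dh(x_prior)
    K = P_prior * Hj / (Hj * P_prior * Hj + R)
    x_post = x_prior + K * (z - h(x_prior))
    P_post = (1.0 - K * Hj) * P_prior
    return x_post, P_post

x, P = 1.0, 1.0
x, P = ekf_step(x, P, z=1.1, Q=0.01, R=0.5)
print(x, P)
```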

UKF. When the system dynamics and observation models are highly nonlinear, the unscented Kalman filter (UKF)15 can be an improvement over the EKF. The UKF is based on the unscented transformation, which is a method for computing the statistics of a random variable x that undergoes a nonlinear transformation (y = g(x)). The random variable x is sampled using a carefully chosen set of sigma points and these sample points are propagated through the nonlinear function g. The statistics of y are estimated using a weighted sample mean and covariance of the posterior sigma points. The UKF tends to be more robust and accurate than the EKF but has higher computation overhead due to the sampling process.


Conclusion

In this article, we have shown that two concepts—optimal linear estimators for fusing uncorrelated estimates and best linear unbiased estimators for correlated variables—provide the underpinnings for Kalman filtering. By combining these ideas, standard results on Kalman filtering for linear systems can be derived in an intuitive and straightforward way that is simpler than other presentations of this material in the literature. This approach makes clear the assumptions that underlie the optimality results associated with Kalman filtering and should make it easier to apply Kalman filtering to problems in computer systems.


Acknowledgments

This research was supported by NSF grants 1337281, 1406355, and 1618425, and by DARPA contracts FA8750-16-2-0004 and FA8650-15-C-7563. We are indebted to K. Mani Chandy (Caltech), Ivo Babuska (UT Austin), and Augusto Ferrante (Padova, Italy) for their valuable feedback.


    1. Babb, T. How a Kalman filter works, in pictures | bzarg. 2018. https://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/. Accessed: 2018-11-30

    2. Balakrishnan, A.V. Kalman Filtering Theory. Optimization Software, Inc., Los Angeles, CA, USA, 1987.

    3. Barker, A.L., Brown, D.E., Martin, W.N. Bayesian Estimation and the Kalman Filter. Technical Report. Charlottesville, VA, USA, 1994.

    4. Becker, A. Kalman filter overview. https://www.kalmanfilter.net/default.aspx. 2018. Accessed: 2018-11-08.

    5. Bergman, K. Nanophotonic interconnection networks in multicore embedded computing. In 2009 IEEE/LEOS Winter Topicals Meeting Series (2009), 6–7.

    6. Cao, L., Schwartz, H.M. Analysis of the Kalman filter based estimation algorithm: An orthogonal decomposition approach. Automatica 1, 40 (2004), 5–19.

    7. Chui, C.K., Chen, G. Kalman Filtering: With Real-Time Applications, 5th edn. Springer Publishing Company, Incorporated, 2017.

    8. Eubank, R.L. A Kalman Filter Primer (Statistics: Textbooks and Monographs). Chapman & Hall/CRC, 2005.

    9. Evensen, G. Data Assimilation: The Ensemble Kalman Filter. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.

    10. Faragher, R. Understanding the basis of the Kalman filter via a simple and intuitive derivation. IEEE Signal Process. Mag. 5, 29 (2012), 128–132.

    11. Grewal, M.S., Andrews, A.P. Kalman Filtering: Theory and Practice with MATLAB, 4th edn. Wiley-IEEE Press, 2014.

    12. Hess, A.-K., Rantzer, A. Distributed Kalman filter algorithms for self-localization of mobile devices. In Proceedings of the 13th ACM International Conference on Hybrid Systems: Computation and Control, HSCC '10, 2010, 191–200.

    13. Imes, C., Kim, D.H.K., Maggio, M., Hoffmann, H. POET: A portable approach to minimizing energy under soft real-time constraints. In 21st IEEE Real-Time and Embedded Technology and Applications Symposium, April 2015, 75–86.

    14. Imes, C., Hoffmann, H. Bard: A unified framework for managing soft timing and power constraints. In International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), 2016.

    15. Julier, S.J., Uhlmann, J.K. Unscented filtering and nonlinear estimation. Proc. IEEE 3, 92 (2004), 401–422.

    16. Kitanidis, P.K. Unbiased minimum-variance linear state estimation. Automatica 6, 23 (1987), 775–778.

    17. Lindquist, A., Picci, G. Linear Stochastic Systems. Springer-Verlag, 2017.

    18. Maybeck, P.S. Stochastic Models, Estimation, and Control, volume 3. Academic Press, 1982.

    19. Mendel, J.M. Lessons in Estimation Theory for Signal Processing, Communications, and Control. Pearson Education, 1995.

    20. Nagarajan, K., Gans, N., Jafari, R. Modeling human gait using a Kalman filter to measure walking distance. In Proceedings of the 2nd Conference on Wireless Health, WH '11 (New York, NY, USA, 2011). ACM, 34:1–34:2.

    21. Nakamura, E.F., Loureiro, A.A.F., Frery, A.C. Information fusion for wireless sensor networks: methods, models, and classifications. ACM Comput. Surv. 3, 39 (2007).

    22. Petersen, K.B., Pedersen, M.S. The Matrix Cookbook. http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=3274. November 2012. Version 20121115.

    23. Pothukuchi, R.P., Ansari, A., Voulgaris, P., Torrellas, J. Using multiple input, multiple output formal control to maximize resource efficiency in architectures. In Proceedings of the 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA) (2016), IEEE, 658–670.

    24. Rao, C.R. Information and the accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc., 37 (1945), 81–89.

    25. Rhudy, M.B., Salguero, R.A., Holappa, K. A Kalman filtering tutorial for undergraduate students. Int. J. Comp. Sci. Eng. Surv. (1), 8 (2017).

    26. Sengupta, S.K. Fundamentals of statistical signal processing: Estimation theory. Technometrics (4), 37 (1995), 465–466.

    27. Souza, É.L., Nakamura, E.F., Pazzi, R.W. Target tracking for sensor networks: A survey. ACM Comput. Surv. (2), 49 (2016), 30:1–30:31.

    28. Thrun, S., Burgard, W., Fox, D. Probabilistic Robotics (Intelligent Robotics and Autonomous Agents). The MIT Press, 2005.

    29. Welch, G., Bishop, G. An Introduction to the Kalman Filter. Technical Report. Chapel Hill, NC, USA, 1995.

    30. Pei, Y., Biswas, S., Fussell, D.S., Pingali, K. An Elementary Introduction to Kalman Filtering. ArXiv e-prints. October 2017.

    a. An extended version of this article that includes additional background material and proofs is available.30

    b. Basic concepts such as probability density function, mean, expectation, variance and covariance are introduced in the online appendix.

    c. The role of Gaussians in Kalman filtering is discussed later in the article.

    d. We thank Mani Chandy for showing us this approach to proving the result.

    e. The Jacobian matrix is the matrix of all first order partial derivatives of a vector-valued function.

    The online appendix for this article can be found at http://dl.acm.org/citation.cfm?doid=3363294&picked=formats
