In this project I implemented an Extended Kalman Filter (EKF) in C++. The EKF is used to fuse lidar and radar measurements to track an evil bicyclist who’s constantly driving around our vehicle (who does that?). A video of how that looks can be found here (not very spectacular).


The idea of a Kalman Filter is to improve the measurement process, which always contains at least some amount of noise or uncertainties. This is done by taking knowledge about the process and specifically uncertainty into account. To do this we formulate model equations, like kinematic equations, which postulate how we expect the system to behave. Based on this knowledge we can then make a statement about how likely it is, that our bicyclist suddenly started his jet engine and moved 50 meters in one second. In this case it may be smarter to rely on our model and consider this measurement as an error and trust our model more for our estimation of the true position.
If we relied solely on the noisy measurements of the position and speed of our bicyclist and tried to visualize it, we would see a graph that would probably be very erratic and couldn’t describe the actual trajectory. So what we’re basically trying, is to smoothen the erratic measurements.

The KF regards the whole task as a probabilistic process and combines measurements with Bayesian inference. It iterates over the two process steps prediction, which is the a priorio step, and update, the posteriori step.

Source: Wikipedia on Kalman Filter

The state vector x contains the variables we want to track, in this case the x,y position as well as the velocity in the x and y direction of the bicyclist.

Notation: Instead of explicitly denoting every vector, lower case and bold letters are vectors and upper case letters are matrices

Kalman Filters are modelled on a Markov chain which, in short, means that the state at time k depends only on the state and actions from the previous state at time k-1. The state transition is described as follows (all following equations are from Wikipedia):


  • x_k is the state vector at time k,
  • F_k the state transition matrix,
  • B_k the control input model,
  • u_k the control vector,
  • and w_k is the process noise, drawn from a zero mean multivariate Gaussian .

The dot product of the state transition matrix and the state vector delivers the equations for each state vector element and places the different state variables in relationship to each other, if there is any. The dot product of the control input model and the control vector does the same for the control variables. The control vector describes the actions, for example if a car accelerates or steers. The control matrix would then ensure that the control inputs are mapped into the state equation space, by setting the control variables in relationship to the state variables. But since we’re unable to observe the controls of our bicyclist, we won’t be able to consider the control variables. The process noise isn’t actually calculated but rather assumed to be inherent to the state.


To perform the a priori prediction, we predict the next state solely on our theoretical model, without taking measurements into account, hence the a priori.

Besides predicting the state, we’re also predicting the uncertainty in this state by estimating its covariance.

Q_k is the process covariance matrix which accounts for uncertainties in the process. Contributing factors might be inaccuracies in the process model like the negligence of external influences e.g. wind resistent.


In the next step we compare our prediction and the associated uncertainty with the actual measurement and its noise. Based on how certain or likely each estimation of our state seems, we decide how much to trust and how much influence we allow the prediction and measurement to have on our final estimation for that timestep. This weighing up is expressed by the Kalman gain.

The first update equation compares the measurement vector z_k with our state prediction. We often may only be able to measure our state variables indirectly, rather than directly. This is where our observation matrix H comes in. It maps our state variables from state space to measurement space so that we can compare both. This difference is also called Innovation.

The next two equations describe the Innovation’s covariance S and the Kalman Gain K. R is the measurement covariance matrix which describes how accurate our sensors are. The Kalman Gain can be tought of as a factor that defines how much to weigh the prediction vs. the measurement (although often it will be a matrix rathen than a scalar).

Finally we update our state with our new estimation, where we use the (dot)product of the Kalman Gain and Innovation. The covariance is also updated.

The last equation is not really necessary, this is only the new error between our final estimation and the measurement to determine the post-fit residual.

So that’s it. Easy peasy, right? (It’s a trap). Not quite.


The equations above are only valid if the transformations, like the mapping from state to measurement space in

are linear transformations.

This is true for lidar measurements where we measure the x and y distance directly and H is just a matrix. However this isn’t the case for radar measurements.

Dealing with non-linearity - Extended Kalman Filter

This is what a radar measures:

Source: Udacity course lectures

In order to relate this to our state variables we’re no longer able to use a simple dot product or any other linear transformation. Instead of y = z - Hx' we have to use y = z - h(x'), where x' is the predicted state and h(x') is given by the following equation:

“Bridge, we have a new problem…” Geordi La Forge


So what’s the big deal? The Kalman Filter is based on the probabilistic assumption that the state and measurement variables implicitly contain noise which is modelled by a Gaussian with zero mean (remember the process noise w_k above?). This statement is equivalent to saying that each state vector value is acutally just the mean of a normal distribution with the same standard deviation. And this means trouble. If we now put our state (which is the mean of a Gaussian) through a non-linear mapping h(x'), the distribution and its statistics have changed. The picture below shows this for a non-linear transformation with h(x) = atan(x).

Source: Udacity course lectures

To avoid this warp core breach, we use a linear approximation around the mean. For a single variable this could be done with a Taylor Series Expansion like this:

or in short:

But since we just want a linearization, we restrict this to a first order series with n from 0 to 1. The last thing we have to change is to generalize the linear approximation to higher dimensions, because h(x) is a function with multiple input and output dimensions (multivariate (input) & vector-valued function (output)). The first order partial derivatives of such a vector-valued function can be put in a matrix of the following form:

Source: Wikipedia on Jacobian matrix

This is then called the Jacobian matrix. To me this looks suspiciously like some stacked gradients of a multivariable function that is also vector-valued, but no guarantees on that. For the specific case of a radar measurement the Jacobian is defined as

This is nothing else than deriving every output variable with respect to every input variable. Unlike the observation matrix in the linear case, this matrix needs to be calculated at every timestep. The Jacobian for this specific h(x) is

And that’s finally it. Warp core breach prevented and we can call it a day. By adapting the equation for the Innovation with y = z - h(x') and replacing the observation matrix H with the Jacobian H_j when calculating the Innovation’s covariance, we’ve built an Extended Kalman Filter. All the filter now has to do, is to react accordingly to the measurement type (lidar needs no linearization) and our sensor data is fused.



View on Github