Task
In this project I implemented an Extended Kalman Filter (EKF) in C++. The EKF is used to fuse lidar and radar measurements to track an evil bicyclist who’s constantly driving around our vehicle (who does that?). A video of how that looks can be found here (not very spectacular).
Algorithm
The idea of a Kalman Filter is to improve the measurement process,
which always
contains at least some amount of noise or uncertainties. This is done by taking knowledge about the process and specifically uncertainty
into account. To do this we formulate model equations, like kinematic equations, which postulate how we expect the system
to behave. Based on this knowledge we can then make a statement about how likely it is, that our bicyclist suddenly started his
jet engine and moved 50 meters in one second. In this case it may be smarter to rely on our model and consider this measurement
as an error and trust our model more for our estimation of the true position.
If we relied solely on the noisy measurements of the position and speed of our bicyclist and tried to visualize it, we would see a graph
that would probably be very erratic and couldn’t describe the actual trajectory. So what we’re basically trying, is to smoothen
the erratic measurements.
The KF regards the whole task as a probabilistic process and combines measurements with Bayesian inference. It iterates over the two process steps prediction, which is the a priorio step, and update, the posteriori step.
_{Source: Wikipedia on Kalman Filter}
The state vector x
contains the variables we want to track, in this case the x,y position as well as the velocity in
the x and y direction of the bicyclist.
Notation: Instead of explicitly denoting every vector, lower case and bold letters are vectors and upper case letters are matrices
Kalman Filters are modelled on a Markov chain which, in short, means that the state
at time k
depends only on the state and actions from the previous state at time k1
. The state transition is described
as follows (all following equations are from Wikipedia):
where

x_k
is the state vector at timek
, 
F_k
the state transition matrix, 
B_k
the control input model, 
u_k
the control vector,  and
w_k
is the process noise, drawn from a zero mean multivariate Gaussian .
The dot product of the state transition matrix and the state vector delivers the equations for each state vector element and places the different state variables in relationship to each other, if there is any. The dot product of the control input model and the control vector does the same for the control variables. The control vector describes the actions, for example if a car accelerates or steers. The control matrix would then ensure that the control inputs are mapped into the state equation space, by setting the control variables in relationship to the state variables. But since we’re unable to observe the controls of our bicyclist, we won’t be able to consider the control variables. The process noise isn’t actually calculated but rather assumed to be inherent to the state.
Prediction
To perform the a priori prediction, we predict the next state solely on our theoretical model, without taking measurements into account, hence the a priori.
Besides predicting the state, we’re also predicting the uncertainty in this state by estimating its covariance.
Q_k
is the process covariance matrix which accounts for uncertainties in the process. Contributing factors might be
inaccuracies in the process model like the negligence of external influences e.g. wind resistent.
Update
In the next step we compare our prediction and the associated uncertainty with the actual measurement and its noise. Based on how certain or likely each estimation of our state seems, we decide how much to trust and how much influence we allow the prediction and measurement to have on our final estimation for that timestep. This weighing up is expressed by the Kalman gain.
The first update equation compares the measurement vector z_k
with our state prediction. We often may only be able to
measure our state variables indirectly, rather than directly. This is where our observation
matrix H
comes in. It maps our state variables from state space to measurement space so that we can compare both.
This difference is also called Innovation.
The next two equations describe the Innovation’s covariance S
and the Kalman Gain K
.
R
is the measurement covariance matrix which describes how accurate our sensors are. The Kalman Gain can be tought of
as a factor that defines how much to weigh the prediction vs. the measurement (although often it will be a matrix
rathen than a scalar).
Finally we update our state with our new estimation, where we use the (dot)product of the Kalman Gain and Innovation. The
covariance is also updated.
The last equation is not really necessary, this is only the new error between our final estimation and the measurement
to determine the postfit residual.
So that’s it. Easy peasy, right? (It’s a trap). Not quite.
Notice:
The equations above are only valid if the transformations, like the mapping from state to measurement space in
are linear transformations.
This is true for lidar measurements where we measure the x and y distance directly and H
is just a
matrix. However this isn’t the case for radar measurements.
Dealing with nonlinearity  Extended Kalman Filter
This is what a radar measures:
_{Source: Udacity course lectures}
In order to relate this to our state variables we’re no longer able to use a simple dot product or any other linear
transformation. Instead of y = z  Hx'
we have to use y = z  h(x')
, where x'
is the predicted state and h(x')
is given by the following equation:
“Bridge, we have a new problem…” Geordi La Forge
So what’s the big deal? The Kalman Filter is based on the probabilistic assumption that the state and measurement variables
implicitly contain noise which is modelled by a Gaussian with zero mean (remember the process noise w_k
above?). This
statement is equivalent to saying that each state vector value is acutally just the mean of a normal distribution with
the same standard deviation. And this means trouble. If we now put our state (which is the mean of a Gaussian) through a
nonlinear mapping h(x')
, the distribution and its statistics have changed.
The picture below shows this for a nonlinear transformation with
h(x) = atan(x)
.
_{Source: Udacity course lectures}
To avoid this warp core breach, we use a linear approximation around the mean. For a single variable this could be done with a Taylor Series Expansion like this:
or in short:
But since we just want a linearization, we restrict this to a first order series with n
from 0 to 1.
The last thing we have to change is to generalize the linear approximation to higher dimensions, because h(x)
is a function
with multiple input and output dimensions (multivariate (input) & vectorvalued function (output)).
The first order partial derivatives of such a vectorvalued function can be put
in a matrix of the following form:
_{Source: Wikipedia on Jacobian matrix}
This is then called the Jacobian matrix. To me this looks suspiciously like some stacked gradients of a multivariable function that is also vectorvalued, but no guarantees on that. For the specific case of a radar measurement the Jacobian is defined as
This is nothing else than deriving every output variable with respect to every input variable.
Unlike the observation matrix in the linear case, this matrix needs to be calculated at every timestep.
The Jacobian for this specific h(x)
is
And that’s finally it. Warp core breach prevented and we can call it a day. By adapting the equation for the Innovation
with y = z  h(x')
and replacing the observation matrix H
with the Jacobian H_j
when calculating the Innovation’s
covariance, we’ve built an Extended Kalman Filter. All the filter now has to do, is to react accordingly to the measurement
type (lidar needs no linearization) and our sensor data is fused.