The aim of this project was to train a model that controls an agent (a car) within a simulator. The model receives images from a camera mounted in the vehicle, showing the driver's perspective. From this input alone, the agent has to output a steering angle that the car then uses to steer.


To solve this problem I planned to use a transfer-learning approach. The basis for my model was an Inception v3 architecture pre-trained on ImageNet for a classification task (with a cross-entropy loss); in the end, however, I did not use the pre-trained weights and instead trained the model from scratch. Since steering-angle prediction is naturally a regression task, I had to modify the architecture's head: instead of outputting pseudo-probabilities for 1,000 classes, my modified architecture uses a fully connected layer with a single neuron as its head. As cost function I use the standard mean squared error:

MSE = (1/N) · Σᵢ (yᵢ − ŷᵢ)², where yᵢ is the recorded steering angle for image i and ŷᵢ is the model's prediction.
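The head modification described above can be sketched in Keras. This is a minimal illustration, not the exact code used in the project; details such as the input resolution and the use of global average pooling before the final layer are my assumptions:

```python
import tensorflow as tf

# Inception v3 without its 1,000-class softmax head (include_top=False),
# trained from scratch (weights=None), with a single-neuron linear head
# that outputs the steering angle directly.
base = tf.keras.applications.InceptionV3(
    weights=None,           # from scratch, as described above
    include_top=False,      # drop the classification head
    input_shape=(299, 299, 3),
    pooling="avg",          # global average pooling after the last conv block
)
steering = tf.keras.layers.Dense(1)(base.output)  # one neuron, linear activation
model = tf.keras.Model(inputs=base.input, outputs=steering)

# Mean squared error between predicted and recorded steering angles
model.compile(optimizer="adam", loss="mse")
```

With this head, `model.predict` on a batch of camera images yields one scalar steering angle per image.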

The training data was generated by driving the car manually around the track and capturing images together with the corresponding steering angles. To improve the model's ability to generalize, I drove several laps in each direction and recorded many sequences in which I intentionally corrected the car's mistakes, e.g. steering from the edge of the track back toward the center of the road. From these recordings I built a training set and trained the model with the Adam optimizer.


The final result can be watched here, but only in low resolution (the video has the same resolution as the training images).


View on GitHub