Hunor Jakab: Guided exploration in policy gradient algorithms with Gaussian process function approximation.