For historical reasons, we use Spark's RDD API. The idea is to migrate everything to the DataFrame API.