Computational Statistics and Optimization [English, deprecated]


The aim of this lecture is to present the tools needed to solve the optimization problems that arise when training machine learning models. It introduces the basic mathematical tools of convex analysis and describes several approaches for constructing efficient convex optimization algorithms: first- and second-order batch algorithms, stochastic gradient algorithms, and distributed algorithms. Several practical sessions on machine learning will illustrate the use of these algorithms for model training on datasets.


Part 1 [21h lectures + 3h lab]

  • Introduction: basic models; basic convex analysis: KKT conditions, subdifferential, Fenchel conjugate, Lipschitz-continuous gradients, duality gap, strong convexity [9h lectures]
  • First-order batch methods: gradient descent, conjugate gradient, coordinate descent, line search methods, Nesterov's acceleration [6h lectures]
  • Proximal methods: why penalization? Overfitting, examples of proximal operators, the FISTA algorithm [6h lectures + 3h lab]
  • First exam [2h exam + homework]
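As a preview of the proximal-methods session, here is a minimal NumPy sketch of FISTA applied to the lasso (the problem sizes, regularization level, and variable names below are illustrative, not taken from the course material). It combines the three ingredients listed above: a gradient step with step size 1/L, the soft-thresholding proximal operator of the l1 penalty, and Nesterov's momentum.

```python
import numpy as np

# Illustrative lasso problem: min_w 0.5*||Xw - y||^2 + lam*||w||_1
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]           # sparse ground truth
y = X @ w_true + 0.01 * rng.standard_normal(50)
lam = 0.1

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

L = np.linalg.norm(X, 2) ** 2           # Lipschitz constant of the gradient
w = np.zeros(10)                        # main iterate
z = np.zeros(10)                        # extrapolated (momentum) iterate
t = 1.0
for _ in range(500):
    # Gradient step on the smooth part, then prox of the l1 penalty
    w_new = soft_threshold(z - X.T @ (X @ z - y) / L, lam / L)
    # Nesterov momentum update
    t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
    z = w_new + ((t - 1) / t_new) * (w_new - w)
    w, t = w_new, t_new
```

Dropping the momentum step (setting z = w_new) recovers plain ISTA, whose O(1/k) rate FISTA improves to O(1/k^2).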

Part 2 [21h lectures + 3h lab]

  • Numerical tricks: warm starting, lazy updating; whether to solve the primal or the dual depending on the problem size: the logistic regression example [3h lectures]
  • Second-order batch methods: Newton's method, L-BFGS-B [3h lectures]
  • Stochastic algorithms: stochastic gradient descent and its variants [6h lectures + 3h lab]
  • Primal-dual algorithms: complements on duality, ADMM, distributed optimization [9h lectures]
  • Final exam [3h exam + project]
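The stochastic-algorithms session can be previewed with a bare-bones NumPy sketch of stochastic gradient descent on a noiseless least-squares problem (sizes, step size, and iteration count below are illustrative assumptions, not from the course material): each step draws one sample uniformly at random and follows the gradient of that single loss term.

```python
import numpy as np

# Illustrative noiseless least-squares problem: min_w 0.5*||Xw - y||^2 / n
rng = np.random.default_rng(1)
n, d = 200, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true

w = np.zeros(d)
step = 0.01                       # constant step; a decaying O(1/k)
for _ in range(20000):            # schedule is needed when gradients are noisy
    i = rng.integers(n)                   # draw one sample uniformly at random
    grad_i = (X[i] @ w - y[i]) * X[i]     # gradient of the i-th loss term only
    w -= step * grad_i
```

Each update costs O(d) instead of the O(nd) of a full batch gradient, which is the point of the method on large datasets; the price is noisy iterates and slower asymptotic convergence.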


  • Solving linear systems
  • Constrained optimization



Books

  • Boyd and Vandenberghe, Convex Optimization, Cambridge University Press (2009)
  • Bauschke and Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer (2011)
  • Hiriart-Urruty and Lemaréchal, Convex Analysis and Minimization Algorithms, Springer (1996)
  • Borwein and Lewis, Convex Analysis and Nonlinear Optimization: Theory and Examples, Springer (2010)

Survey papers

  • Boyd et al., Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, Foundations and Trends in Machine Learning (2010)
  • Parikh and Boyd, Proximal Algorithms, Foundations and Trends in Optimization (2013)
  • Bach et al., Optimization with Sparsity-Inducing Penalties, Foundations and Trends in Machine Learning (2011)