Abstract:

Contrary to classic classification and regression trees, built in a greedy heuristic manner, designing the tree model through an optimization problem allows us to easily include desirable properties in Machine Learning in addition to prediction accuracy. In this talk, we present a Continuous Optimization approach that is scalable with respect to the size of the training sample, and illustrate this flexibility to model several important issues in Explainable and Fair Machine Learning. These include sparsity, as a proxy for interpretability, by reducing the amount of information necessary to predict well; fairness, by aiming to avoid predictions that discriminate against sensitive features such as gender or race; the cost-sensitivity for groups of individuals in which prediction errors are more critical, such as patients of a disease, by ensuring an acceptable accuracy performance for them; local explainability, where the goal is to identify the predictor variables that have the largest impact on the individual predictions; as well as data complexity in the form of observations of functional nature. The performance of our approach is illustrated on real and synthetic data sets.