Mathieu Carrière, currently a postdoc at Columbia University, will give a talk next Monday 24th at 2:30pm in Gilles Kahn, on Statistics and Machine Learning in Topological Data Analysis with Applications to Biology.
Abstract: Topological Data Analysis (TDA) is an area of data science that aims to characterize data sets by their topological features in various dimensions. Examples of such features include the connected components, loops, and cavities present in the data, which are encoded in the two main descriptors of TDA, the so-called persistence diagram and Mapper. Even though these descriptors have proved useful in many applications, it is not straightforward to include them in the automated processes that are common in statistics and machine learning, mainly because the space of these descriptors lacks many required properties, such as a well-defined addition or barycenter. In this talk, I will recall the basics of TDA and review the solutions that have been proposed in the past few years to merge TDA descriptors (persistence diagram and Mapper) with statistics and machine learning. Then, I will introduce some questions on this topic that remain open and are active areas of research today, such as the question of persistence diagram differentiability for deep learning, or the statistical analysis of Mapper in the multivariate case. Along the way, I will illustrate these problems with applications to biological data, such as immunofluorescence images for breast cancer pathology and single-cell RNA sequencing for understanding the cellular diversity of the spinal cord.