We are pleased to welcome Bradley WORLEY (Institut Pasteur) next Thursday 22/02, 14h30, at LIX (building Alan Turing, 1 Rue Honoré d'Estienne d'Orves, 91120 Palaiseau) room Philippe Flajolet.
Abstract: Sparse recovery is a cornerstone of nonuniform sampling (NUS) methods in nuclear magnetic resonance (NMR), magnetic resonance imaging (MRI), and other instrumental techniques. While most recovery algorithms can find globally optimal solutions by means of convex optimization, they generally do not provide error estimates, which are critically important in scientific applications. We introduce the variational method of constructing approximate Bayesian inference algorithms, and show recent results of its use in NUS NMR reconstruction. Connections with existing compressed sensing algorithms and current efforts towards improved performance will also be presented.
Abstract: Given an infinite permutation group G, one defines the function that maps each natural number n to the number of orbits of subsets of cardinality n, under the induced action of G on subsets of elements. Cameron conjectured that this counting function (the profile of G) is a quasi-polynomial whenever its growth is polynomial. A stronger conjecture was later made by one of his students, Macpherson. It concerns a certain graded algebra structure on the orbits of subsets, introduced by Cameron, and states that if the profile of G is polynomial, then its orbit algebra is finitely generated. The talk will present these two conjectures and their context, as well as the ideas behind the proof of Macpherson's conjecture, the result of joint work by Nicolas Thiéry and Justine Falque (with precious advice from Peter Cameron himself).
The next Comète seminar will take place on Monday, February 12th 2018 at 14h00 in Salle Philippe Flajolet (LIX, Alan Turing building, École Polytechnique). Prakash Panangaden, Professor at the McGill University School of Computer Science, will talk about his joint work with Radu Mardare and Gordon Plotkin: Quantitative Equational Logic.
Abstract: Equational reasoning is a central part of mathematics. The traditional theory links equational logic with monads on the category of sets. More precisely, one can define an algebraic theory by giving a set of operations and equations, and then show that the algebras defined by these equations are exactly the algebras for a monad on the category of sets. One has classical theorems like Birkhoff's completeness and variety theorems. We consider a modified notion of equation where the equality symbol is annotated with a real number, so we can write s =_e t with the interpretation that the terms s and t are within e of each other. We develop the metatheory and obtain analogues of the classical theorems. Furthermore, we show that this extended notion of equational definition yields algebras of monads on the category of metric spaces. What makes the theory interesting is that well-known metric spaces of probability distributions can be defined by such equations.
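As a rough illustration (following the published formulation of quantitative equational theories by Mardare, Panangaden and Plotkin; the exact rule set presented in the talk may differ), the basic deduction rules make the annotated equality behave like a metric:

```latex
\begin{align*}
  &\textbf{(Refl)}   && \vdash t =_0 t \\
  &\textbf{(Symm)}   && t =_e s \;\vdash\; s =_e t \\
  &\textbf{(Triang)} && \{\, t =_e u,\; u =_{e'} s \,\} \;\vdash\; t =_{e+e'} s \\
  &\textbf{(Max)}    && t =_e s \;\vdash\; t =_{e'} s \quad \text{for } e' \ge e
\end{align*}
```

Reflexivity, symmetry and the triangle-like rule mirror the pseudometric axioms, which is why models of such theories live naturally in the category of metric spaces.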
Sonia Marin will defend her PhD thesis entitled « Modal proof theory through a focused telescope » on Tuesday, January 30th at 3pm in Amphithéâtre Sophie Germain (bâtiment Turing).
Abstract: In this thesis, we consider the proof theory of modal logic to illustrate two uses of the synthetic inference rules extracted from focused proof systems. From one side of the "telescope", focusing allows us to analyse the internal machinery of inference rules. We first show how to emulate Simpson's labelled sequent calculus for intuitionistic modal logic within Liang and Miller's focused sequent calculus for first-order logic, thereby extending the result of Miller and Volpe to the intuitionistic case. Then, we propose a similar encoding, this time into an ordinary (unlabelled) sequent calculus, via an intermediate focused framework based on Negri's labelled sequent calculus. From the other side of the "telescope", focusing allows us to observe more global behaviours in proofs. In particular, we give completeness proofs for two nested sequent calculi for both classical and intuitionistic modal logic, first for a focused version, and then for a system based purely on synthetic inference rules. These rules retain only the transitions between big steps of reasoning, discarding most of the focused rules, which makes the presentation of the system clear and elegant while also simplifying the cut-elimination and completeness proofs.
9h55 — 10h00: presentation of the day
10h00 — 11h: team talks and discussions
- Yanlei Diao (CEDAR team, LIX & Inria Saclay)
- Marie-Paule Cani (Stream team, LIX)
- Bertrand Beaufils (DataShape team, Inria Saclay)
- Yann Ponty (Amibio team, LIX)
11h — 12h: Jean-Gabriel Ganascia (Sorbonne Université, LIP6)
2010: The Artificial Intelligence Odyssey.
After recalling the historical definitions of artificial intelligence and reviewing its successes of recent years, we will survey the problems artificial intelligence addresses, taking as a guiding thread the broad classes of cognitive functions one tries to simulate, namely perceptual functions, memory, knowledge representation and learning, reasoning, expressive functions, and finally executive functions. We will conclude by discussing both the overblown claims about artificial intelligence made by the major internet companies and, above all, the "braided" nature of technological time, whereby approaches that have fallen out of fashion often return to the forefront.
13h50 — 14h10: Bertrand Braunschweig (Inria Saclay)
14h10 — 16h00: team talks and discussions
- Stéphane Graham-Lengrand (Parsifal team, LIX & Inria Saclay)
- Benjamin Doerr (AlCo team, LIX)
- Ulrich Fahrenberg (Cosynus team, LIX)
- Michel Fliess (MAX team, LIX)
- Catuscia Palamidessi (Comete team, LIX & Inria Saclay)
- Jesse Read (Dascim team, LIX)
Abstract: In this talk, I will present how, during my research projects, I tackled some of the activities encountered when designing a control system. Designing a control system involves a number of options, decisions and compromises that depend on the properties of the controlled system and on the required performance. It often requires modelling, controller design, analysis, simulation, implementation and verification steps, each a research direction in its own right.
I applied an alternative, older approach to deriving optimal control laws, namely inverse problems of the calculus of variations, or inverse optimal control (IOC), which has seen renewed interest among researchers in recent years. I addressed inverse optimal control and inverse optimization approaches based on parametrized Lagrangians, which reduce the nonlinear parametric optimization problems to linear least-squares optimization and are hence easier to solve. I used the IOC approach to analyse and propose simple but efficient bio-inspired models for redundant biological rhythmic motions such as arm motion, postural sway and balance, fast and normal human locomotion, and squat movements. Moreover, I used this IOC approach as a tool to discriminate and reproduce fast and normal human gait for patients with Parkinson's disease (PD). By combining different research topics, namely estimation, least-squares methods and interval analysis, I proposed a novel IOC method solved in a bounded-error framework.
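The reduction to linear least squares can be illustrated on a toy example (a sketch of the general idea only; the feature basis and synthetic trajectory below are invented, not the biomechanical models from the talk). When the unknown Lagrangian is linear in its parameters, the Euler–Lagrange equations are also linear in them, so recovery becomes a least-squares fit:

```python
import math

# Toy inverse optimal control: assume the observed trajectory minimizes a
# Lagrangian L = w1*xdot^2 + w2*x^2 (an invented two-feature basis). The
# Euler-Lagrange equation 2*w1*xddot - 2*w2*x = 0 is linear in (w1, w2),
# so with w1 normalized to 1 the unknown w2 is a scalar least-squares fit
# of xddot against x.

dt = 1e-3
t = [i * dt for i in range(3001)]
omega = 2.0
x = [math.sin(omega * ti) for ti in t]          # observed "optimal" motion

# Central finite differences for the second derivative (interior points only).
xdd = [(x[i - 1] - 2 * x[i] + x[i + 1]) / dt**2 for i in range(1, len(x) - 1)]
xi = x[1:-1]

# Least-squares solution of xdd = w2 * x  =>  w2 = <x, xdd> / <x, x>.
w2 = sum(a * b for a, b in zip(xi, xdd)) / sum(a * a for a in xi)
print(w2)   # close to -omega**2 = -4, the weight that generated the motion
```

The same pattern extends to vector-valued motions and larger feature bases, where the normal equations of the least-squares problem replace the scalar ratio above.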
Recently, I addressed the planning level of autonomous navigation for mobile robots: I proposed, and validated on an easy-to-reproduce robotic platform, a new motion planning algorithm that is optimal in terms of path length and robust, based on an incremental sampling-based motion planner, the Rapidly-exploring Random Tree (RRT), and on its optimal variants built on interval analysis tools.
In this talk I will describe a set of efficient computational methods for analyzing, quantifying and exploring relations and variability in structured data sets, such as collections of geometric shapes, point clouds, and large networks or graphs, among others. The main unifying theme of this work is the observation that many concepts in data analysis can be considered, both in theory and in practice, as linear operators acting on real-valued functions on the appropriate domains. Since real-valued functions can be defined on a wide variety of data representations, and since they enjoy a rich algebraic structure, such an approach has the potential to provide a unified framework for representing and processing different types of data. I will give an overview of these and related techniques and demonstrate, in particular, how the functional operator point of view can be helpful in solving practical problems, including shape correspondence, tangent vector field processing and shape comparison, among others.
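The "maps as linear operators" idea has a tiny discrete analogue (a finite stand-in only; the functional-map framework proper works with reduced spectral bases on meshes): a map between finite sets induces a pullback operator on functions, represented by a 0/1 matrix, and composition of maps becomes matrix multiplication.

```python
# A map T between finite sets induces a linear operator on real-valued
# functions (the pullback f |-> f o T), represented by a 0/1 matrix.

def pullback_matrix(T, n_dst):
    """Matrix P with (P f)[i] = f[T[i]] for a function f on the codomain."""
    return [[1 if T[i] == j else 0 for j in range(n_dst)] for i in range(len(T))]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def apply(P, f):
    return [sum(p * v for p, v in zip(row, f)) for row in P]

T = [1, 0, 1]                       # T : {0,1,2} -> {0,1}
S = [2, 0]                          # S : {0,1}   -> {0,1,2}
ST = [S[T[i]] for i in range(3)]    # composition S o T

P_T, P_S, P_ST = pullback_matrix(T, 2), pullback_matrix(S, 3), pullback_matrix(ST, 3)

f = [3.0, 5.0, 7.0]                 # a function on {0,1,2}
print(apply(P_ST, f))               # f o S o T, computed by one operator
print(matmul(P_T, P_S) == P_ST)     # composition of maps = product of matrices
```

Because operators compose linearly, questions about maps (correspondence, deformation) can be posed as linear-algebra problems on these matrices, which is the core trick the talk describes.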
Summary: Consider two data providers, each maintaining private records of different feature sets about common entities. They aim to learn a linear model jointly in a federated setting, namely, data is local and a shared model is trained from locally computed updates. In contrast with most work on distributed learning, in this scenario (i) data is split vertically, i.e. by features, (ii) only one data provider knows the target variable and (iii) entities are not linked across the data providers.
We describe a three-party end-to-end solution in two phases: privacy-preserving entity resolution and federated logistic regression over messages encrypted with an additively homomorphic scheme. It is secure against an honest-but-curious adversary. The system allows learning without either exposing data in the clear or sharing which entities the data providers have in common. Our implementation is as accurate as a naive non-private solution that unifies data in one place, and scales to problems with millions of entities with hundreds of features.
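The abstract says only "an additively homomorphic scheme"; Paillier is one standard choice, and its key property can be shown with deliberately tiny, insecure parameters (8-bit primes, fixed randomness, purely for illustration):

```python
# Toy Paillier encryption, only to illustrate the additive homomorphism
# Enc(a) * Enc(b) mod n^2 = Enc(a + b) that such federated protocols rely on.
# Parameters are insecure by design: real deployments use >= 2048-bit moduli.
import math

p, q = 17, 19
n = p * q                        # public modulus
n2 = n * n
lam = math.lcm(p - 1, q - 1)     # private key
g = n + 1                        # standard generator choice
mu = pow(lam, -1, n)             # works because L(g^lam mod n^2) = lam when g = n+1

def encrypt(m, r):
    # r must be coprime to n; fixed per call here for reproducibility (insecure).
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    u = pow(c, lam, n2)
    return (((u - 1) // n) * mu) % n   # L(u) = (u - 1) / n, then unblind with mu

a, b = encrypt(12, r=5), encrypt(30, r=7)
print(decrypt((a * b) % n2))     # 42: multiplying ciphertexts adds the plaintexts
```

This is exactly what lets one party aggregate encrypted gradient updates from the other without ever seeing the individual values in the clear.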
Luca Mencarelli will defend his PhD thesis entitled “The Multiplicative Weights Algorithm for Mixed Integer NonLinear Programming: Theory, Applications and Limits” on Monday December 4th 2017 at 2:00 PM in the Gilles Khan room (Bâtiment Alan Turing).
Abstract: In this thesis we introduce a new heuristic algorithm for Mixed Integer NonLinear Programming (MINLP) based on the Multiplicative Weights Update (MWU) framework and involving a new class of Mathematical Programming (MP) reformulations, namely pointwise reformulations. The MWU algorithm is a "meta-algorithm" with broad applications in Mathematical Optimization, Machine Learning and Game Theory. MINLP is a very interesting topic in Mathematical Optimization from both a theoretical and a computational viewpoint. Moreover, formulations and reformulations are a key tool in MP: pointwise reformulations are a family of MINLPs in which several "complicated" terms are substituted with parameters in order to obtain easier problems.
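In its textbook form (the generic meta-algorithm, not the pointwise-reformulation variant the thesis develops), MWU maintains one weight per candidate decision and multiplicatively penalizes each by its observed loss:

```python
# Generic Multiplicative Weights Update loop: keep one weight per candidate
# ("expert"), penalize each round proportionally to its loss in [0, 1].

def mwu(loss_rounds, n, eta=0.3):
    w = [1.0] * n
    for losses in loss_rounds:
        w = [wi * (1.0 - eta * li) for wi, li in zip(w, losses)]
    total = sum(w)
    return [wi / total for wi in w]    # normalized weights

# Three candidates; candidate 1 consistently incurs the smallest loss.
rounds = [[0.9, 0.1, 0.5]] * 20
weights = mwu(rounds, n=3)
print(weights.index(max(weights)))     # 1: mass concentrates on the best candidate
```

The classical guarantee is that the weighted loss of this scheme approaches that of the best fixed candidate in hindsight, which is what makes it usable as a driver for heuristic search.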
The thesis is divided into three main parts. In the first part we introduce the main ingredients of the heuristic algorithm, in the second we describe the new algorithm theoretically, and in the last we apply the new methodology to two hard real-world problems, namely Mean-Variance Portfolio Selection (MVPS) and the Multiple NonLinear Knapsack Problem (MNLKP). The first problem arises in the financial portfolio context, where investors want to identify the portfolio with the highest return and the minimum volatility or risk; the second occurs, for example, in resource allocation, where a set of resources with limited capacities must be divided among different categories.
Computational experiments indicate that the new methodology behaves rather better than the benchmarks on the MVPS problems, but not on the MNLKP. For the latter we therefore propose a different constructive heuristic based on three phases: a procedure based on the discretisation of the solution space, and two procedures for recovering the feasibility of the solution produced by the surrogate relaxation. The constructive heuristic outperforms the benchmarks with respect to both the quality of the solutions found and the total elapsed time.
Panagiotis Korvesis will defend his PhD thesis entitled «Machine Learning for Predictive Maintenance in Aviation» on Tuesday, November 21st, in the Grace Hopper room of the Alan Turing building.
Abstract: The increase of available data in almost every domain raises the necessity of employing algorithms for automated data analysis. This necessity is highlighted in predictive maintenance, where the ultimate objective is to predict failures of hardware components by continuously observing their status, in order to plan maintenance actions well in advance. These observations are generated by monitoring systems usually in the form of time series and event logs and cover the lifespan of the corresponding components. Analyzing this history of observations in order to develop predictive models is the main challenge of data driven predictive maintenance.
Towards this direction, Machine Learning has become ubiquitous since it provides the means of extracting knowledge from a variety of data sources with the minimum human intervention. The goal of this dissertation is to study and address challenging problems in aviation related to predicting failures of components on-board. The amount of data related to the operation of aircraft is enormous and therefore, scalability is a key requirement in every proposed approach.
This dissertation is divided into three main parts that correspond to the different data sources we encountered during our work. In the first part, we targeted the problem of predicting system failures given the history of Post Flight Reports. We proposed a regression-based approach preceded by a meticulous formulation and data pre-processing/transformation. Our method approximates the risk of failure with a scalable solution, deployed in a cluster environment for both training and testing. To our knowledge, no method for tackling this problem was available at the time this thesis was written.
The second part presents our approach for analyzing logbook data, which consists of text, written by maintenance engineers, describing aircraft issues and the corresponding maintenance actions. The logbook contains information that is not reflected in the post-flight reports and is essential in several applications, including failure prediction. However, since the logbook contains text written in natural language, it carries a lot of noise that needs to be removed in order to extract useful information. We tackled this problem with an approach based on vector representations of words (word embeddings). Our approach exploits semantic similarities between words, learned by the neural networks that generate the vector representations, in order to identify and correct spelling mistakes and abbreviations. Finally, important keywords are extracted using part-of-speech tagging.
In the third part, we tackled the problem of assessing the health of components on board using sensor measurements. In the cases under consideration, the condition of the component is assessed through the magnitude of the sensor's fluctuation and a monotonically increasing trend. We formulated a time series decomposition that separates the fluctuation from the trend by solving an optimisation problem. To quantify the condition of the component, we compute a risk function that measures the sensor's deviation from its normal behavior, which is learned using Gaussian Mixture Models.
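A minimal stand-in for this kind of decomposition (a sketch under the assumption of a monotone trend, with an invented synthetic signal; the thesis's actual optimisation formulation may differ) is isotonic regression: the least-squares fit of a nondecreasing sequence, computable by pool-adjacent-violators, with the fluctuation taken as the residual.

```python
import math

def isotonic(y):
    """Least-squares nondecreasing fit to y (pool-adjacent-violators)."""
    blocks = []                      # (mean, weight) blocks
    for v in y:
        blocks.append((v, 1))
        # Merge backwards while monotonicity is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            (m2, w2), (m1, w1) = blocks.pop(), blocks.pop()
            blocks.append(((m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2))
    fit = []
    for m, w in blocks:
        fit.extend([m] * w)
    return fit

# Synthetic sensor signal: increasing degradation trend + oscillation.
sig = [0.05 * i + math.sin(i) for i in range(60)]
trend = isotonic(sig)                        # monotone trend component
fluct = [s - t for s, t in zip(sig, trend)]  # what remains is the fluctuation
print(all(a <= b + 1e-12 for a, b in zip(trend, trend[1:])))   # True
```

The trend then feeds the health assessment (how far degradation has progressed) while the fluctuation magnitude feeds the risk function.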
Huixing Meng will defend his thesis, entitled "Pattern-based modelling for the analysis of production system performance and dependability in the process industry", on Friday, November 17th 2017 at 9h30. The defense will take place in the Sophie Germain amphitheatre, on the ground floor of the Alan Turing building.
Maria Rossi will defend her PhD thesis on Friday, November 17th at 15:00, in the Gilles Kahn room of the Alan Turing building.
The modern science of graphs has emerged in recent years as a field of interest and has brought significant advances to our knowledge about networks. Until recently, existing data mining algorithms were designed for structured/relational data, yet many datasets require a graph representation, such as social networks, networks generated by textual data, 3D protein structures and chemical compounds. It has therefore become crucially important to be able to extract meaningful information from this kind of data in an efficient and effective way, and to this end graph mining and analysis methods have proven essential. The goal of this thesis is to study problems in the area of graph mining, focusing especially on designing new algorithms and tools related to information spreading, and specifically on how to locate influential entities in real-world social networks. This task is crucial in many applications such as information diffusion, epidemic control and viral marketing.
In the first part of the thesis, we studied spreading processes in social networks, focusing on topological characteristics that rank entities in the network based on their influential capabilities. We specifically focused on the K-truss decomposition, an extension of the k-core decomposition of the graph. Both methods partition a graph into subgraphs whose nodes and/or edges share some common characteristics. In the case of the K-truss, every edge belonging to such a subgraph is contained in at least K-2 triangles. Through extensive experimental analysis on real-world networks, we showed that the nodes belonging to the maximal K-truss subgraph exhibit better spreading behavior than baseline criteria such as degree and k-core centralities: such spreaders influence a greater part of the network during the first steps of a spreading process, and the total fraction of influenced nodes at the end of the epidemic is also greater. We also observed that nodes belonging to such dense subgraphs are those achieving the optimal spreading in the network.
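The K-truss peeling can be sketched directly (a toy pure-Python version for a hand-sized graph; real networks call for an optimized implementation): repeatedly delete any edge supported by fewer than K-2 triangles until none remain.

```python
# K-truss by iterative peeling: drop any edge contained in fewer than K-2
# triangles until no such edge remains.

def k_truss(edges, k):
    edges = {frozenset(e) for e in edges}
    changed = True
    while changed:
        changed = False
        adj = {}
        for e in edges:
            u, v = tuple(e)
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        for e in list(edges):
            u, v = tuple(e)
            support = len(adj[u] & adj[v])   # triangles containing edge (u, v)
            if support < k - 2:
                edges.discard(e)
                changed = True
    return edges

# A 4-clique with a pendant edge: each clique edge sits in 2 triangles.
g = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
print(len(k_truss(g, 4)))   # 6: the pendant edge (4, 5) is peeled away
```

Here the maximal K-truss (K=4) is exactly the 4-clique, illustrating why nodes in the maximal truss are densely interconnected and hence good spreaders.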
In the second part of the thesis, we focused on identifying a group of nodes that, by acting together, maximize the expected number of influenced nodes at the end of the spreading process, a problem formally called Influence Maximization. The Influence Maximization problem is NP-hard, though for certain classes of models efficient greedy algorithms come with approximation guarantees, obtaining a solution within 63% of the optimum. As the greedy approximation behind these guarantees is computationally expensive, especially for large graphs, we proposed the MATI algorithm, which succeeds in locating the group of users that maximizes the influence while remaining scalable. The algorithm takes advantage of the possible paths created in each node's neighborhood, precalculates each node's potential influence, and produces results competitive in quality with those of baseline algorithms.
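The 63% figure is the (1 - 1/e) guarantee of greedy maximization of a monotone submodular spread function. A deterministic sketch of that greedy skeleton (using closed-neighborhood coverage as a stand-in spread function, an invented toy rather than the independent-cascade model or MATI itself):

```python
# Greedy submodular maximization: at each step, add the node with the largest
# marginal gain in "spread", here approximated by closed-neighborhood coverage.

def greedy_seeds(adj, k):
    covered, seeds = set(), []
    for _ in range(k):
        best = max(adj, key=lambda u: len(({u} | adj[u]) - covered))
        seeds.append(best)
        covered |= {best} | adj[best]
    return seeds, covered

# Star around node 0, plus a separate edge 6-7.
adj = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0},
       4: {0}, 5: {0}, 6: {7}, 7: {6}}
seeds, covered = greedy_seeds(adj, k=2)
print(seeds[0], len(covered))   # 0 8: the hub first, then one endpoint of 6-7
```

Because coverage is monotone and submodular, this greedy choice provably reaches at least 1 - 1/e of the best possible coverage; the expensive part in practice is evaluating the marginal gain under a real diffusion model, which is what MATI's path-based precomputation avoids.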
In the last part of the thesis, we studied such influence metrics from the privacy point of view. We focused on designing an efficient, correct, secure, and privacy-preserving algorithm for computing the k-core metric, which measures the influence of each node of the network. We specifically adopted a decentralized approach in which the social network is considered a peer-to-peer (P2P) system. The algorithm is built under the constraint that a node should not be able to reconstruct, partially or entirely, the graph from the information it obtains during execution. While a distributed algorithm that computes the nodes' coreness once and for all has already been proposed, it does not take into account networks that evolve over time. Our main contribution is an incremental algorithm that efficiently solves the core maintenance problem in a P2P setting while limiting the number of messages exchanged and computations performed. Through extensive experiments we provide a security and privacy analysis of the solution with respect to network de-anonymization; we show how it relates to previously defined attack models and discuss countermeasures.
The DaScIS (Data Science in the Insurance Sector) Chair organizes the "Data Science in Action" event on November 16th on the campus of Ecole Polytechnique (amphi Gay Lussac).
This half-day consists of:
- presentations by senior managers and data scientists of the AXA Group, along with prominent speakers from academia, providing insight into business issues and case studies of applying machine learning and big data techniques in the field of insurance;
- a social event: a reception at 17:30.
If you plan to attend, please register your name and email - to this form. We need this for logistical reasons.
The full schedule of the event follows below. I would like to highlight in particular the keynote by Professor Jie Tang of Tsinghua University, Beijing, a very interesting talk on open academic data.
- 14:00 J. Biot, President Ecole Polytechnique
- 14:15 M. Vazigiannis, Chair Holder - "Presentation of the Chair Activities"
- 14:30 Gaelle Olivier - CEO of AXA Global P&C – “Strategic vision of Insurance and Data”
- 15:00 Richard Benjamins - Head of the AXA data innovation lab - "Role of AXA in Global Data community"
- 15:30 Coffee break
- 16:00 S. Guinet (Founding CEO of Kamet (AXA Strategic Ventures fund) – “Presentation of portfolio projects"
- 16:30 Jie Tang - Professor, Tsinghua University - "AMiner: toward understanding big science data"
- 17:00 Wrap up - Discussion - Marcin Detyniecki - AXA
- 17:30 Cocktail "Salle des Cadres"
In this talk, we review the notion of differential privacy and its implications in terms of a Bayesian adversary, and we discuss typical implementations from the point of view of their optimality with respect to utility. Then, we consider an extension of differential privacy to general metric domains and the consequences for the optimality results. Finally, we show an instantiation to the case of location privacy, leading to the notion of geo-indistinguishability.
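The canonical implementation the talk starts from is the Laplace mechanism: perturb a numeric query with Laplace noise of scale sensitivity/epsilon. (A simple one-dimensional sketch; geo-indistinguishability itself relies on a planar, two-dimensional Laplacian not shown here.)

```python
# Laplace mechanism for epsilon-differential privacy: add Laplace noise with
# scale = sensitivity / epsilon to a numeric query result.
import math
import random

def laplace_noise(scale, rng):
    u = rng.random() - 0.5                  # uniform on [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    return true_value + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)
true_count = 127                            # e.g. a counting query, sensitivity 1
samples = [laplace_mechanism(true_count, 1.0, epsilon=0.5, rng=rng)
           for _ in range(10000)]
mean = sum(samples) / len(samples)
print(round(mean))   # near 127: the mechanism is unbiased, noise scale 1/epsilon
```

Smaller epsilon means stronger privacy and larger noise, which is exactly the privacy-utility trade-off whose optimality the talk examines.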
Henry de Lumley, corresponding member of the Académie des Sciences and of the Académie des Inscriptions et Belles Lettres, prehistorian and Quaternary geologist, is Director of the Institut de Paléontologie Humaine, Fondation Albert Ier Prince de Monaco, Professor emeritus and former Director of the Muséum National d'Histoire Naturelle, Paris.
The central objective of his research, in the field and in the laboratory, is the study of the origin of Man: his first presence in the major regions of the world, his morphological and cultural evolution, and his behaviour and way of life within his palaeoenvironments (Ethiopia, Mauritania, India, Turkey, China, South Korea).
A particular focus of his research is tracing the evolution of climates and biodiversity throughout the Quaternary period in various regions of the world.
These studies are carried out with an interdisciplinary approach: stratigraphy, sedimentology, magnetostratigraphy and geochronology, palynology, palaeontology, palaeoanthropology, palaeoethnology, and technological and typological studies of Lower and Middle Palaeolithic lithic industries.
One postdoctoral researcher and one PhD student position are available at the LIX research laboratory of Ecole Polytechnique in France, in the area of:
«Exploring Relations in Structured Data with Functional Maps»
The goal of this project is to develop tools for the analysis and processing of complex geometric data, such as large-scale 3D model collections, by building on techniques from Computer Graphics, Geometry Processing and Machine Learning. The ultimate objective is to create a unified framework for representing and analyzing structured data to solve problems such as: shape comparison and matching, shape segmentation and labeling, shape deformation and recognition among others. The project will be funded, in part, by the ERC Starting Grant EXPROTEA.
The ideal candidates should have a strong background in at least some of the following:
- Geometry Processing and 3D shape analysis
- Computer Graphics
- Computer Vision and Image Processing
- Knowledge of basic Differential Geometry
- Machine Learning
- Programming experience, especially in multimedia data analysis
For the postdoc position, candidates must have a PhD at the starting time of the post-doc, and should have publications in top venues in either Computer Graphics (e.g., SIGGRAPH, Eurographics, SGP) or a related field. The expected duration of the post-doc is 2 years.
For the PhD position, the candidate should have a Master's degree with preference given to those having strong research experience. The expected duration of the PhD is 3 years.
Venue and supervision
Successful candidates will be based at Ecole Polytechnique located in Palaiseau, Paris area, France. The project will be supervised by Maks Ovsjanikov (LIX, Ecole Polytechnique):
The expected salary is approximately 2400 € net for the post-doctoral position and approximately 1800 € net per month for the PhD position.
How to apply
To apply, please contact Maks Ovsjanikov (maks at lix.polytechnique.fr). In your email, please include your CV, a short cover letter and a publication list. In your cover letter, make sure to mention how your interests and experience relate to the topics in this announcement.
Reference letters are optional but may be requested at a later stage.