Our team
The Data Science and Mining (DaSciM) team is part of the Computer Science Laboratory (LIX) of École Polytechnique. In previous years we conducted research in databases and data mining, more specifically in unsupervised learning (clustering algorithms and validity measures), advanced data management and indexing (P2P systems, distributed indexing, distributed dimensionality reduction), text mining (word disambiguation for classification, introduction of the Graph of Words approach) and ranking algorithms (temporal extensions to PageRank). More recently, we have worked on large-scale graph mining (degeneracy-based community detection and evaluation) and on text mining and retrieval for web advertising/marketing and recommendations.
Our current research interests and work include:
- Generative AI and foundation models, including text LMs (e.g. Barthez, AraBart), multi-modal ones (Prot2text) and graph generative models (e.g. Graph Neural Generator and Medical Graph Generator).
- Machine and deep learning for graphs (graph kernels, embedding methods, graph classification, large-scale community detection) with applications in fraud detection, social networks, biomedicine, recommendations and retail.
- NLP and text mining (Graph of Words, deep learning for text classification, summarisation and keyword extraction, word sense disambiguation/induction, large-scale multilingual language resources, knowledge distillation, evaluation of text generation methods).
- Event and anomaly detection in data streams and time series (applications in text streams, sensory data, personalised medicine).
- Structured output prediction (multi-label classification, multi-output and sequential/dynamical models, probabilistic models and neural networks).
- Reinforcement learning (Bayesian models and deep learning).
The DaSciM team members have supervised fifteen completed Ph.D. theses and published two international books, chapters in books and encyclopedias, and more than 250 papers in international refereed journals and conferences. We have also co-authored three patents and attracted significant R&D funding from national and international governmental and industrial sources. Members of our team have received the ERCIM, Marie Curie, and Google fellowships. Our team co-organized the ECML PKDD 2011 conference in Athens and ECML/PKDD 2017, and participates in the senior organization of various AI and data mining events (AAAI, IJCAI).
Moreover, our group has long experience in real-world R&D projects in large-scale data, text and time series mining. We currently maintain collaborations with industrial partners (including AIRBUS, Google, BNP, Tencent, Tradelab) on machine learning projects.
Professor Vazirgiannis led the X/AXA Data Science Chair (2015-2018) and currently leads the ANR-HELAS chair on Deep Learning for heterogeneous data (graphs, text).
See also:
- A general-audience post on Large Pre-trained Models and their potential, on the Institut Polytechnique de Paris news page
- Our position paper “AI and Future challenges” in the Annales des Mines journal, offering a general-audience presentation of AI challenges for the future.
Visit our selected publications page.
Team leader
