Projects
OpenLLM aims to advance the state of the art in pragmatic and resource-efficient LLM development. The project explores the potential of smaller, specialized models (~1.5B parameters) trained on high-quality data, demonstrating that they can compete with much larger systems.
LLM4ALL develops open, updatable, and efficient multilingual language models. DaSciM contributes to low-cost training and inference techniques, such as quantization and transfer learning. The project applies these models to real-world tasks like meeting summarization and hospital emergency call understanding.
HELAS explores advanced deep learning for graphs and NLP. DaSciM led work on hybrid neural architectures combining graph-based structures with language data, enabling more expressive and interpretable models for knowledge extraction, recommendation, and search.
XCOVIF tackled predictive modeling during the COVID-19 pandemic. DaSciM developed models leveraging mobility data, social media, and language trends to anticipate epidemic dynamics and inform public health interventions.
ANR SUMRE
SUMRE addresses automatic summarization of multi-party meetings. The DaSciM team focuses on hybrid summarization techniques combining extractive and abstractive methods, with attention to dialogue structure and discourse relations to generate high-quality meeting notes.
Esigma developed scalable and interpretable graph mining techniques. DaSciM worked on mining meaningful patterns and anomalies from large-scale graphs, with applications in scientific data, social networks, and security.
Linto (2018–2021) – BPI-France
Linto created intelligent meeting assistants capable of understanding, summarizing, and recommending content. DaSciM provided the NLP backbone, developing the summarization models and real-time recommendation enhancements.
OpenPaas (2016–2019) – BPI-France
This project extended the OpenPaas collaborative platform with automated summarization services for meetings. DaSciM’s contribution centered on designing extractive and abstractive models tailored to business and collaborative environments.
This chair investigated data science applications in the insurance industry. DaSciM designed machine learning pipelines for fraud detection, client segmentation, and policy optimization, with an emphasis on interpretability and fairness in data-driven decision-making.