Yanlei Diao

Professor
Laboratoire d'informatique / Department of Computer Science
Ecole Polytechnique, France

Email: {first-name} dot {last-name}@polytechnique.edu
Phone: +33 1 77 57 80 13
Address: Batiment Alan Turing
1 rue Honore d'Estienne d'Orves
Campus de l'Ecole Polytechnique
91120 Palaiseau, France
Lab Mgr: Jessica Gameiro, {first} dot {last}@polytechnique.edu



Selected Publications

2023

Efficient Version Space Algorithms for Human-in-the-Loop Model Development. Luciano Di Palma, Yanlei Diao, and Anna Liu. ACM Transactions on Knowledge Discovery from Data (TKDD), accepted, 2023. (tech report)

Efficient and Robust Active Learning Methods for Interactive Database Exploration. Enhui Huang, Yanlei Diao, Anna Liu, Liping Peng, and Luciano Di Palma. The VLDB Journal, November 2023. (online version)

Explainable Anomaly Detection in Low-dimensional Axis-aligned Projections. Fei Song, Yanlei Diao, Madalina Fiterau. Technical report, 2023.

2022

Fine-Grained Modeling and Optimization for Intelligent Resource Management in Big Data Processing. Chenghao Lyu, Qi Fan, Fei Song, Arnab Sinha, Yanlei Diao, Wei Chen, Li Ma, Yihui Feng, Yaliang Li, Kai Zeng, Jingren Zhou. Proceedings of VLDB Endowment (PVLDB), August 2022. (pdf) (longer version)

2021

Exathlon: A Benchmark for Explainable Anomaly Detection over Time Series. Vincent Jacob, Fei Song, Arnaud Stiegler, Bijan Rad, Yanlei Diao, and Nesime Tatbul. Proceedings of VLDB Endowment (PVLDB), August 2021. (pdf)

A Demonstration of the Exathlon Benchmarking Platform for Explainable Anomaly Detection. Vincent Jacob, Fei Song, Yanlei Diao, and Nesime Tatbul. International Conference on Very Large Databases (VLDB), demonstration track, 2021.

Explainable Anomaly Detection on High-Dimensional Time Series Data. Bijan Rad, Fei Song, Vincent Jacob and Yanlei Diao. Proceedings of ACM International Conference on Distributed and Event-based Systems (DEBS), 2021. (pdf)

Efficient Exploration of Interesting Aggregates in RDF Graphs. Yanlei Diao, Paweł Guzewicz, Ioana Manolescu, Mirjana Mazuran. Proceedings of ACM Conference on Management of Data (SIGMOD), 2021. (pdf)

Spark-based Cloud Data Analytics using Multi-Objective Optimization. Fei Song, Khaled Zaouk, Chenghao Lyu, Arnab Sinha, Qi Fan, Yanlei Diao, Prashant Shenoy. Proceedings of IEEE International Conference on Data Engineering (ICDE), 2021. (pdf)

2020

Efficient Version Space Algorithms for "Human-in-the-Loop" Model Development. Luciano Di Palma, Yanlei Diao, Anna Liu. Proceedings of the IEEE International Conference on Data Mining (ICDM), 2019. (pdf)

Neural-based Modeling for Performance Tuning of Spark Data Analytics. Khaled Zaouk, Fei Song, Chenghao Lyu, Yanlei Diao. arXiv preprint arXiv:2101.08167. (pdf)

Boosting Cloud Data Analytics using Multi-Objective Optimization. Fei Song, Khaled Zaouk, Chenghao Lyu, Arnab Sinha, Qi Fan, Yanlei Diao, Prashant Shenoy. arXiv preprint arXiv:2005.03314. (pdf)

2019

Spade: A Modular Framework for Analytical Exploration of RDF Graphs. Yanlei Diao, Pawel Guzewicz, Ioana Manolescu, Mirjana Mazuran. Proceedings of the VLDB Endowment (PVLDB), 12(12): 1926-1929 (2019) (pdf)

UDAO: A Next-Generation Unified Data Analytics Optimizer. Khaled Zaouk, Fei Song, Chenghao Lyu, Arnab Sinha, Yanlei Diao, Prashant J. Shenoy. Proceedings of the VLDB Endowment (PVLDB), 12(12): 1934-1937 (2019) (pdf)

A Factorized Version Space Algorithm for "Human-In-the-Loop" Data Exploration. Luciano Di Palma, Yanlei Diao, Anna Liu. Proceedings of IEEE International Conference on Data Mining (ICDM), pp. 1018-1023, 2019. (pdf)

2018

Optimization for Active Learning-based Interactive Database Exploration. Enhui Huang, Liping Peng, Luciano Di Palma, Ahmed Abdelkafi, Anna Liu, Yanlei Diao. Proceedings of the VLDB Endowment (PVLDB), 12(1): 71-84 (2018) (pdf)

Anomaly Detection and Explanation Discovery on Event Streams. Fei Song, Boyao Zhou, Quan Sun, Wang Sun, Shiwen Xia, and Yanlei Diao. In the Proceedings of the International Workshop on Real-time Business Intelligence and Analytics (BIRTE), VLDB, 5:1-5:5, 2018. (pdf)

EXAD: A System for Explainable Anomaly Detection on Big Data Traces. Fei Song, Yanlei Diao, Jesse Read, Arnaud Stiegler, and Albert Bifet. In the Proceedings of the IEEE International Conference on Data Mining (ICDM), November 2018. (pdf)

2017

An Analysis of Query-Agnostic Sampling for Interactive Data Exploration. Wenzhao Liu, Yanlei Diao, and Anna Liu. Communications in Statistics – Theory and Methods. Accepted, July 2017. (pdf)

Massively Parallel Processing of Whole Genome Sequence Data: An In-Depth Performance Study. Abhishek Roy, Yanlei Diao, Uday Evani, Avinash Abhyankar, Clinton Howarth, Rémi Le Priol, and Toby Bloom. SIGMOD 2017. (pdf)

XStream: Explaining Anomalies in Event Stream Monitoring. Haopeng Zhang, Yanlei Diao, Alexandra Meliou. EDBT, 2017. (pdf) (longer version)

Dagger: Digging for Interesting Aggregates in RDF Graphs. Yanlei Diao, Ioana Manolescu, Shu Shan. Proceedings of the ISWC 2017 Posters & Demonstrations and Industry Tracks, pp. 1-4, co-located with 16th International Semantic Web Conference (ISWC 2017), October, 2017.

2016

Interactive Data Exploration via Machine Learning Models. Olga Papaemmanouil, Yanlei Diao, Kyriaki Dimitriadou, Liping Peng. IEEE Data Engineering Bulletin, 39(4), 38-49, December 2016.

AIDE: An Active Learning-based Approach for Interactive Data Exploration. Kyriaki Dimitriadou, Olga Papaemmanouil, and Yanlei Diao. IEEE Transactions on Knowledge and Data Engineering (TKDE), 28(11), 2842-2856, November 2016. (pdf)

2015

Supporting Scalable Analytics with Latency Constraints. Boduo Li, Yanlei Diao, and Prashant Shenoy. VLDB 2015. (pdf)

AIDE: An Automatic User Navigation System for Interactive Data Exploration. Yanlei Diao, Kyriaki Dimitriadou, Zhan Li, Wenzhao Liu, Olga Papaemmanouil, Kemi Peng, and Liping Peng. VLDB 2015. Demo. (pdf)

Supporting Data Uncertainty in Array Databases. Liping Peng and Yanlei Diao. SIGMOD 2015. (pdf)

Building Highly-Optimized, Low-Latency Pipelines for Genomic Data Analysis. Abhishek Roy, Yanlei Diao, and Toby Bloom. CIDR 2015. (pdf)

2014

Explore-by-Example: An Automatic Query Steering Framework for Interactive Data Exploration. Kyriaki Dimitriadou, Olga Papaemmanouil, and Yanlei Diao. SIGMOD 2014, pp. 517-528. (pdf)

On Complexity and Optimization of Expensive Queries in Complex Event Processing. Haopeng Zhang, Yanlei Diao, and Neil Immerman. SIGMOD 2014, pp. 217-228. (pdf)

Interactive data exploration based on user relevance feedback. Kyriaki Dimitriadou, Olga Papaemmanouil and Yanlei Diao. Proceedings of the 30th International Conference on Data Engineering Workshops, ICDE, Chicago, IL, USA, March 31 - April 4, 2014.

2013

Recognizing Patterns in Streams with Imprecise Timestamps. Haopeng Zhang and Yanlei Diao. Journal of Information Systems, Elsevier, 38(8), pp. 1187-1211, November 2013. (pdf)

SEDGE: Symbolic Example Data Generation for Dataflow Programs. Kaituo Li, Christoph Reichenbach, Yannis Smaragdakis, Yanlei Diao, and Christoph Csallner. Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 235-245, November 2013.

Supporting User-Defined Functions on Uncertain Data. Thanh T. L. Tran, Yanlei Diao, Charles Sutton, and Anna Liu. PVLDB, 6(6), pp. 469-480, August 2013. (pdf)

Query Steering for Interactive Data Exploration. U. Cetintemel, M. Cherniack, J. DeBrabant, Y. Diao, K. Dimitriadou, A. Kalinin, O. Papaemmanouil, S. Zdonik. Proceedings of 6th Biennial Conference in Innovative Data Systems Research (CIDR 2013). Asilomar, CA, USA: 2013. (pdf)

2012

SCALLA: A Platform for Scalable One-pass Analytics using MapReduce. Boduo Li, Ed Mazur, Yanlei Diao, Andrew McGregor, and Prashant Shenoy. ACM TODS, 37(4):27, 27-64, December 2012. Special Issue on Best Papers of SIGMOD 2011. (pdf)

CLARO: Modeling and Processing Uncertain Data Streams. Thanh T. L. Tran, Liping Peng, Yanlei Diao, Andrew McGregor, and Anna Liu. VLDB Journal, 21(5):651-676, November 2012. (pdf)

Massive Genomic Data Processing and Deep Analysis. Abhishek Roy, Yanlei Diao, Evan Mauceli, Yiping Shen, and Bai-Lin Wu. PVLDB, 5(12): 1906-1909, August 2012. (pdf)

SPIRE: Efficient Data Inference and Compression over RFID Streams. Yanming Nie, Richard Cocci, Zhao Cao, Yanlei Diao, and Prashant Shenoy. TKDE, 24(1): 141-155, January 2012. (pdf)

2011

Optimizing Probabilistic Query Processing on Continuous Uncertain Data. Liping Peng, Yanlei Diao, and Anna Liu. PVLDB 2011, 4(11), 1169-1180, August, 2011. (pdf)

A Platform for Scalable One-pass Analytics using MapReduce. Boduo Li, Ed Mazur, Yanlei Diao, Andrew McGregor, and Prashant Shenoy. SIGMOD 2011, pp. 985-996. (pdf) (tech report)

Towards Scalable One-Pass Analytics Using MapReduce. Ed Mazur, Boduo Li, Yanlei Diao, and Prashant Shenoy. DataCloud 2011. (pdf) (tech report)

Distributed Inference and Query Processing for RFID Tracking and Monitoring. Zhao Cao, Charles Sutton, Yanlei Diao, and Prashant Shenoy. PVLDB 2011, 4(5), 326-337, February 2011. (pdf)

Quality-Biased Ranking of Web Documents. Michael Bendersky, W. Bruce Croft, and Yanlei Diao. WSDM 2011. Selected for oral presentation. (pdf)

2010

Recognizing Patterns in Streams with Imprecise Timestamps. Haopeng Zhang, Yanlei Diao, and Neil Immerman. VLDB 2010. (pdf)

Conditioning and Aggregating Uncertain Data Streams: Going Beyond Expectations. Thanh Tran, Andrew McGregor, Yanlei Diao, Liping Peng, and Anna Liu. VLDB 2010. (pdf)

PODS: A New Model and Processing Algorithms for Uncertain Data Streams. Thanh Tran, Liping Peng, Boduo Li, Yanlei Diao and Anna Liu. SIGMOD 2010. (pdf)

Exploiting the Interplay Between Memory and Flash Storage In Embedded Sensor Devices. Devesh Agrawal, Boduo Li, Zhao Cao, Deepak Ganesan, Yanlei Diao, Prashant Shenoy. IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA 2010. (pdf)

Yanlei Diao and Michael J. Franklin. High-Performance XML Message Brokering. Book Chapter. Data Stream Management: Processing High-Speed Data Streams, edited by Garofalakis, Gehrke, and Rastogi. Springer Data-Centric Systems and Applications Series, November 2010.

2009

Lazy-Adaptive Tree: An Optimized Index Structure for Flash Devices. Devesh Agrawal, Deepak Ganesan, Ramesh Sitaraman, Yanlei Diao, Shashi Singh. VLDB 2009. (pdf)

Probabilistic Inference over RFID Streams in Mobile Environments. Thanh Tran, Charles Sutton, Richard Cocci, Yanming Nie, Yanlei Diao, Prashant Shenoy. ICDE 2009. (pdf) (tech report)

Capturing Data Uncertainty in High-Volume Stream Processing. Yanlei Diao, Boduo Li, Anna Liu, Liping Peng, Charles Sutton, Thanh Tran, Michael Zink. CIDR 2009. (pdf)

Refining Keyword Queries for XML Retrieval by Combining Content and Structure. Desislava Petkova, W. Bruce Croft, Yanlei Diao. ECIR 2009. (pdf)

Architectural Considerations for Distributed RFID Tracking and Monitoring. Zhao Cao, Yanlei Diao, and Prashant Shenoy. NetDB 2009. (pdf)

Fast Packet Pattern Matching Algorithms. Fang Yu, Yanlei Diao, Randy Katz, T. V. Lakshman. Book Chapter. Algorithms for Next Generation Network Architecture, edited by Graham Cormode and Marina Thottan. Springer Computer Communications and Networks Series, October 2009.

2008

Efficient Pattern Matching over Event Streams. Jagrati Agrawal, Daniel Gyllstrom, Yanlei Diao, and Neil Immerman. SIGMOD 2008. (pdf) (tech report) (ppt)

On Supporting Kleene Closure over Event Streams. Daniel Gyllstrom, Jagrati Agrawal, Yanlei Diao, and Neil Immerman. ICDE 2008. Short paper. (pdf)

Efficient Data Interpretation and Compression over RFID Streams. Richard Cocci, Thanh Tran, Yanlei Diao, and Prashant Shenoy. ICDE 2008. Short paper. (pdf) (tech report)

Publish/Subscribe over Streams. Yanlei Diao and Michael Franklin. Article. To appear in Encyclopedia of Database Systems. (pdf)

XML Publish/Subscribe. Yanlei Diao and Michael Franklin. Article. To appear in Encyclopedia of Database Systems. (pdf)

2007

Re-thinking Data Management for Storage-centric Sensor Networks. Yanlei Diao, Deepak Ganesan, Gaurav Mathur, and Prashant Shenoy. In Proceedings of the Third Biennial Conference on Innovative Data Systems Research (CIDR 2007), Asilomar, CA, January 2007. (pdf)

SASE: Complex Event Processing over Streams. Daniel Gyllstrom, Eugene Wu, Hee-Jin Chae, Yanlei Diao, Patrick Stahlberg, and Gordon Anderson. In Proceedings of the Third Biennial Conference on Innovative Data Systems Research (CIDR 2007), Asilomar, CA, January 2007. Demo proposal. (pdf)

SPIRE: Scalable Processing of RFID Event Streams. Richard Cocci, Yanlei Diao, and Prashant Shenoy. In Proceedings of the 5th RFID Academic Convocation, April 2007. (pdf)

SASE+: An Agile Language for Kleene Closure over Event Streams. Yanlei Diao, Neil Immerman, and Daniel Gyllstrom. UMass Technical Report 07-03. (pdf)

2006

High-Performance Complex Event Processing over Streams. Eugene Wu, Yanlei Diao, and Shariq Rizvi. SIGMOD 2006, June 2006. (pdf) (ppt)

Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection. Fang Yu, Zhifeng Chen, Yanlei Diao, T.V. Lakshman, and Randy H. Katz. In Proceedings of ACM / IEEE Symposium on Architectures for Networking and Communications Systems (ANCS 2006), San Jose, CA, December 3-5, 2006. (pdf)


Before 2006

  Yanlei Diao. Query Processing for Large-Scale XML Message Brokering. PhD Dissertation. August, 2005. ACM SIGMOD Dissertation Award Honorable Mention (pdf)

  YFilter 1.0 code release. October 2004.  

 Yanlei Diao, Shariq Rizvi, and Michael J. Franklin. Towards an Internet-Scale XML Dissemination Service. In Proceedings of VLDB 2004, August 2004. (pdf) (ppt)

 Yanlei Diao, Daniela Florescu, Donald Kossmann, Michael J. Carey, and Michael J. Franklin. Implementing Memoization in a Streaming XQuery Processor. In Proceedings of the 2nd International XML Database Symposium (XSym2004), August 2004. (pdf)

 Yanlei Diao, Michael J. Franklin. Query Processing for High-Volume XML Message Brokering. In Proceedings of VLDB 2003, September 2003. (pdf) (ppt)

 Yanlei Diao, Mehmet Altinel, Michael J. Franklin, Hao Zhang, Peter Fischer. Path Sharing and Predicate Evaluation for High-Performance XML Filtering. ACM TODS, December 2003. (pdf)

 Yanlei Diao and Michael J. Franklin. High-Performance XML Filtering: An Overview of YFilter. IEEE Data Engineering Bulletin, March 2003. (pdf)

 Yanlei Diao, Peter Fischer, Michael Franklin, Raymond To. YFilter: Efficient and Scalable Filtering of XML Documents. Demo paper. In Proceedings of ICDE 2002, February 2002. (pdf)

 Yanlei Diao, Hongjun Lu, Songting Chen, Zengping Tian. Toward Learning Based Web Query Processing. In Proceedings of VLDB 2000, September 2000. (pdf) (ppt)

 Songting Chen, Yanlei Diao, Hongjun Lu, Zengping Tian. FACT: A Learning Based Web Query Processing System. Demo paper. In Proceedings of SIGMOD 2000, May 2000. (ppt)

 Yanlei Diao, Hongjun Lu, Dekai Wu. A Comparative Study of Classification Based Personal E-mail Filtering. In Proceedings of PAKDD 2000, April 2000. (ps) (pdf)