Data Science for Big Data

Course unit (UE): Data Science for Big Data (French title: Science des données pour le Big Data)
Instructor(s), institution(s) Michalis Vazirgiannis, Ecole Polytechnique
Mauro Sozio, Telecom ParisTech
Email address(es) mvazirg@lix.polytechnique.fr
sozio@telecom-paristech.fr
Main teaching location Ecole Polytechnique
ECTS 2.5
Total hours 21
Lectures 12
Tutorials (TD) 6
Labs (TP) 3
Objectives To acquaint students with the algorithms, methods, and techniques used across the life cycle of a data science project, i.e. the iterative and incremental process of making sense of data (structured, graph, text), organized around two key components: data engineering and data analysis.
This includes data preprocessing and cleaning, feature extraction and creation, and supervised and unsupervised learning methods for potentially big data.
Prerequisites Databases, Algorithms, Probability/Statistics, Programming
Syllabus Data engineering
  • (Big) data management (RDBMS vs. NoSQL, MapReduce, Hive)
  • (Big) data preprocessing (data cleaning, normalization, feature selection and creation, spectral decompositions and dimensionality reduction)
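
As an illustration of the cleaning and normalization step listed above, here is a minimal sketch in plain Python. It is not course material: the function names `min_max_scale` and `z_score` and the toy data are assumptions made for this example, and dropping missing values is only one of several possible cleaning strategies.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: there is no spread to rescale, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center values on 0 with unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # Constant feature: standardization is undefined, so return zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical toy feature with one missing entry (None).
raw = [2.0, 4.0, None, 6.0, 8.0]

# Crude cleaning step: drop missing values before normalizing.
clean = [v for v in raw if v is not None]

scaled = min_max_scale(clean)
standardized = z_score(clean)
```

Min-max scaling keeps the original ordering and bounds the feature, while z-scoring is usually preferred before spectral methods such as PCA, which are sensitive to the scale of each feature.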

Data analysis

  • Descriptive (data quality)
  • Exploratory (summary statistics, correlation, ANOVA)
  • Inferential (theory of generalization, sampling, statistical testing)
  • Predictive (supervised, unsupervised machine learning)
  • Sequence, text, and graph mining
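
The exploratory item above (summary statistics, correlation) can be sketched as follows. This is an illustrative example only: the `pearson` helper and the `hours` and `score` samples are hypothetical, and the sketch assumes neither sample is constant, since the correlation is undefined in that case.

```python
from math import sqrt
from statistics import mean, median, stdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    # Assumes both samples vary; a constant sample would divide by zero here.
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: hours studied and exam score for five students.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 55.0, 61.0, 64.0, 70.0]

# Summary statistics for one variable (sample standard deviation).
summary = {"mean": mean(score), "median": median(score), "stdev": stdev(score)}

# Linear association between the two variables.
r = pearson(hours, score)
```

A value of `r` close to 1 indicates a strong positive linear association in the sample; it says nothing by itself about causation or about how well the relationship generalizes, which is where the inferential item comes in.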

Case Studies (from data mining cups or Kaggle)

Language English