Databases and Big Data Management

Description

Data have become essential for business and research purposes. Be it customer records, Web pages or movie reviews, there is a real demand for efficient data storage, indexing, querying and even mining. Databases remain an active research field that is constantly renewed, in particular because of the explosion of available data, its unprecedented scale and its heterogeneous nature.

In this course, you will learn the theory and concepts behind database management and gain practical experience in designing and managing scalable databases, not only for traditional relational data but also for large-scale data such as text and user-item interactions. The course is divided into three parts:

  1. The first part will be devoted to Relational Database Management Systems (RDBMS) and will introduce key concepts such as the relational model, SQL, database normalization and design, and query optimization.
  2. The second part of the course will cover unstructured data and big data management and will show in practice how search engines work, in particular at Web scale, through hands-on experience with emerging technologies such as MapReduce, HBase and Hive.
  3. Finally, the last two lectures will go beyond data storage and data querying and will give a glimpse of data mining, that is, leveraging the data to acquire further insights and knowledge in order to improve the quality of the offered services.

Logistics

  1. The course will take place on Tuesdays from September 16 until December 1st (no classes on 28/10 and 11/11). It will be divided into nine 4-hour sessions (plus the final exam on Dec. 1st). Each session consists of a lecture (13:30 – 15:15) in Amphitheater Poisson followed by a lab session (15:30 – 18:00), which will be split between Amphitheater Poisson (students with surnames A-M) and PC n°18 (students with surnames N-Z).
  2. Due to the high number of enrolled students, labs will not take place in rooms equipped with workstations. Students are therefore expected to come to class with their laptops (preferably with a Unix environment such as Linux or Mac OS X, for compatibility reasons). We will be using Java, MySQL, MATLAB, Hadoop, HBase and Hive, among others (to be installed locally on the laptops). Some labs will only require pen and paper, so come equipped!
  3. We will be using the e-learning platform Moodle to share the course materials (slides and lab statements) and to upload assignments. It is therefore imperative that students enroll in the course (enrollment is available once they are logged in to Moodle with their @polytechnique.edu account, using the enrollment key specified in the welcome email they received). Additionally, the forum should be used to communicate with the staff, following the guidelines.

Click this link for more information: https://moodle.polytechnique.fr/enrol/index.php?id=3.