Data challenges

  1. Predicting missing links in a citation network (January 2017)

    A citation network is represented as a graph G(V, E), where V is the set of nodes and E is the set of edges (links). Each node corresponds to a paper, and an edge between two nodes u and v means that one of the papers cites the other. Each node is associated with information such as the paper's title, publication year, author names, and a short abstract. A number of edges have been randomly deleted from the original citation network. The mission is to accurately reconstruct the initial network using graph-theoretical and textual features, and possibly other information. The challenge was organized for the “Text Mining and NLP” 2016-2017 course offered to the M1 Polytechnique students.

    Challenge link:

  2. Email recipient recommendation (January 2017)

    It has been shown that, at work, employees frequently forget to include one or more recipients before sending a message. Conversely, it is common for some recipients of a given message not to have been intended to receive it. To increase productivity and prevent information leakage, the need for effective email recipient recommendation systems is thus pressing. In this challenge, we asked the MVA 2016-2017 students to develop such a system, which, given the content and the date of a message, recommends a list of 10 recipients ranked in decreasing order of relevance.

    Challenge link:

  3. Link Prediction Data Challenge (March 2016)

    In this competition, we define a citation network as a graph where nodes are research papers and there is an edge between two nodes if one of the two papers cites the other. Edges have been deleted at random from a citation network. The mission is to accurately reconstruct the initial network using graph, textual, and other information. The challenge was organized for the “Graph and Text Mining” course offered to the MVA/Data Science M2 master programs.

    Challenge link:

  4. AXA 2016 Data Challenge (February 2016)

    The DASCIS chair successfully launched the first data challenge, based on data and requirements provided by AXA. It was announced to the students of the INF582 course.

    The challenge aimed at developing models for an inbound call forecasting system. The system should predict the number of incoming calls for the AXA Assistance call center in France for each half-hour time slot, three (3) days ahead. The dataset contains telephony data retrieved from AXA call centers and covers the calendar years 2011 and 2012. The full description of the data challenge can be found here:

  5. Opinion Mining Data Challenge (May 2015)

    As part of the professional training program “DSSP”, this challenge serves as an educational tool through the application of text mining and data mining techniques to the task of opinion mining. The goal is to produce a classifier that can identify comments in reviews as either positive or negative. The students are also guided to use Big Data technologies (Spark, Hadoop) to complete this challenge.
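
To make the Opinion Mining task concrete, the following is a minimal sketch of a positive/negative review classifier. It is not the challenge's reference solution: it assumes a multinomial naive Bayes model with add-one smoothing, runs on a single machine in plain Python rather than on Spark or Hadoop, and uses a handful of invented toy reviews. The class name `NaiveBayesSentiment` and the helper `tokenize` are hypothetical names introduced only for this illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Hypothetical helper class for illustration; not part of the challenge.
    """

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Start from the log prior, then add smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy reviews invented for this example; they are not challenge data.
train_texts = [
    "great movie, loved the acting",
    "wonderful plot and great cast",
    "terrible film, boring and slow",
    "awful acting, I hated the plot",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
prediction_pos = model.predict("a wonderful and great film")
prediction_neg = model.predict("boring and awful")
print(prediction_pos, prediction_neg)
```

On a realistic review corpus, the same idea would typically be expressed with a distributed library, and the toy tokenizer would be replaced by proper text preprocessing. The sketch only shows the shape of the classification step.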