Software & datasets

  1. GraKeL

    GraKeL is a Python package for studying and applying graph kernels, an emerging area of data mining and machine learning.

    The project is currently in alpha development and is available on pypi-test.

    Code: https://github.com/ysig/GraKeL/tree/develop
    Documentation: https://ysig.github.io/GraKeL/dev/
    Paper: https://arxiv.org/abs/1806.02193
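
    To give a rough idea of what a graph kernel computes, here is a minimal, dependency-free sketch of a simplified shortest-path kernel for unlabeled, undirected graphs. It does not use GraKeL and is not GraKeL's API; the function names and the adjacency-dictionary input format are assumptions made for this illustration only.

```python
from collections import Counter, deque


def shortest_path_histogram(adjacency):
    """Count unordered node pairs by shortest-path length (BFS, unweighted graph).

    `adjacency` is an illustrative format: {node: [neighbors]}, assumed symmetric.
    """
    histogram = Counter()
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        for target, length in distance.items():
            if target != source:
                histogram[length] += 1
    # Each connected pair was reached from both endpoints, so halve the counts.
    return Counter({length: count // 2 for length, count in histogram.items()})


def shortest_path_kernel(graph_a, graph_b):
    """Kernel value: number of pairs of node pairs (one per graph) with equal path length."""
    hist_a = shortest_path_histogram(graph_a)
    hist_b = shortest_path_histogram(graph_b)
    return sum(hist_a[length] * hist_b[length] for length in hist_a)


# Two toy graphs: a 3-node path and a triangle.
path = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

print(shortest_path_kernel(path, path))
print(shortest_path_kernel(path, triangle))
print(shortest_path_kernel(triangle, triangle))
```

    The kernel value is an inner product of the two shortest-path-length histograms, so it can be fed to any kernel method such as an SVM. For real use, the GraKeL documentation linked above describes the library's own kernels.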


  2. Graph-of-Words and graph-based keyword extraction

    A fully unsupervised, extractive text summarization system that leverages a submodularity framework. It allows summaries to be generated in a greedy way while preserving near-optimal performance guarantees. This tool builds on the graph-of-words representation of text and the k-core decomposition algorithm to assign meaningful scores to words.
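
    The idea of scoring words through a graph-of-words and k-core decomposition can be sketched as follows. This is a minimal illustration, not the tool's actual code: the sliding-window construction, the window size, the function names, and the use of the core number as the word score are all assumptions made for this example.

```python
from collections import defaultdict


def build_graph_of_words(tokens, window=3):
    """Undirected graph linking each token to the tokens that follow it inside the window.

    A window of size w spans w consecutive tokens (assumed construction).
    """
    adjacency = defaultdict(set)
    for i, word in enumerate(tokens):
        adjacency[word]  # make sure every word appears as a node
        for other in tokens[i + 1 : i + window]:
            if other != word:
                adjacency[word].add(other)
                adjacency[other].add(word)
    return dict(adjacency)


def core_numbers(adjacency):
    """k-core decomposition: repeatedly remove a node of minimum remaining degree."""
    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])  # core numbers never decrease during peeling
        core[node] = k
        remaining.remove(node)
        for neighbor in adjacency[node]:
            if neighbor in remaining:
                degree[neighbor] -= 1
    return core


tokens = "graph mining text graph text mining summary".split()
graph = build_graph_of_words(tokens, window=2)
scores = core_numbers(graph)

# Words in the highest core are the most densely connected ones.
top = max(scores.values())
print(sorted(word for word, score in scores.items() if score == top))
```

    In this toy input, "graph", "mining" and "text" form a triangle and end up in the 2-core, while "summary" hangs off a single edge and stays in the 1-core, so the denser core gives the candidate keywords.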

    Prototype link here.

    Code available here.

    Relevant Papers:

    • A. J.-P. Tixier, P. Meladianos, M. Vazirgiannis, “Combining Graph Degeneracy and Submodularity for Unsupervised Extractive Summarization”, EMNLP 2017, Copenhagen, Denmark.
    • A. J.-P. Tixier, K. Skianis, M. Vazirgiannis, “GoWvis: A web application for Graph-of-Words-based text visualization and summarization”, ACL 2016, Berlin, Germany.
    • F. Rousseau, M. Vazirgiannis, “Main Core Retention on Graph-of-Words for Single-Document Keyword Extraction”, ECIR 2015, Vienna, Austria.

  3. Graph-of-Words visualization tool

    GoWvis is an interactive web application that represents any piece of text entered by the user as a Graph-of-Words and leverages graph degeneracy and community detection to generate an extractive summary (keyphrases and paragraph) of that text in an unsupervised fashion. The entire analysis can be customized by tuning many text preprocessing, graph building, and graph mining parameters. Our system is thus well suited to educational purposes, exploration, and early research experiments.

    Prototype link here.

    Relevant Papers:

    • A. J.-P. Tixier, K. Skianis, M. Vazirgiannis, “GoWvis: A web application for Graph-of-Words-based text visualization and summarization”, ACL 2016, Berlin, Germany.

  4. Degeneracy based graph mining

    Prototype link here.

    Relevant Papers:

    • C. Giatsidis, D. Thilikos, M. Vazirgiannis, “Evaluating cooperation in communities with the k-core structure”, in the proceedings of the 2011 IEEE International Conference on Data Mining series (ICDM), Canada.
    • C. Giatsidis, D. Thilikos, M. Vazirgiannis, “D-cores: Measuring Collaboration of Directed Graphs Based on Degeneracy”, in the proceedings of the 2011 International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Taiwan.

  5. Match-the-News: Personalized news recommendation

    Google Chrome plugin link here.

    Relevant papers:

    • M. Karkali, V. Plachouras, C. Stefanatos, M. Vazirgiannis, “Keeping Keywords Fresh: A BM25 Variation for Personalized Keyword Extraction”, in the proceedings of the WWW2012 – 2nd Temporal Web Analytics Workshop.

  6. Google News Dataset

    Dataset link here.

    Our Google News Dataset consists of manually classified Google News articles and was used in the paper “Efficient Online Novelty Detection in News Streams”.

    Relevant papers:

    • M. Karkali, F. Rousseau, A. Ntoulas, M. Vazirgiannis, “Efficient Online Novelty Detection in News Streams”, Web Information Systems Engineering – WISE 2013.