  • Ph.D. September 2015-present

    Ph.D. Student in Representations, Regularization and Visualization for text and graph data

    École Polytechnique, France

  • MSc 2014-2015

    Master M2 in Math, Computer Vision and Machine Learning (MVA)

    École Normale Supérieure de Cachan, France

  • BSc May 2013

    Bachelor in Computer Science (4-year curriculum)

    Athens University of Economics and Business, Greece

© Konstantinos Skianis


  • Natural Language Processing, Text Mining
  • Machine Learning, Deep Learning
  • Graph Mining, Data Science


  • 2017: AAAI, LLD workshop @NIPS, EMNLP
  • 2016: AAAI, WSDM, IC2S2


Orthogonal Matching Pursuit for Text Classification

Konstantinos Skianis, Nikolaos Tziortziotis, Michalis Vazirgiannis
Paper arXiv 2018


In text classification, the problem of overfitting arises due to the high dimensionality, making regularization essential. Although classic regularizers provide sparsity, they fail to return highly accurate models. On the contrary, state-of-the-art group-lasso regularizers provide better results at the expense of low sparsity. In this paper, we apply a greedy variable selection algorithm, called Orthogonal Matching Pursuit, for the text classification task. We also extend standard group OMP by introducing overlapping group OMP to handle overlapping groups of features. Empirical analysis verifies that both OMP and overlapping GOMP constitute powerful regularizers, able to produce effective and super-sparse models.
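The greedy selection idea behind Orthogonal Matching Pursuit can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a plain least-squares objective rather than a classification loss, does not normalize the feature columns, and ignores feature groups entirely. The names `omp`, `solve_linear` and `budget` are hypothetical and chosen only for this illustration.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def omp(X, y, budget):
    """Select at most `budget` features greedily (assumed least-squares loss).

    X is a list of rows, y a list of targets. Returns a dense weight list
    in which unselected features keep weight 0.0.
    """
    n_samples, n_features = len(X), len(X[0])
    columns = [[X[i][j] for i in range(n_samples)] for j in range(n_features)]
    active, coef, residual = [], [], list(y)
    for _ in range(budget):
        # Score each unused feature by its absolute correlation with the residual.
        scores = [
            -1.0 if j in active else abs(dot(columns[j], residual))
            for j in range(n_features)
        ]
        best = max(range(n_features), key=scores.__getitem__)
        if scores[best] <= 1e-12:
            break  # nothing left that explains the residual
        active.append(best)
        # Refit all active features jointly via the normal equations.
        gram = [[dot(columns[a], columns[b]) for b in active] for a in active]
        rhs = [dot(columns[a], y) for a in active]
        coef = solve_linear(gram, rhs)
        residual = [
            y[i] - sum(c * X[i][a] for c, a in zip(coef, active))
            for i in range(n_samples)
        ]
    weights = [0.0] * n_features
    for a, c in zip(active, coef):
        weights[a] = c
    return weights


# Toy data: y depends only on features 0 and 2.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
y = [2, 0, -3, 2]
print(omp(X, y, budget=2))
```

The `budget` parameter caps the number of non-zero weights directly, which is what makes the resulting model sparse by construction.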

GraKeL: A Graph Kernel Library in Python

Giannis Siglidis, Giannis Nikolentzos, Stratis Limnios, Christos Giatsidis, Konstantinos Skianis, Michalis Vazirgiannis
Paper arXiv 2018


The problem of accurately measuring the similarity between graphs is at the core of many applications in a variety of disciplines. Graph kernels have recently emerged as a promising approach to this problem. There are now many kernels, each focusing on different structural aspects of graphs. Here, we present GraKeL, a library that unifies several graph kernels into a common framework. The library is written in Python and is built on top of scikit-learn. It is simple to use and can be naturally combined with scikit-learn's modules to build a complete machine learning pipeline for tasks such as graph classification and clustering. The code is BSD licensed and is available here.

Fusing Document, Collection and Label Graph-based Representations with Word Embeddings for Text Classification

Konstantinos Skianis, Fragkiskos Malliaros, Michalis Vazirgiannis
Workshop Paper TextGraphs, NAACL 2018, New Orleans, USA [Best Paper Award]


Contrary to the traditional Bag-of-Words approach, we consider the Graph-of-Words (GoW) model in which each document is represented by a graph that encodes relationships between the different terms. Based on this formulation, the importance of a term is determined by weighting the corresponding node in the document, collection and label graphs, using node centrality criteria. We also introduce novel graph-based weighting schemes by enriching graphs with word-embedding similarities, in order to reward or penalize semantic relationships. Our methods produce more discriminative feature weights for text categorization, outperforming existing frequency-based criteria.

Kernel Graph Convolutional Neural Networks

Giannis Nikolentzos, Polykarpos Meladianos, Antoine J.-P. Tixier, Konstantinos Skianis, Michalis Vazirgiannis
Conference Paper ICANN 2018


Graph kernels have been successfully applied to many graph classification problems. Typically, a kernel is first designed, and then an SVM classifier is trained based on the features defined implicitly by this kernel. This two-stage approach decouples data representation from learning, which is suboptimal. On the other hand, Convolutional Neural Networks (CNNs) have the capability to learn their own features directly from the raw data during training. Unfortunately, they cannot handle irregular data such as graphs. We address this challenge by using graph kernels to embed meaningful local neighborhoods of the graphs in a continuous vector space. A set of filters is then convolved with these patches, pooled, and the output is then passed to a feedforward network. With limited parameter tuning, our approach outperforms strong baselines on 7 out of 10 benchmark datasets.

Data Management for the World Wealth & Income Database

Christos Giatsidis, Antonis Skandalis, Konstantinos Skianis, Michalis Vazirgiannis
Facundo Alvaredo, Lucas Chancel, Thomas Piketty, Emmanuel Saez, Gabriel Zucman
Ben Grillet, Francois Prosper, Brice Terdjman, Anthony Veyssiere
Poster ParisBD 2017, Paris, France


The World Wealth & Income Database is a project that publishes data related to inequality around the world. It is an open portal that distributes time series about economic concepts such as wealth and income, where users can select time series of interest based on multi-attribute queries. Most of the components of the project are hosted in the cloud using Amazon Web Services. The data management functionality of the project consists of two major parts: a) a modern relational database that uses state-of-the-art indexing techniques along with JSON features and b) a web API through which the database can be accessed and where some data transformations take place. To reduce latency across the globe, the project is currently deployed at two sites (EU and US). The database currently holds data for 319 geographical regions (countries, continents, states), with about 150 combinations of attributes for each region over 50 years on average. The project is a joint collaboration between the World Inequality Lab at Paris School of Economics, the DaSciM team at Laboratoire d’Informatique de l’X and WEDODATA.

SpreadViz: Analytics and Visualization of Spreading Processes in Social Networks

Konstantinos Skianis, Maria Evgenia G. Rossi, Fragkiskos D. Malliaros, Michalis Vazirgiannis
Demo Paper ICDM 2016, Barcelona, Spain


In this paper, we propose SpreadViz, a web tool for exploration and visualization of spreading properties in social networks. SpreadViz consists of three main modules, namely graph exploration and analytics, detection of influential nodes, and interactive visualization. More precisely, SpreadViz offers the following functionalities: (i) It computes and visualizes various centrality criteria towards understanding how the position of a node in the network affects its spreading properties; (ii) It offers a wide range of criteria for the detection of single and multiple influential nodes and comparison among them; (iii) It effectively visualizes the spread of influence in the network as well as the performance of each method. In our demonstration, we invite the audience to interact with SpreadViz, exploring, analyzing, and visualizing the spreading processes over various real-world social networks.

Regularizing Text Categorization with Clusters of Words

Konstantinos Skianis, Francois Rousseau, Michalis Vazirgiannis
Conference Paper EMNLP 2016, Austin, USA


Regularization is a critical step in any supervised learning problem, crucial not only for addressing overfitting but also for taking into account any prior knowledge we may have about the problem features and their relationships. In this paper, we explore state-of-the-art structured regularizers for textual data and propose novel ones based on topics from LSI, clusters from word2vec, and the graph-of-words document representation. We show that, for text categorization, our proposed regularizers are faster than the state-of-the-art ones while also improving classification accuracy.

GoWvis: A web application for Graph-of-Words-based text visualization and summarization

Antoine J.-P. Tixier, Konstantinos Skianis, Michalis Vazirgiannis
Demo Paper ACL 2016, Berlin, Germany


We introduce GoWvis, an interactive web application that represents any piece of text inputted by the user as a Graph-of-Words and leverages graph degeneracy and community detection to generate an extractive summary (keyphrases and paragraph) of the inputted text in an unsupervised fashion. The entire analysis can be fully customized via the tuning of many text preprocessing, graph building, and graph mining parameters. Our system is thus well suited to educational purposes, exploration and early research experiments. The new summarization strategy we propose also shows promise.
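Graph degeneracy can be illustrated with a k-core decomposition of a small word co-occurrence graph. The sketch below is not GoWvis code: it assumes an undirected, unweighted graph given as an adjacency dictionary, and it assumes that the terms in the main core (the highest-numbered core) serve as keyword candidates, which is one common degeneracy-based heuristic. The function names and the toy graph are hypothetical.

```python
def core_numbers(adjacency):
    """Return the core number of every node by repeatedly peeling
    the node of minimum remaining degree."""
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core, k = {}, 0
    while remaining:
        node = min(remaining, key=degree.__getitem__)
        # The core number never decreases along the peeling order.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adjacency[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def main_core(adjacency):
    """Return the set of nodes with the highest core number."""
    core = core_numbers(adjacency)
    if not core:
        return set()
    top = max(core.values())
    return {node for node, c in core.items() if c == top}


# Hypothetical co-occurrence graph: a triangle plus one loosely attached term.
toy_graph = {
    "graph": {"words", "text", "summary"},
    "words": {"graph", "text"},
    "text": {"graph", "words"},
    "summary": {"graph"},
}
print(sorted(main_core(toy_graph)))
```

The peeling loop here is quadratic in the number of nodes, which is adequate for a single document; bucket-based variants run in linear time on larger graphs.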

Graph-Based Term Weighting for Text Categorization

Fragkiskos Malliaros, Konstantinos Skianis
Workshop Paper Someris, ASONAM 2015, Paris, France


Text categorization is an important task with plenty of applications, ranging from sentiment analysis to automated news classification. In this paper, we introduce a novel graph-based approach for text categorization. Contrary to the traditional Bag-of-Words model for document representation, we consider a model in which each document is represented by a graph that encodes relationships between the different terms. The importance of a term to a document is indicated using graph-theoretic node centrality criteria. The proposed weighting scheme is able to meaningfully capture the relationships between the terms that co-occur in a document, creating feature vectors that can improve the categorization task. We perform experiments on well-known document collections, applying popular classification algorithms. Our preliminary results indicate that the proposed graph-based weighting mechanism is able to outperform existing frequency-based term weighting criteria, under appropriate parameter settings.
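A minimal sketch of graph-based term weighting follows. It makes several assumptions that the abstract does not specify: the graph is undirected and unweighted, edges link terms that co-occur within a sliding window, and degree centrality stands in for the node centrality criterion. The function names and the `window` parameter are hypothetical.

```python
from collections import defaultdict


def graph_of_words(tokens, window=3):
    """Link each term to the terms that follow it inside a sliding window.

    `window` counts the term itself, so window=3 links a term to the
    next two tokens.
    """
    neighbors = defaultdict(set)
    for i, term in enumerate(tokens):
        neighbors[term]  # make sure isolated terms appear as nodes
        for other in tokens[i + 1:i + window]:
            if other != term:
                neighbors[term].add(other)
                neighbors[other].add(term)
    return neighbors


def degree_centrality_weights(tokens, window=3):
    """Weight each term by its degree divided by (number of nodes - 1)."""
    graph = graph_of_words(tokens, window)
    n = len(graph)
    if n <= 1:
        return {term: 0.0 for term in graph}
    return {term: len(nbrs) / (n - 1) for term, nbrs in graph.items()}


tokens = "graph based term weighting uses graph centrality".split()
weights = degree_centrality_weights(tokens)
for term, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.2f}")
```

Unlike raw term frequency, this weight grows with the number of distinct terms a word co-occurs with, so a word repeated in the same context gains nothing from the repetition.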

Learning for Text and Graph Data - 2017, Learning for Text and Graph Data (2016-2017)

Licence, Master M2

The courses aim to provide an introduction to advanced machine learning and combinatorial methods for large-scale text and graph data. The course syllabus included:

  • Advanced graph kernels and classification, clustering / community mining (Louvain, modularity, degeneracy)
  • Influence maximization models (SIR/SIS, LT, IC,…), degeneracy-based spreader selection
  • Graph-of-words advanced topics: tw-icw, graph kernels for document similarity, graph-based regularization for text classification
  • Word embeddings, unsupervised document classification with the Word Mover’s Distance, WMD vs cosine similarity
  • Deep learning for NLP, supervised document classification (TF-IDF vs TW-IDF)
  • Keyword extraction for summarization: graph-based keyword extraction, summarization (offline, online), Filippova’s word graph for multi-sentence fusion

  • April 2016

    2nd place, Fintech Crowdhackathon 2016

    National Bank of Greece

    Our team (RSK project) took 2nd place in the Fintech Crowdhackathon organized by the National Bank of Greece. We built a platform that detects fraudulent e-transactions using Deep Learning. You can find more info here.

  • January 2015

    2nd place, Dreem challenge 2015

    Inclass Kaggle

    Trying to analyze dreams. During deep sleep, crucial mechanisms occur: memory consolidation, cellular regeneration, growth hormone release and biological clock reset. A lack of deep sleep impairs memory, focus and judgment during work. DREEM introduces a way to increase the duration and quality of deep sleep to ensure optimal performance.

  • May 2013

    6th place, Data Mining Cup 2013


    Our team from the Department of Informatics, consisting of undergraduate students G. Papoutsakis, G. Zografos, G. Theofilis and myself, and PhD students M. Karkali and S. Thomaidou, took 6th place in the Data Mining Cup 2013 competition.

    There were 99 participating teams from 77 universities all over the world.

  • April 2013

    Top 25%, Employee Access Challenge 2013


    The objective of this competition is to build a model, learned using historical data, that will determine an employee's access needs, such that manual access transactions (grants and revokes) are minimized as the employee's attributes change over time. The model will take an employee's role information and a resource code and will return whether or not access should be granted.

Laboratoire d'Informatique (LIX), École Polytechnique
Bâtiment Alan Turing, 1 Rue Honoré d'Estienne d'Orves
Campus de l'École Polytechnique
91120 Palaiseau, France
Office 1071
  • kskianis at
  • rob.cs.aueb at
  • kostas.skianis
  • #kskianis
  • My LinkedIn profile