A Distributed Algorithm for Large Scale Graph Clustering

1 April 2019

Main goal

The goal of this work is to implement a distributed algorithm for large scale graph Clustering.


Graph clustering is one of the key techniques to understand the structures present in the graph data. In addition to clusters detection, the identification of hubs and outliers is also a critical task as it plays an important role in the analysis of graph data. Recently, several graph clustering algorithms have been proposed and used in many application domains such as biological network analysis, recommendation systems and community detection. Most of these algorithms are based on the structural clustering algorithm SCAN. In this work, we experiment our proposed distributed algorithm on top of BLADYG framework and Akka framework, and experimentally show the efficiency of DSCAN in the case of large networks.

Resources used

20 x 2.4 GHz CPU, 100 GB RAM, and 500 GB disk

Software used

  • AKKA
  • Java8
  • Metis
  • C++
  • ROOL graph generator

The performed analysis

  • Scalability

Dataset used

  • Real-world graph dataset from https://snap.stanford.edu/data/index.html
  • synthetic dataset generated by ROOL graph generator