SC-DBSCAN, Scalable Density-Based Clustering Algorithm for Schema Discovery In Large RDF Datasets

1 April 2019


DAVID-Université De Versailles Saint-Quentin en Yvelines


In this experiment, we study the scalability of our algorithm on clusters with varying numbers of nodes.


For this evaluation, we use synthetic datasets generated with the IBM Quest Synthetic Data Generator.


2 to 16 worker nodes (Spark)

  • 12 CPU / node
  • RAM: 30GB / node
  • Disk: 100 GB / node

Software Used

  • Spark 2.4.0

Expected results

Experimental evaluations should show the scalability of our clustering algorithm with respect to the number of nodes in a cluster and its ability to manage huge datasets.
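
One common way to quantify scalability with respect to the number of nodes is to compute speedup and parallel efficiency from measured run times. The sketch below is illustrative only and is not part of the experiment as described. The function name `scaling_metrics`, the `runtimes` input format, and the choice of the smallest cluster as the baseline are all assumptions made for this example.

```python
from typing import Dict


def scaling_metrics(runtimes: Dict[int, float]) -> Dict[int, Dict[str, float]]:
    """Compute speedup and parallel efficiency for each cluster size.

    `runtimes` (hypothetical input format) maps a number of worker nodes
    to the measured wall-clock time in seconds for the same dataset.
    The smallest cluster is used as the baseline (an assumption).
    """
    if not runtimes:
        raise ValueError("runtimes must not be empty")

    base_nodes = min(runtimes)          # smallest cluster size measured
    base_time = runtimes[base_nodes]    # its run time is the reference

    metrics: Dict[int, Dict[str, float]] = {}
    for nodes in sorted(runtimes):
        # Speedup: how many times faster than the baseline cluster.
        speedup = base_time / runtimes[nodes]
        # Ideal speedup relative to the baseline is nodes / base_nodes;
        # efficiency is the fraction of that ideal actually achieved.
        efficiency = speedup / (nodes / base_nodes)
        metrics[nodes] = {"speedup": speedup, "efficiency": efficiency}
    return metrics
```

An efficiency close to 1.0 at every cluster size would indicate near-linear scaling, while a value that drops as nodes are added suggests growing communication or coordination overhead.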