SC-DBSCAN: Scalable Density-Based Clustering Algorithm for Schema Discovery in Large RDF Datasets

1 April 2019

Laboratory

DAVID, Université de Versailles Saint-Quentin-en-Yvelines

Goal

In these experiments, we study the scalability of our algorithm on clusters with varying numbers of nodes.

Datasets

For this evaluation, we use synthetic datasets generated with the IBM Quest Synthetic Data Generator.

Resources

2 to 16 worker nodes (Spark)

  • CPUs: 12 / node
  • RAM: 30 GB / node
  • Disk: 100 GB / node
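To make the per-node resources concrete, the sketch below turns the worker specification (12 CPUs, 30 GB of RAM) into candidate Spark executor settings. It assumes that each CPU is a schedulable core, that 30 GB means 30 GiB, and that Spark runs on YARN, where Spark 2.4 adds a default `spark.executor.memoryOverhead` of 10% of the executor memory (at least 384 MiB). It also applies the common rule of thumb of 5 cores per executor and reserves one CPU and 1 GiB per node for the operating system. These choices, and the helper `plan_executors`, are illustrative assumptions, not the configuration used in the experiments.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: Spark 2.4 on YARN, whose default spark.executor.memoryOverhead
# is 10% of the executor memory, with a floor of 384 MiB.
OVERHEAD_FACTOR = 0.10
OVERHEAD_MIN_MIB = 384


@dataclass(frozen=True)
class ExecutorPlan:
    executors_per_node: int
    cores_per_executor: int  # candidate value for spark.executor.cores
    heap_mib: int            # candidate value for spark.executor.memory
    overhead_mib: int        # matching spark.executor.memoryOverhead
    total_executors: int     # candidate value for spark.executor.instances


def plan_executors(node_cpus: int, node_ram_gib: int, workers: int,
                   cores_per_executor: int = 5,  # rule of thumb, not a Spark default
                   reserved_cpus: int = 1,       # kept for the OS and node daemons
                   reserved_ram_gib: int = 1) -> ExecutorPlan:
    """Hypothetical helper: split each worker evenly into identical executors."""
    usable_cpus = node_cpus - reserved_cpus
    per_node = usable_cpus // cores_per_executor
    if per_node < 1:
        raise ValueError("not enough CPUs for a single executor")
    # Memory each executor container may use on one node, in MiB.
    container_mib = (node_ram_gib - reserved_ram_gib) * 1024 // per_node
    # Size the heap so that heap + default overhead fits in the container.
    heap = int(container_mib / (1 + OVERHEAD_FACTOR))
    overhead = max(OVERHEAD_MIN_MIB, int(heap * OVERHEAD_FACTOR))
    if heap + overhead > container_mib:
        heap = container_mib - overhead
    return ExecutorPlan(per_node, cores_per_executor, heap, overhead,
                        per_node * workers)
```

With these defaults, each worker hosts two 5-core executors, so the executor count grows from 4 on 2 workers to 32 on 16 workers. The driver and the YARN ApplicationMaster are not accounted for in this sketch.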

Software Used

  • Spark 2.4.0

Expected Results

Experimental evaluations should demonstrate that our clustering algorithm scales with the number of nodes in the cluster and that it can handle very large datasets.
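One common way to quantify scalability with respect to the number of nodes is relative speedup and parallel efficiency. Because the planned configurations start at 2 worker nodes, the sketch below uses the smallest measured cluster as the baseline instead of a single-node run. It assumes wall-clock runtimes are recorded per cluster size; the function `relative_scalability` is hypothetical and not part of SC-DBSCAN.

```python
from __future__ import annotations


def relative_scalability(runtimes: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map node count -> (relative speedup, parallel efficiency).

    The smallest cluster in `runtimes` is the baseline, so the speedup is
    T(n_min) / T(n) and the efficiency divides it by the ideal ratio n / n_min.
    """
    if not runtimes:
        raise ValueError("need at least one measurement")
    base_nodes = min(runtimes)
    base_time = runtimes[base_nodes]
    metrics: dict[int, tuple[float, float]] = {}
    # Sorting puts the baseline first, so an invalid baseline fails immediately.
    for nodes, seconds in sorted(runtimes.items()):
        if nodes <= 0 or seconds <= 0:
            raise ValueError("node counts and runtimes must be positive")
        speedup = base_time / seconds
        efficiency = speedup / (nodes / base_nodes)
        metrics[nodes] = (speedup, efficiency)
    return metrics
```

An efficiency close to 1 indicates near-linear scaling as nodes are added; values that drop as the cluster grows point to overheads such as shuffles or load imbalance.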

Author:

()