The goal of this project is to implement secure approaches to multiple joins, matrix multiplication, and grouping and aggregation on Hadoop.
With the advent of big data, new techniques have been developed to run parallel computations on large clusters. One of them is the MapReduce programming paradigm, which lets a user keep data in a public cloud and run computations on it. Because the data is externalized, it may be sent over an untrusted network and processed on untrusted machines, where malicious public cloud users may learn private data. In this project, we address three fundamental problems: how to compute relational joins between an arbitrary number of relations, how to multiply matrices, and how to group and aggregate data from a relation, all in a privacy-preserving manner using MapReduce.
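As a reference point for the grouping and aggregation problem, here is a minimal sketch of the map, shuffle, and reduce phases simulated in memory with plain Java collections. It does not use the Hadoop API and applies no security; the class name `GroupAggregateSketch`, the method names, and the two-column record layout are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of a MapReduce group-and-sum job.
class GroupAggregateSketch {

    // Map + shuffle: each record {groupKey, value} is emitted as (groupKey, value),
    // and values sharing a key end up in the same bucket.
    static Map<String, List<Long>> mapAndShuffle(List<String[]> records) {
        Map<String, List<Long>> grouped = new TreeMap<String, List<Long>>();
        for (String[] record : records) {
            String key = record[0];
            long value = Long.parseLong(record[1]);
            List<Long> bucket = grouped.get(key);
            if (bucket == null) {
                bucket = new ArrayList<Long>();
                grouped.put(key, bucket);
            }
            bucket.add(value);
        }
        return grouped;
    }

    // Reduce: sum the values of each bucket.
    static Map<String, Long> reduce(Map<String, List<Long>> grouped) {
        Map<String, Long> result = new TreeMap<String, Long>();
        for (Map.Entry<String, List<Long>> entry : grouped.entrySet()) {
            long sum = 0;
            for (long v : entry.getValue()) {
                sum += v;
            }
            result.put(entry.getKey(), sum);
        }
        return result;
    }

    static Map<String, Long> groupAndSum(List<String[]> records) {
        return reduce(mapAndShuffle(records));
    }

    public static void main(String[] args) {
        List<String[]> records = new ArrayList<String[]>();
        records.add(new String[] {"alice", "3"});
        records.add(new String[] {"bob", "2"});
        records.add(new String[] {"alice", "5"});
        System.out.println(groupAndSum(records)); // prints {alice=8, bob=2}
    }
}
```

In this plain version both the group keys and the values are visible to whoever runs the map and reduce steps, which is the exposure that the privacy-preserving variants aim to remove.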
Resources used (RAM, CPU, etc.)
- 16 x 2.4 GHz CPU, 32 GB RAM, and 320 GB disk.
- Ubuntu Server 14.04 with vanilla Hadoop 2.7.1
- Java 1.7.0
- Higgs Twitter Dataset: http://snap.stanford.edu/data/higgs-twitter.html. Accessed: 2018-02-20.
(1) Implement multiple join, and grouping and aggregation, on Hadoop with no security.
(2) Implement a secure-private approach for join, grouping, and matrix multiplication on Hadoop.
(3) Implement an order-preserving encryption scheme for aggregation on Hadoop.
(4) Implement a collision-resistant secure-private approach for multiple join and matrix multiplication on Hadoop.
(5) Compare the trade-offs of these approaches with respect to three fundamental criteria: computation cost, communication cost, and privacy guarantees.
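For the unsecured baseline in task (1), a join can be sketched as a reduce-side join: the map phase tags each tuple with its source relation and emits it under the join attribute, and the reduce phase pairs up tuples from both relations that share a key. The following is a minimal in-memory sketch in plain Java, not the project's Hadoop code; the class `ReduceSideJoinSketch`, the relations R(a, b) and S(b, c), and the comma-separated output format are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of a reduce-side equi-join of R(a, b) and S(b, c) on b.
class ReduceSideJoinSketch {

    // A value as it would travel through the shuffle: source relation plus the non-join attribute.
    static final class Tagged {
        final char relation; // 'R' or 'S'
        final String value;

        Tagged(char relation, String value) {
            this.relation = relation;
            this.value = value;
        }
    }

    // Map phase: emit (joinKey, tagged value) for every tuple of one relation.
    static void mapRelation(char tag, List<String[]> tuples, int keyIndex, int valueIndex,
                            Map<String, List<Tagged>> shuffle) {
        for (String[] tuple : tuples) {
            List<Tagged> bucket = shuffle.get(tuple[keyIndex]);
            if (bucket == null) {
                bucket = new ArrayList<Tagged>();
                shuffle.put(tuple[keyIndex], bucket);
            }
            bucket.add(new Tagged(tag, tuple[valueIndex]));
        }
    }

    // Reduce phase: for each join key, output the cross product of R-values and S-values.
    static List<String> join(List<String[]> r, List<String[]> s) {
        Map<String, List<Tagged>> shuffle = new TreeMap<String, List<Tagged>>();
        mapRelation('R', r, 1, 0, shuffle); // R(a, b): key is b, value is a
        mapRelation('S', s, 0, 1, shuffle); // S(b, c): key is b, value is c
        List<String> out = new ArrayList<String>();
        for (Map.Entry<String, List<Tagged>> entry : shuffle.entrySet()) {
            List<String> left = new ArrayList<String>();
            List<String> right = new ArrayList<String>();
            for (Tagged t : entry.getValue()) {
                if (t.relation == 'R') {
                    left.add(t.value);
                } else {
                    right.add(t.value);
                }
            }
            for (String a : left) {
                for (String c : right) {
                    out.add(a + "," + entry.getKey() + "," + c);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> r = new ArrayList<String[]>();
        r.add(new String[] {"a1", "b1"});
        r.add(new String[] {"a2", "b1"});
        r.add(new String[] {"a3", "b2"});
        List<String[]> s = new ArrayList<String[]>();
        s.add(new String[] {"b1", "c1"});
        s.add(new String[] {"b3", "c2"});
        for (String row : join(r, s)) {
            System.out.println(row); // prints a1,b1,c1 then a2,b1,c1
        }
    }
}
```

A join over more than two relations can be obtained by feeding the output of one such join into the next, at the cost of one MapReduce round per additional relation.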