Post on 22-Nov-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Distributed Friends Recommendation System Design

Dehao Li, Zeyu Zhang, Ziyi Liu

Introduction

● A distributed friends recommendation system that can run on both a local Hadoop cluster and Hadoop on AWS.

● Input: User relationship data (friends list of each user)

● Output: 10 recommended people for each user

● Compare the running time when the system is run in the different environments.

● Consists of 2 MapReduce jobs, each of which contains 1 map process and 1 reduce process.
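The two-job structure can be illustrated with a small in-memory sketch. The slides do not spell out what each job computes, so the split below is an assumption based on the common mutual-friends approach: Job 1 counts mutual friends for every pair of users who are not yet friends, and Job 2 ranks the candidates for each user and keeps the top 10. The names `run_job`, `job1_map`, `job1_reduce`, `job2_map`, `job2_reduce`, and `recommend` are hypothetical, and `run_job` is only a stand-in for Hadoop's map, shuffle, and reduce steps.

```python
from collections import defaultdict
from itertools import combinations

TOP_N = 10  # the slides specify 10 recommended people per user


def run_job(records, mapper, reducer):
    """In-memory stand-in for one MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Job 1 (assumed): count mutual friends for pairs that are not yet friends.
def job1_map(record):
    user, friends = record
    for friend in friends:
        # -1 marks a pair that is already connected, so it is never recommended.
        yield tuple(sorted((user, friend))), -1
    for a, b in combinations(sorted(friends), 2):
        # a and b both appear in user's list, so user is a mutual friend.
        yield (a, b), 1


def job1_reduce(pair, values):
    if -1 in values:
        return  # already friends: emit nothing
    yield pair, sum(values)


# Job 2 (assumed): gather candidates per user and keep the TOP_N best.
def job2_map(record):
    (a, b), count = record
    yield a, (b, count)
    yield b, (a, count)


def job2_reduce(user, candidates):
    # Most mutual friends first; ties broken by candidate id.
    ranked = sorted(candidates, key=lambda c: (-c[1], c[0]))
    yield user, [candidate for candidate, _ in ranked[:TOP_N]]


def recommend(friend_lists):
    """friend_lists maps each user to that user's list of friends."""
    pair_counts = run_job(friend_lists.items(), job1_map, job1_reduce)
    return dict(run_job(pair_counts, job2_map, job2_reduce))
```

In this sketch the reduce step of Job 1 receives one value per mutual friend for every candidate pair, which is one plausible reason for that step to dominate the running time.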

Architecture of local system

● Hadoop 3.2.1 (binary distribution).

● HDFS: 1 namenode, 2 datanodes.

● A MacBook acts as the master machine, and hosts a namenode and a datanode.

● An Ubuntu virtual machine acts as a worker, and hosts another datanode.

Architecture of cloud system

● Implemented on AWS using EMR.

● 1 master node and 2 worker nodes, all m5.xlarge instances.

● Default configurations
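A cluster of this shape could be described as a request for the `run_job_flow` call of the boto3 EMR client. The sketch below only builds the request dictionary and does not contact AWS. The cluster name, the release label, the choice of the CORE role for the two workers, and the default role names are assumptions for illustration; the slides state only the node counts, the instance type, and that default configurations were used. `build_emr_request` is a hypothetical helper.

```python
def build_emr_request(name="friends-recommendation", release_label="emr-6.2.0"):
    """Return keyword arguments for boto3's EMR client run_job_flow call.

    The name and release label are placeholders, not values from the slides.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "master",
                    "InstanceRole": "MASTER",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": 1,
                },
                {
                    # Assumption: the 2 workers run as core nodes.
                    "Name": "workers",
                    "InstanceRole": "CORE",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": 2,
                },
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default EMR role names; an account may use different ones.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

With AWS credentials configured, the request could be submitted as `boto3.client("emr").run_job_flow(**build_emr_request())`.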

Application specifications

Testing & Evaluation

● The cloud cluster can be much more powerful than the local cluster.

● The difference is more significant when the computation is heavy.

● Job 1 is heavier than Job 2, and the reduce phase of Job 1 is the heaviest step.

What can we do if we know this?

1) We can separate Job 1 from Job 2. Job 1, the heavier task, can be placed on the cloud cluster, and Job 2, the lighter task, can run locally.

2) We can maintain two user relationship datasets. Cloud: the complete dataset. Local: a partial dataset covering only active users, updated frequently.
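The second idea can be sketched as a filter that derives the local dataset from the complete one. How "active" is defined is not stated in the slides, so the set of active users is taken as an input here; `build_local_subset` is a hypothetical helper, and the dictionary layout (user mapped to friends list) is an assumption.

```python
def build_local_subset(friend_lists, active_users):
    """Keep only the friends lists of active users for the local cluster.

    friend_lists: dict mapping each user to that user's list of friends
                  (the complete dataset kept in the cloud).
    active_users: iterable of user ids considered active (definition assumed).
    """
    active = set(active_users)
    return {
        user: list(friends)
        for user, friends in friend_lists.items()
        if user in active
    }
```

Re-running this filter whenever the active set changes would keep the local copy small while the complete dataset stays in the cloud.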

Thanks for listening.