SNFS: The design and implementation of a Social Network File System
Ch. Kaidos, A. Pasiopoulos, N. Ntarmos, P. Triantafillou
University of Patras
Shameless plug
If interested, please check out eXO: Decentralized Autonomous Scalable Social Networking, 5th Conference on Innovative Data Systems Research (CIDR2011), 2011.
Social Networks
Our Take:
1. Search for
   • people (friends, experts, …)
   • content (books, photos, videos, blogs, websites, …)
2. Form entities (collections)
   • friends-lists, content-libs
3. Search for
   • entities
   • using previously formed collections…
4. SNFS currently provides the foundation for these…
Tagging
(Figure: example tags, Tag 1 to Tag 5)
Profiles: sets of tags describing entities.
“Search for”:
• based on profiles
• ranked retrieval (top-k)
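To make "search based on profiles, ranked top-k" concrete, here is a toy Python sketch. The entity names, tags, and the overlap-count scoring rule are hypothetical and chosen only for illustration; SNFS itself ranks with tf-idf weights, as the design part of the talk states.

```python
import heapq

# Hypothetical profiles: each entity is described by a set of tags.
PROFILES = {
    "alice": {"photography", "hiking", "databases"},
    "bob": {"databases", "file-systems", "linux"},
    "carol": {"hiking", "cooking"},
}


def top_k(query_tags, profiles, k):
    """Return up to k (entity, score) pairs, where score is the number of
    query tags found in the entity's profile (a toy scoring rule)."""
    scored = [(name, len(query_tags & tags)) for name, tags in profiles.items()]
    matches = [pair for pair in scored if pair[1] > 0]
    # nlargest keeps the k highest-scoring entities; ties keep insertion order.
    return heapq.nlargest(k, matches, key=lambda pair: pair[1])


print(top_k({"databases", "hiking"}, PROFILES, 2))  # [('alice', 2), ('bob', 1)]
```

Only the k best matches are returned, which is the essence of ranked (top-k) retrieval as opposed to returning every Boolean match.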
Current State
5,000,000,000 photos; 3,000 photos/min (as of September 2010)
2,000,000,000 videos served up each day (May 2010)
600,000,000 monthly active users (January 2011)
15,000,000 books (October 2010); 130,000,000 by the end of the decade
Current State
Need to access published content?
• 22,750,000,000 queries in search engines
• 4,000,000,000 queries in YouTube
• 351,000,000 queries in Facebook
• 416,000,000 queries in MySpace
(U.S. market figures, December 2009)
Current State
How do I find stuff I want?
How do I provide interesting objects to my users?
Proposal
A content-aware file system for Social Network Systems
Useful to users... and to service providers too!
Previous Work on File Indexing
1991 – Semantic File Systems by Gifford
1996 – BeFS by Giampaolo and Meurillon, part of the BeOS
BeOS never had commercial success...
1998 – Indexing Service on Windows NT, not needed at the time; a remnant of the Object File System from the never-released Cairo project
Typically:
• no ranked retrieval
• no user input (tags)
• no user relationships
Desktop Searches
2004 – Windows Desktop Search, widely popular
2005... – Mac OS X's Spotlight, Google Desktop, Beagle, Strigi, Tracker...
Typically:
• no ranked retrieval (?)
• no user relationships
• no exploitation of relationships for searching
Problems
Power tools for power users... but what about average users?
Boolean operators??? SQL-like queries???
Previous Work on Ranked Retrieval
1968 – SMART system by Salton, which introduced weighted retrieval in place of classical Boolean retrieval
1975 – Vectors and cosine similarity by Salton
1988 – Other functions for similarity tested and evaluated by Salton and Buckley
2003 – Fagin proposes and compares several efficient algorithms for top-k retrieval
Design
Design – SNFS
Tags are extracted from each object and stemmed, and their frequencies are counted
Weights are calculated for each (tag, document) pair
Each object is associated with a unique ID, kept in a tree
A tf-idf weighting scheme was chosen
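The indexing steps above (extract, stem, count, weight) can be sketched in Python as follows. This is a minimal sketch, not the SNFS code: `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer such as Porter's, and the weight tf × log(N/df) is one common tf-idf variant; the exact variant SNFS uses is not stated on the slides. Object IDs are plain dictionary keys here rather than entries in a tree.

```python
import math
import re
from collections import Counter


def crude_stem(word):
    """Hypothetical suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def extract_tags(text):
    """Lower-case, split into alphanumeric tokens, stem, and count frequencies."""
    return Counter(crude_stem(tok) for tok in re.findall(r"[a-z0-9]+", text.lower()))


def tfidf_weights(objects):
    """objects: {object_id: raw tag text}. Returns {object_id: {tag: weight}}.

    Weight = tf * log(N / df), an assumed tf-idf variant.
    """
    tfs = {oid: extract_tags(text) for oid, text in objects.items()}
    n = len(tfs)
    df = Counter(tag for counts in tfs.values() for tag in counts)
    return {
        oid: {tag: tf * math.log(n / df[tag]) for tag, tf in counts.items()}
        for oid, counts in tfs.items()
    }


weights = tfidf_weights({1: "hiking photos photos", 2: "database photos", 3: "hiking trails"})
print(weights[1])  # tags "hik" and "photo", weighted 1*log(3/2) and 2*log(3/2)
```

A tag that occurs in every object gets weight 0 under this variant, which is the usual idf intuition: it does not help tell objects apart.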
Design – SNFS
(Term weight, object ID) pairs are stored in an inverted index
Each posting list of the index is a B+Tree stored in secondary memory
The position of each B+Tree's root in the index is stored in a red-black tree
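An in-memory Python sketch of this index layout follows, under stated assumptions. Python's standard library has neither a B+Tree nor a red-black tree, so each posting list is a plain list sorted by decreasing weight, and a sorted term list searched with `bisect` stands in for the ordered map that locates each posting list. The class and method names are hypothetical.

```python
import bisect


class InvertedIndex:
    """In-memory stand-in for an SNFS-style inverted index (illustrative only).

    - Each posting list is a list of (weight, object_id) pairs sorted by
      decreasing weight, in place of an on-disk B+Tree.
    - A sorted list of terms searched with bisect plays the role of the
      red-black tree that locates each posting list.
    """

    def __init__(self, weights):
        # weights: {object_id: {term: weight}}, e.g. the output of a tf-idf step.
        postings = {}
        for oid, terms in weights.items():
            for term, w in terms.items():
                postings.setdefault(term, []).append((w, oid))
        self._terms = sorted(postings)
        # self._lists[i] is the posting list of self._terms[i].
        self._lists = [sorted(postings[t], key=lambda p: (-p[0], p[1]))
                       for t in self._terms]
        # Per-list dictionaries used to answer random accesses.
        self._lookup = [{oid: w for w, oid in lst} for lst in self._lists]

    def _slot(self, term):
        """Binary search for the position of `term`'s posting list, or None."""
        i = bisect.bisect_left(self._terms, term)
        if i < len(self._terms) and self._terms[i] == term:
            return i
        return None

    def posting_list(self, term):
        """Sorted access: (weight, object_id) pairs, highest weight first."""
        i = self._slot(term)
        return [] if i is None else self._lists[i]

    def weight(self, term, oid):
        """Random access: weight of `term` in object `oid` (0.0 if absent)."""
        i = self._slot(term)
        return 0.0 if i is None else self._lookup[i].get(oid, 0.0)


idx = InvertedIndex({1: {"a": 0.5, "b": 0.2}, 2: {"a": 0.9}, 3: {"b": 0.7}})
print(idx.posting_list("a"))  # [(0.9, 2), (0.5, 1)]
```

Keeping lists sorted by weight serves the sorted accesses that threshold algorithms perform; the per-object lookup serves random accesses.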
Design – Search and retrieval
The query is split into terms, which are then stemmed
The score of each document is calculated using a threshold algorithm and a tf-idf function
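As a reference point for the query path, here is a brute-force Python sketch: split and stem the query, then sum each object's weights over the query terms and keep the k best. Summation as the aggregate and the `crude_stem` helper are assumptions for illustration. A threshold algorithm aims to return the same top-k while reading only a prefix of each posting list.

```python
import heapq
import re


def crude_stem(word):
    """Hypothetical suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def parse_query(query):
    """Split the query into lower-case terms and stem each one."""
    return [crude_stem(tok) for tok in re.findall(r"[a-z0-9]+", query.lower())]


def exhaustive_top_k(query, postings, k):
    """Read every relevant posting list in full and sum each object's weights.

    postings: {term: [(weight, object_id), ...]}.
    """
    scores = {}
    for term in dict.fromkeys(parse_query(query)):  # de-duplicate, keep order
        for weight, oid in postings.get(term, []):
            scores[oid] = scores.get(oid, 0.0) + weight
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


POSTINGS = {
    "hik": [(0.75, "d1"), (0.5, "d2")],
    "photo": [(1.0, "d2"), (0.25, "d3")],
}
print(exhaustive_top_k("Hiking photos", POSTINGS, 2))  # [('d2', 1.5), ('d1', 0.75)]
```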
Threshold Algorithms
Input: posting lists sorted on weight (decreasing)
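NRA (No Random Access) Algorithm
(Figure: posting lists t1, t2, t3 are read in parallel, one depth at a time, down to depth 3, and a table of partial scores for documents d1 to d5 is updated after each depth; the threshold after each depth is the sum of the weights just read: s1+s2+s3, then s4+s5+s6, then s7+s8+s9.)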
The algorithm halts when neither an object below the top-k (at the best score it could still reach) nor an unseen object (bounded by the threshold) can exceed the k-th best score.
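A minimal in-memory Python sketch of NRA follows, assuming sum as the aggregate function and posting lists given as lists of (weight, object_id) pairs sorted by decreasing weight. Lower bounds count only the weights read so far; upper bounds fill each unread list with the weight at the current depth. The scores returned are lower bounds: NRA guarantees the top-k set, not necessarily exact scores. The example lists are hypothetical.

```python
import heapq


def nra(lists, k):
    """NRA (No Random Access) top-k retrieval with sum as the aggregate.

    lists: one posting list per query term, each a list of (weight, object_id)
    pairs sorted by decreasing weight. Returns up to k (object_id, score) pairs,
    where each score is the lower bound known when the algorithm stopped.
    """
    if k <= 0:
        return []
    m = len(lists)
    seen = {}          # object_id -> {list index: weight read so far}
    last = [0] * m     # weight at the current depth of each list
    lower = {}
    for depth in range(max((len(lst) for lst in lists), default=0)):
        # One round of sorted accesses: read the next entry of every list.
        for i, lst in enumerate(lists):
            if depth < len(lst):
                weight, oid = lst[depth]
                seen.setdefault(oid, {})[i] = weight
                last[i] = weight
            else:
                last[i] = 0  # exhausted list: unread entries contribute nothing
        threshold = sum(last)  # best possible score of an object not yet seen
        # Lower bound: weights read so far. Upper bound: fill gaps with `last`.
        lower = {o: sum(ws.values()) for o, ws in seen.items()}
        upper = {o: lower[o] + sum(last[i] for i in range(m) if i not in ws)
                 for o, ws in seen.items()}
        top = heapq.nlargest(k, lower, key=lower.get)
        if len(top) == k:
            kth = lower[top[-1]]
            top_set = set(top)
            if threshold <= kth and all(
                upper[o] <= kth for o in seen if o not in top_set
            ):
                break  # nothing outside the current top-k can overtake it
    top = heapq.nlargest(k, lower, key=lower.get)
    return [(o, lower[o]) for o in top]


LISTS = [
    [(9, "d1"), (8, "d4"), (1, "d3")],   # t1
    [(7, "d3"), (6, "d5"), (2, "d1")],   # t2
    [(5, "d2"), (4, "d4"), (3, "d5")],   # t3
]
print(nra(LISTS, 2))  # [('d4', 12), ('d1', 11)]
```

Both bounds are recomputed from scratch every round for readability; a real implementation would update them incrementally, which is exactly the state-keeping cost weighed against TA below.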
Threshold Algorithms
Input: posting lists sorted on weight (decreasing)
TA (Threshold Algorithm with random accesses)
(Figure: posting lists t1, t2, t3 are read in parallel down to depth 3, with random accesses filling in the score of each newly seen document, d1 to d5; the threshold after each depth is s1+s2+s3, then s4+s5+s6, then s7+s8+s9.)
The algorithm halts when the score of the last (k-th) object in the top-k is at least the threshold.
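A matching Python sketch of TA, again assuming sum as the aggregate and (weight, object_id) posting lists sorted by decreasing weight. Random accesses are simulated with one dictionary per list, a hypothetical stand-in for lookups in the on-disk posting lists. Each object's full score is computed the first time it is seen, and the scan stops once the k-th best score reaches the threshold. The example lists are hypothetical.

```python
import heapq


def threshold_algorithm(lists, k):
    """TA top-k retrieval with sum as the aggregate.

    lists: one posting list per query term, each a list of (weight, object_id)
    pairs sorted by decreasing weight. Returns up to k (object_id, score) pairs.
    """
    if k <= 0:
        return []
    # One dictionary per list simulates random access by object id (assumption).
    lookups = [{oid: w for w, oid in lst} for lst in lists]
    scores = {}
    for depth in range(max((len(lst) for lst in lists), default=0)):
        last = []
        for lst in lists:
            if depth < len(lst):
                weight, oid = lst[depth]     # sorted access
                last.append(weight)
                if oid not in scores:
                    # Random accesses: fetch this object's weight in every list.
                    scores[oid] = sum(lookup.get(oid, 0) for lookup in lookups)
            else:
                last.append(0)               # exhausted list
        threshold = sum(last)  # best possible score of an object not yet seen
        top = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top  # no unseen object can beat the current k-th score
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


LISTS = [
    [(9, "d1"), (8, "d4"), (1, "d3")],   # t1
    [(7, "d3"), (6, "d5"), (2, "d1")],   # t2
    [(5, "d2"), (4, "d4"), (3, "d5")],   # t3
]
print(threshold_algorithm(LISTS, 2))  # [('d4', 12), ('d1', 11)]
```

TA keeps little state (only exact scores of seen objects) but pays one random access per list for every new object, which is why it is expected to incur many more disk accesses than NRA.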
Qualitative Comparison
(Table: NRA vs. TA, compared on disk accesses, state keeping and computation, and system calls)
We expect TA to perform many more slow disk accesses. Can NRA's large state-keeping and computation needs outweigh TA's disk accesses?
We implement both, with the index on a hard disk and on a RAM disk, to find out...
Implementation with FUSE
Testing
- 4 real-world test sets
- files containing tags from online objects
- the index normally resides in secondary memory
- a RAM disk is used to evaluate the effect of disk accesses
Results demanded vs time: disk-based index
(Plot: NRA, TA)
Results demanded vs time: RAM-based index
(Plot: NRA, TA)
Query terms vs time: disk-based index
(Plot: NRA, TA)
Query terms vs time: RAM-based index
(Plot: NRA, TA)
Beagle vs NRA
(Plots: query terms vs time; results vs time)
Conclusions
SNFS:
- Indexing, storage, and ranked retrieval of entities in a social network (SN)
- Study of the efficiency of algorithms and implementations, using real-world data and various implementations
- Competitive performance (e.g., against Beagle)
- Many avenues for further expansion
Future Work
- Expansion for distributed systems and clouds
  - Distributed file systems (HDFS)
  - Distributed data structures
- Tagging, indexing, and searching for entity collections: straightforward, as our ‘object’ implementation/abstraction captures this
- Establishing entities consisting of relationships between entities, using advanced tagging, and searching for these…