Long tails and Archive systems
Elliot Jaffe
FDIS 2005
Archive Metrics
• What
  – Distribution of file sizes
  – Distribution of occupied storage
  – How are files accessed
• Why
  – System architecture
  – Scaling for access
File size studies
UFS93 (1993)
• 12 million files
• UNIX only
• Avg. file size is 2k
• 90% of storage in 11% of files
HUJI (2005)
• 4 million files
• UNIX + Windows
• Avg. file size is 8k
• 90% of storage in 5.5% of files
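The "90% of storage in N% of files" figures above can be computed from a plain list of file sizes. The following is a minimal sketch of one way to do that, not the method used in either study; the function name and the sample sizes are made up for illustration.

```python
def fraction_of_files_holding(sizes, share=0.9):
    """Return the smallest fraction of files (largest first) that
    together hold at least `share` of the total bytes."""
    total = sum(sizes)
    if not sizes or total == 0:
        return 0.0
    target = share * total
    running = 0
    # Walk the files from largest to smallest, accumulating bytes.
    for count, size in enumerate(sorted(sizes, reverse=True), start=1):
        running += size
        if running >= target:
            return count / len(sizes)
    return 1.0


# Made-up sizes in bytes, for illustration only (not data from the talk).
example_sizes = [100, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(fraction_of_files_holding(example_sizes))
```

Run against a real file system listing, the same function would give a number directly comparable to the 11% and 5.5% figures.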
What’s Changed
Then
JAWS, NOW
Online was expensive
Offline tape storage
Now
Central File Servers
Digital Libraries
Online is cheap
No offline storage
XML
Multimedia
Empirical Data
Questions
• What is the future of these distributions?
• Do the changes extend power-law tails, so that the 10/90 and 20/80 rules no longer hold and are the wrong way to think about these distributions?
• Or are the changes driven by unpredictable external factors?
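One way to make the power-law question concrete is to assume, purely for illustration, that file sizes follow a Pareto distribution with shape parameter alpha > 1. Under that assumption the largest fraction p of files holds a share p^(1 - 1/alpha) of the total storage, so each "90% in N%" observation implies a value of alpha. The sketch below applies that textbook relation to the two figures quoted earlier; the Pareto assumption and the function names are not from the talk.

```python
import math


def top_share(top_fraction, alpha):
    """Share of total bytes held by the largest `top_fraction` of files,
    assuming a Pareto distribution with shape alpha > 1."""
    return top_fraction ** (1.0 - 1.0 / alpha)


def implied_shape(top_fraction, share):
    """Invert top_share: the Pareto shape for which the largest
    `top_fraction` of files holds `share` of the bytes.
    Requires 0 < top_fraction < share < 1."""
    exponent = math.log(share) / math.log(top_fraction)
    return 1.0 / (1.0 - exponent)


# Figures quoted in the file size studies: 90% of storage in 11% / 5.5% of files.
for label, fraction in [("UFS93", 0.11), ("HUJI", 0.055)]:
    print(label, round(implied_shape(fraction, 0.9), 3))
```

Under this assumption a smaller alpha means a heavier tail, so the move from 11% to 5.5% would correspond to the tail getting heavier rather than to a fixed 10/90 rule.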
The Long Tail
• Chris Anderson (2004)
  – http://www.wired.com/wired/archive/12.10/tail.html
• The long tail of a distribution has tremendous mass and creates new market opportunities
• Amazon, Netflix, Wikipedia
Today’s landscape
[Figure: NOW, File Servers, Sarbanes-Oxley, and Digital Libraries positioned by Storage Capacity and Access Frequency]
Next Steps
• Collecting data from large storage systems
  – File Sizes, Created, Last Modified, Last Access, Frequency of Reads
• Goal: New architectures for Digital libraries
  – Focus on Operations
  – Store large and small files differently
  – Store very-low access files in slow access
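As a rough sketch of the two steps above, the code below gathers per-file metadata with `os.walk` and `os.stat`, then assigns each file to a storage class by size and idle time. The thresholds, tier names, and function names are hypothetical and not taken from the talk. Note that `os.stat` does not record how often a file is read, so frequency of reads would need another source such as server access logs.

```python
import os
import time


def collect_metadata(root):
    """Walk `root` and return one record per regular file found."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            records.append({
                "path": path,
                "size": st.st_size,
                "modified": st.st_mtime,
                "accessed": st.st_atime,
                # On Unix st_ctime is the last metadata change, not the
                # creation time; on Windows it has been the creation time.
                "changed_or_created": st.st_ctime,
            })
    return records


# Hypothetical thresholds, chosen only to make the sketch concrete.
LARGE_FILE_BYTES = 1024 * 1024
COLD_AFTER_DAYS = 365


def choose_tier(record, now=None):
    """Pick a storage class for one record: 'slow', 'large', or 'small'."""
    now = time.time() if now is None else now
    idle_days = (now - record["accessed"]) / 86400
    if idle_days > COLD_AFTER_DAYS:
        return "slow"   # very rarely accessed: slow-access storage
    if record["size"] >= LARGE_FILE_BYTES:
        return "large"  # large files stored separately from small ones
    return "small"


if __name__ == "__main__":
    for rec in collect_metadata("."):
        print(choose_tier(rec), rec["size"], rec["path"])
```

Access times are only as reliable as the file system's settings; volumes mounted with options such as `noatime` will not update them.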