DH 2012 Hamburg 19 July 2012 1
Social Curation of Large Multimedia Collections on a Microsoft Azure Cloud
Dazhi Chong, Samuel Coppage, Xiangyi Gu, Harris Wu, Kurt Maly, Mohammad
Old Dominion University, Department of Computer Science
Outline
- Faceted Classification System and scalability issues
- Implementation and deployment on a cloud
- Evaluation and user studies
- Conclusions
Faceted Classification System and scalability issues
- Web-based application
- Allows users to collaboratively organize multimedia collections into a faceted classification
- Social application that must handle:
  - many users
  - varying levels of network traffic
- A traditional on-premises deployment cannot handle:
  - an increasing number of users
  - numerous evolving classification schemas
  - large document collections
- New features require even more resources:
  - personal classification schemas
  - a history feature (evolution of the classification over time)
- Decision: move to a cloud platform
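One way to picture the history feature is as a sequence of timestamped schema snapshots with an "as of" lookup. The sketch below assumes that model only; `SchemaHistory`, `record` and `as_of` are hypothetical names, and the slides do not say how the system actually stores schema evolution.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field

# Hypothetical schema model: facet name -> ordered list of category names.
Schema = dict[str, list[str]]


@dataclass
class SchemaHistory:
    """Hypothetical store of timestamped snapshots of one classification schema."""

    times: list[int] = field(default_factory=list)
    snapshots: list[Schema] = field(default_factory=list)

    def record(self, t: int, schema: Schema) -> None:
        # Snapshots must arrive in strictly increasing time order.
        if self.times and t <= self.times[-1]:
            raise ValueError("timestamps must be strictly increasing")
        self.times.append(t)
        # Copy the category lists so later edits cannot rewrite history.
        self.snapshots.append({facet: list(cats) for facet, cats in schema.items()})

    def as_of(self, t: int) -> Schema | None:
        # Binary search for the latest snapshot recorded at or before time t.
        i = bisect_right(self.times, t)
        return self.snapshots[i - 1] if i else None
```

Storing full snapshots keeps lookups simple at the cost of space per change; storing only the differences between versions would be the obvious alternative for large schemas.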
The click-and-drag classification screen
Global and personal (or local) schemas
Microsoft Windows Azure vs. Amazon Elastic Compute Cloud
- Microsoft Windows Azure: a Platform as a Service (PaaS) cloud
  - Hides the management and operational side from users
  - Lets developers focus on development and on solving business problems
- Amazon Elastic Compute Cloud: an Infrastructure as a Service (IaaS) cloud
  - Allows users to deploy new technologies and adopt new capabilities
- Both offer reliability and scalability
- Windows Azure is more suitable for applications with a variable load or a short or unpredictable lifetime
- The Azure platform was chosen because it offers the most managed environment
- The choice of platform should be the best fit for the company, its developers and its users
Implementation and deployment on the Azure cloud platform
- First step: converting Joomla 1.6.3 to work with Azure SQL
- Second step: converting the Faceted Classification System packages from MySQL to Azure SQL
- Third step: full configuration of the system
- Last step: configuration of the whole project and deployment
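Moving from MySQL to Azure SQL, a SQL Server dialect, means rewriting MySQL-specific syntax among other things. The slides do not detail how the conversion was done. The sketch below only illustrates two common differences: backtick-quoted identifiers versus bracket-quoted ones, and `LIMIT n` versus `SELECT TOP n`. `mysql_to_tsql` is a hypothetical helper and is deliberately naive; it ignores string literals, `LIMIT offset, count` and `SELECT DISTINCT`.

```python
import re

# `name` -> captures the identifier between backticks.
_BACKTICK_IDENT = re.compile(r"`([^`]*)`")
# A single SELECT that ends in "LIMIT n", optionally followed by a semicolon.
_TRAILING_LIMIT = re.compile(
    r"^\s*SELECT\s+(.*?)\s+LIMIT\s+(\d+)\s*;?\s*$", re.IGNORECASE | re.DOTALL
)


def mysql_to_tsql(query: str) -> str:
    """Naively rewrite two MySQL-isms into SQL Server / Azure SQL syntax."""
    # `name` -> [name]: SQL Server quotes identifiers with square brackets.
    q = _BACKTICK_IDENT.sub(r"[\1]", query)
    # SELECT ... LIMIT n -> SELECT TOP n ...: SQL Server has no LIMIT clause.
    m = _TRAILING_LIMIT.match(q)
    if m:
        q = f"SELECT TOP {m.group(2)} {m.group(1)}"
    return q
```

A real port would need a proper SQL parser or a database abstraction layer rather than regular expressions; the point here is only which kinds of statements change.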
Design of the cloud-based web application
- Final design of the current deployment:
  - The web role can run up to 20 instances by default (more if needed)
  - Azure manages load balancing (round-robin algorithm; performance and failover modes in beta) and seamlessly redirects users
  - All data is now stored on Azure SQL
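Round-robin balancing hands each new request to the next instance in a fixed rotation. A minimal sketch, assuming a list of instance names; `RoundRobinBalancer` and the instance names are hypothetical, and this is not Azure's implementation. The `unhealthy` set is a simplified stand-in for failover: instances marked unhealthy are skipped.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical round-robin dispatcher over web-role instances."""

    def __init__(self, instances: list[str]) -> None:
        if not instances:
            raise ValueError("at least one instance is required")
        self._count = len(instances)
        self._rotation = cycle(list(instances))
        # Simplified failover: instances listed here are skipped.
        self.unhealthy: set[str] = set()

    def route(self) -> str:
        """Return the instance that should serve the next request."""
        # Go around the rotation at most once, skipping unhealthy instances.
        for _ in range(self._count):
            instance = next(self._rotation)
            if instance not in self.unhealthy:
                return instance
        raise RuntimeError("no healthy instance available")
```

Plain round-robin ignores how busy each instance is; that is the trade-off against the performance-based routing the slide mentions.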
Advantages and disadvantages of deployment on the cloud platform
- Advantages:
  - High availability, reliability and scalability
- Disadvantages:
  - Azure SQL is a new product
  - It lacks features of the full MS SQL Server database: there is no profiler, and import and export are rudimentary
- Biggest drawback: the performance of the Microsoft SQL Server Driver for PHP
  - Measured query statements showed no unusual delays
  - Fetching results with sqlsrv_fetch_array() and sqlsrv_fetch_object() delayed the rendering of web pages by up to 20 seconds
- Deploying a web application on the cloud should take all benefits and drawbacks into account
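The measurement described above separates statement execution from result fetching. The sketch below shows that split in Python, using the standard-library `sqlite3` module purely as a stand-in, since the original measurements were made with PHP's `sqlsrv` functions against Azure SQL. `time_execute_and_fetch` is a hypothetical helper. How work divides between the two phases depends on the driver (sqlite3, for example, may defer much of the work to the fetch), which is exactly why timing them separately is useful.

```python
import sqlite3
import time


def time_execute_and_fetch(conn: sqlite3.Connection, sql: str, params: tuple = ()):
    """Return (execute_seconds, fetch_seconds, rows) for one query."""
    cur = conn.cursor()
    t0 = time.perf_counter()
    cur.execute(sql, params)  # send the statement and start execution
    t1 = time.perf_counter()
    rows = cur.fetchall()  # pull every result row through the driver
    t2 = time.perf_counter()
    cur.close()
    return t1 - t0, t2 - t1, rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO item (title) VALUES (?)", [(f"item {i}",) for i in range(10_000)]
    )
    execute_s, fetch_s, rows = time_execute_and_fetch(conn, "SELECT id, title FROM item")
    print(f"execute: {execute_s * 1000:.2f} ms, fetch: {fetch_s * 1000:.2f} ms, rows: {len(rows)}")
```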
Evaluation
- User studies with classes on information technology (Spring and Fall 2011)
- Students had to develop personal facet schemas
- The personal schemas were merged into the global schema
Initial page with only a few facets
Page without and with user facets
Item detail screen without and with facets and tags
Merging of Personal Facets
Global schema:
- Good facet/category definitions
- Useful for most users
- Optimized
- Wide coverage
Personal schemas:
- Personal use
- May contain non-facet schemas
- Personal wording for facets/categories/tags
- Narrow coverage
Approach: evaluate all the personal schemas, find the most widely used facets/categories/items, use the similarity of concepts, and enrich or reconstruct the global schema.
Sample algorithm component
Popularity-based rules:
- New facet: (1) it does not exist in the global schema, and (2) it is used in more than half of the personal schemas.
- New category: (1) neither it nor a ‘similar’ category exists under the old global facet; (2) the personal facet containing the candidate category is similar to the old global facet; and (3) more than half of the users who have the (‘similar’) global facet have the new category under it.
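The two popularity rules can be sketched as counting over sets. This is a minimal sketch under stated assumptions, not the authors' algorithm: a schema is modeled as a dict from facet name to a set of category names, all function names are hypothetical, and the `similar` predicate defaults to case-insensitive equality as a placeholder for the WordNet or structural similarity the slides describe. New facets are matched by exact name, as rule (1) is worded.

```python
from collections import Counter
from collections.abc import Callable

# Hypothetical model: facet name -> set of category names.
Schema = dict[str, set[str]]
Similar = Callable[[str, str], bool]


def exact(a: str, b: str) -> bool:
    """Placeholder similarity: case-insensitive name equality."""
    return a.casefold() == b.casefold()


def new_facets(global_schema: Schema, personal: list[Schema]) -> set[str]:
    """Facets absent from the global schema but used in more than half of the personal schemas."""
    counts = Counter(facet for schema in personal for facet in schema)
    half = len(personal) / 2
    return {f for f, k in counts.items() if f not in global_schema and k > half}


def new_categories(
    global_schema: Schema, personal: list[Schema], similar: Similar = exact
) -> dict[str, set[str]]:
    """Candidate categories to add under each existing global facet."""
    added: dict[str, set[str]] = {}
    for g_facet, g_cats in global_schema.items():
        # Rule (2): per user, pool the categories of every personal facet similar to g_facet.
        pools = []
        for schema in personal:
            matching = [cats for facet, cats in schema.items() if similar(facet, g_facet)]
            if matching:
                pools.append(set().union(*matching))
        # Each candidate is counted at most once per user.
        counts = Counter(c for pool in pools for c in pool)
        for c, k in counts.items():
            # Rule (3): more than half of the users holding the facet have c under it.
            # Rule (1): neither c nor a similar category is already under g_facet.
            if k > len(pools) / 2 and not any(similar(c, g) for g in g_cats):
                added.setdefault(g_facet, set()).add(c)
    return added
```

Passing a richer `similar` predicate would let rule (2) match personal facets with different names to a global facet; with the exact-match placeholder only identically named facets are pooled.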
“Similar”: two entities are similar when they are either WordNet-similar or structurally similar.
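The slides do not define structural similarity precisely. One plausible reading, assumed here, is that two facets are structurally similar when their category sets overlap strongly. The sketch measures that with Jaccard overlap after mapping known aliases to a canonical name; the alias table, the 0.5 threshold and the function names are hypothetical, and WordNet similarity is not shown.

```python
# Hypothetical alias table, mirroring pairs scored as identical on the slides
# (S(New York, NY) = 1, S(Virginia, VA) = 1).
ALIASES = {"ny": "new york", "va": "virginia"}


def canon(name: str) -> str:
    """Case-fold a category name and map known aliases to one canonical form."""
    key = name.strip().casefold()
    return ALIASES.get(key, key)


def structure_similarity(cats_a: set[str], cats_b: set[str]) -> float:
    """Jaccard overlap of two facets' canonicalized category sets, in [0, 1]."""
    a = {canon(c) for c in cats_a}
    b = {canon(c) for c in cats_b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def structurally_similar(cats_a: set[str], cats_b: set[str], threshold: float = 0.5) -> bool:
    """True when the overlap reaches the (assumed) threshold."""
    return structure_similarity(cats_a, cats_b) >= threshold
```

With these definitions, a personal Space facet {VA, New York, Alabama} scores 2/3 against a global Location facet {Alabama, Virginia}, so under the assumed threshold it counts as structurally similar, while {NY, Virginia} scores only 1/3.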
Example 1:
Global schema (old):
Event
- Group action
- Competition
- Wreck
Location
- Alabama
- Virginia
Source
- Newspaper
- Internet
Personal schema:
Space
- VA
- New York
- Alabama
Quality
- Good
- Bad
Time
- 1998
- 2006
- 2010
Event
- Activity
- Crash
- Happening
Position
- NY
- Virginia
Tom
- OK
- Not OK
Jason
- Favorite
- Dislike
Year
- before 2000
- after 2000
Example 2:
New global schema:
Event
- Group action
- Competition
- Wreck
Location
- Alabama
- Virginia
- New York
Source
- Newspaper
- Internet
Year
- Before 2000
- After 2000
Similarity: S(year, time) = 0.5528, S(crash, wreck) = 1, S(New York, NY) = 1, S(Virginia, VA) = 1
Conclusions
- A cloud can solve the scalability issues of:
  - compute-intensive features such as schema merging and history (schema evolution)
  - many simultaneous users
- Porting a complex application to the cloud is a daunting task, not for the uninitiated