Using Azure for Data Crunching - Lup Yuen Lee (NCS)

Lee Lup Yuen, Principal Consultant, Chief Architect Office, 13 Apr 2010. Case Study: Using Azure for data crunching

Upload: spiffy

Posted on 13-Dec-2014

DESCRIPTION

Presentation by Lup Yuen Lee (NCS) at the "MSDN Presents: Windows Azure Platform" event (Apr 13, 2010).

TRANSCRIPT

Page 1: Using Azure for Data Crunching - Lup Yuen Lee (NCS)

Lee Lup Yuen, Principal Consultant, Chief Architect Office, 13 Apr 2010

Case Study: Using Azure for data crunching

Page 2

The problem

• I have a huge database of (say) book titles

• I want to pre-generate a list of search suggestions for my search UI

• The job needs plenty of CPU and RAM

• I want it fast and cheap

• Why use Azure?
  – On-demand computing + storage
  – Ready-to-use .NET and SQL infrastructure

04/10/23
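The slides don't show the search UI itself. As a minimal sketch, assuming the pre-generated suggestions are stored as a plain sorted list of strings, a prefix lookup needs only a binary search. The function name `suggest` and the sample terms are hypothetical, and ranking suggestions by popularity is left out.

```python
from bisect import bisect_left


def suggest(sorted_terms: list[str], prefix: str, limit: int = 5) -> list[str]:
    """Return up to `limit` terms from a pre-sorted list that start with `prefix`.

    Hypothetical helper: assumes the suggestions were pre-generated and sorted once.
    """
    results: list[str] = []
    # Binary search for the first term >= prefix; all matches are contiguous from here.
    i = bisect_left(sorted_terms, prefix)
    while i < len(sorted_terms) and len(results) < limit:
        term = sorted_terms[i]
        if not term.startswith(prefix):
            break  # sorted order: no later term can match either
        results.append(term)
        i += 1
    return results


terms = sorted(["azure", "azure sql", "azure storage", "book", "bookshelf", "cloud"])
print(suggest(terms, "azure"))         # ['azure', 'azure sql', 'azure storage']
print(suggest(terms, "boo", limit=1))  # ['book']
```

Keeping the list sorted is what makes the lookup cheap: each keystroke costs one binary search plus at most `limit` comparisons.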

Page 3

The current database

• Database consists of titles, keywords and other metadata

• SQL Server 2008

• 2.9 GB of data

• 4.3 million records

Page 4

Migrating the database to SQL Azure

• What’s different about SQL Azure?
  – SQL Server 2008 R2 (Management Studio) can be used for remote management
  – Some GUI management functions are not supported; use T-SQL commands instead
  – The Azure firewall must be opened to allow remote SQL access
  – All tables must have a clustered index
  – CLR and table partitioning are not supported
  – More details: http://msdn.microsoft.com/en-us/library/ff394102.aspx

• Copy the data using the SSIS Import/Export Data Wizard
  – Choose the “.NET Framework Data Provider for SQL Server”
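Because SQL Azure requires every table to have a clustered index, it can help to find and fix such tables before running the wizard. In SQL Server, a table without one appears in `sys.indexes` with `type = 0` (a heap). The sketch below is a hypothetical Python helper that takes that metadata as a plain dict (table name mapped to whether it already has a clustered index and which column to cluster on) and emits `CREATE CLUSTERED INDEX` statements. The table names, column names and the `dbo` schema are illustrative assumptions; choosing the right key column is left to you.

```python
def clustered_index_ddl(tables: dict[str, tuple[bool, str]]) -> list[str]:
    """Emit T-SQL for every table that still lacks a clustered index.

    `tables` maps a table name to (has_clustered_index, key_column).
    Hypothetical helper: the metadata would come from the source database.
    """
    statements = []
    for table, (has_clustered, key_column) in sorted(tables.items()):
        if has_clustered:
            continue  # already acceptable to SQL Azure
        # Assumes the table lives in the dbo schema.
        statements.append(
            f"CREATE CLUSTERED INDEX [IX_{table}_{key_column}] "
            f"ON [dbo].[{table}] ([{key_column}]);"
        )
    return statements


tables = {"Titles": (True, "TitleId"), "Keywords": (False, "KeywordId")}
for sql in clustered_index_ddl(tables):
    print(sql)
```

Running the generated statements against the local SQL Server 2008 database first means the SSIS copy lands in tables SQL Azure will accept.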

Page 5

SQL Azure

2.9 GB database, 4.3 million records uploaded to Azure in about 1 hour
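As a rough sanity check on that figure, the upload rate works out as follows. The sketch assumes exactly 1 hour and 1 GB = 1024 MB, both approximations of the slide's "about 1 hour".

```python
# Figures from the slide: about 2.9 GB and 4.3 million records in about 1 hour.
size_gb = 2.9
records = 4_300_000
elapsed_s = 60 * 60  # assumed: exactly one hour

mb_per_s = size_gb * 1024 / elapsed_s  # treating 1 GB as 1024 MB
records_per_s = records / elapsed_s

print(f"{mb_per_s:.2f} MB/s, {records_per_s:,.0f} records/s")  # 0.82 MB/s, 1,194 records/s
```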

Page 6

SQL Azure Firewall

Add your firewall rule

Page 7

Using SQL Server Mgmt Studio with Azure

Screenshot labels: Azure SQL Server, Local SQL Server

Page 8

The current app

• Based on .NET 3.5; 700 lines of code

• Scans titles and keywords for substrings and counts the number of occurrences of each substring

• Brute-force approach
  – 100% CPU utilisation
  – Can use more than 10 GB of runtime memory
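The slides don't define "substring" precisely. As a minimal sketch, assuming every character substring between `min_len` and `max_len` characters long counts, a brute-force counter looks like this. Each title of length L contributes up to L * (max_len - min_len + 1) entries, which hints at why millions of titles can push memory past 10 GB. `count_substrings` and its parameters are hypothetical, not the original app's code.

```python
from collections import Counter
from typing import Iterable


def count_substrings(texts: Iterable[str], min_len: int = 3, max_len: int = 10) -> Counter:
    """Brute force: count every substring of length min_len..max_len in every text."""
    counts: Counter = Counter()
    for text in texts:
        t = text.lower()
        for start in range(len(t)):
            # The longest substring starting here is capped by max_len and by the end of the text.
            for end in range(start + min_len, min(start + max_len, len(t)) + 1):
                counts[t[start:end]] += 1
    return counts


titles = ["Cloud Computing", "Cloud Storage"]
counts = count_substrings(titles, min_len=5, max_len=5)
print(counts["cloud"])  # 2
```

Every title is scanned at every position, so CPU time and the size of the `Counter` both grow with the total text length times the range of substring lengths.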

Page 9

Migrating the app to Windows Azure

• What’s different about .NET on Azure?
  – SQL works OK, but you need to use a SQL Azure connection string like:

    Server=tcp:xxx.database.windows.net;Database=xxx;User ID=xxx@xxx;Password=xxx;Trusted_Connection=False;Encrypt=True;

  – Filesystem support: requires virtual drive configuration
  – Beware of unmanaged code and access to local resources

• Just follow the steps at http://msdn.microsoft.com/en-us/library/dd179367.aspx
  – Download and install the Azure SDK
  – Create an Azure solution in Visual Studio
  – Copy your code into the Worker Role, Web Role or WCF Role
  – Configure the virtual machine size as ExtraLarge if necessary
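The connection string in the format above can be assembled from its parts. This sketch is in Python purely for illustration; in the .NET app itself, `SqlConnectionStringBuilder` from `System.Data.SqlClient` would be the usual choice. The function name and sample values are hypothetical, and the `user@server` login form follows the `User ID=xxx@xxx` pattern on the slide.

```python
def sql_azure_connection_string(server: str, database: str, user: str, password: str) -> str:
    """Build a connection string in the format shown on the slide.

    `server` is the short server name, i.e. the part before .database.windows.net.
    """
    if any(";" in value for value in (server, database, user, password)):
        # This naive join does no quoting, so reject values that would break it.
        raise ValueError("values must not contain ';'")
    parts = {
        "Server": f"tcp:{server}.database.windows.net",
        "Database": database,
        "User ID": f"{user}@{server}",  # SQL Azure logins take the form user@server
        "Password": password,
        "Trusted_Connection": "False",
        "Encrypt": "True",
    }
    return "".join(f"{key}={value};" for key, value in parts.items())


print(sql_azure_connection_string("myserver", "Books", "admin", "secret"))
```

In practice the password would come from the service configuration rather than being hard-coded.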

Page 10

Visual Studio with Azure SDK

ExtraLarge VM: 8 CPU cores, 15 GB memory, 2,000 GB disk space

Screenshot labels: Azure Project, WCF App, Web App, Worker App
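A brute-force counter running in a single thread would leave most of those 8 cores idle. The slides don't say how the app was parallelised, so the following is only one possible shape, sketched under that assumption: split the titles into chunks, count each chunk independently (map), then add the partial counts together (reduce). It counts whole words rather than substrings to stay short, and runs the map step sequentially; swapping the built-in `map` for `concurrent.futures.ProcessPoolExecutor().map` is one way to spread the chunks across cores. All names here are hypothetical.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator


def chunked(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def count_words(chunk: list[str]) -> Counter:
    """Map step: count lower-cased words in one chunk of titles."""
    return Counter(word for title in chunk for word in title.lower().split())


def merged_counts(titles: Iterable[str], chunk_size: int) -> Counter:
    """Reduce step: add the per-chunk counts into one total."""
    total: Counter = Counter()
    for partial in map(count_words, chunked(titles, chunk_size)):
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


titles = ["Azure Cloud", "Cloud Storage", "azure"]
print(merged_counts(titles, chunk_size=2))
```

Because each chunk is counted independently, the result is the same whatever the chunk size; only the memory held per worker changes.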

Page 11

Deploying and running the Azure app

• Test the app locally using the Azure SDK

• Use Visual Studio to build and publish the app

• Go to the Windows Azure portal and create a hosted service

• Upload the *.cspkg and *.cscfg files

• Click “Run”

Page 12

Windows Azure Hosted Service

Page 13

Deploying a hosted service

Page 14

Azure Performance

• Comparing Azure with my low-cost server
  – Core i7 3 GHz, 12 GB RAM, Windows Server 2008 R2 64-bit, SQL Server 2008 R2 Nov CTP 64-bit
  – S$3,000 from a Sim Lim computer shop

Processing time:

Records      My server           Azure
500K         0.5 hours           0.6 hours
1M           1.5 hours           2 hours
2M           SQL timeout         6.5 hours
4M           (to be provided)    (to be provided)

Slide callout: 100% RAM utilisation
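Turning the table into throughput makes the trend easier to see. On these figures, records processed per hour fall as the input grows (on Azure, from about 833,000 per hour at 500K records to about 308,000 per hour at 2M), which is consistent with a brute-force algorithm whose cost grows faster than linearly; that reading is an inference, not something the slides state. The 4M row is omitted because no times were provided, and `records_per_hour` is a hypothetical helper.

```python
from typing import Optional

# Processing times in hours, copied from the table; None marks the SQL timeout.
RESULTS = {
    500_000: {"My server": 0.5, "Azure": 0.6},
    1_000_000: {"My server": 1.5, "Azure": 2.0},
    2_000_000: {"My server": None, "Azure": 6.5},
}


def records_per_hour(records: int, hours: Optional[float]) -> Optional[float]:
    """Throughput for one run, or None if the run did not finish."""
    return None if hours is None else records / hours


for records, machines in RESULTS.items():
    for machine, hours in machines.items():
        rate = records_per_hour(records, hours)
        shown = "timed out" if rate is None else f"{rate:,.0f} records/hour"
        print(f"{records:>9,} records, {machine}: {shown}")
```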

Page 15

Azure Pricing

Source: http://www.microsoft.com/windowsazure/faq/#pricing

• Compute
  – US$0.12 / hour for the SMALL instance
  – US$0.24 / hour for the MEDIUM instance
  – US$0.48 / hour for the LARGE instance
  – US$0.96 / hour for the EXTRA LARGE instance

• Storage
  – US$0.15 / GB stored / month
  – US$0.01 / 10K storage transactions

• SQL Azure
  – Web Edition: up to 1 GB relational database = US$9.99 per month
  – Business Edition: up to 10 GB relational database = US$99.99 per month

• Data Transfers
  – US$0.10 in / US$0.15 out per GB for North America and Europe
  – US$0.30 in / US$0.45 out per GB for Asia Pacific
  – Inbound data transfers during off-peak times through June 30, 2010 are free of charge; prices revert to the normal inbound rates after June 30, 2010
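Compute charges scale with instance size and running time. A minimal sketch, assuming only compute is billed (storage, SQL Azure and data transfer are ignored) and using the hourly rates listed above; `compute_cost` is a hypothetical helper, and `Decimal` keeps the currency arithmetic exact. As an illustration, one EXTRA LARGE instance running for 128 hours comes to US$122.88, the same figure as the total on the bill slide; the slides don't break that bill down, so this is just one combination consistent with it.

```python
from decimal import Decimal

# Hourly compute rates in US$, as quoted on the pricing slide.
COMPUTE_RATE_PER_HOUR = {
    "SMALL": Decimal("0.12"),
    "MEDIUM": Decimal("0.24"),
    "LARGE": Decimal("0.48"),
    "EXTRA LARGE": Decimal("0.96"),
}


def compute_cost(size: str, instances: int, hours: int) -> Decimal:
    """Compute-only charge: instances x hours x hourly rate for the given size."""
    return COMPUTE_RATE_PER_HOUR[size] * instances * hours


print(compute_cost("EXTRA LARGE", instances=1, hours=128))  # 122.88
```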

Page 16

My Azure Bill

Total Bill: US$122.88

Page 17

Sample Detailed Bill

Page 18

Lessons learnt

• Test locally before deploying to the cloud
  – Deployment may take a few minutes

• Migrating apps and data to Azure can be fast and easy
  – Mine took only 2 days

• Azure is cost-effective for my app
  – It costs less than buying a low-cost server
