Alan Smith Active - Handling Big Data in Windows Azure Storage


Post on 18-Jan-2018


Handling Big Data in Windows Azure Storage
Alan Smith
Active

On-Premise Replication

  On-Premise

  Time       Data
  30 Days    1.6 TB
  10 Days    4.8 TB
  2 Days     24.4 TB

  MSDN Universal - $150

Implementation Challenges

  Number of Articles                4,356,508
  Number of Indexed Words           27,765,188
  Total number of Index Entries     1,003,489,254
  Total Text Content File Size      41.4 GB

Text Search Implementation

  Windows Azure Storage
    Table Storage: Text Index
    Blob Storage: Pages
  Windows Azure Websites
    Azure Wiki Website

Text Index Table Design

  PartitionKey    Word
  RowKey          (10,000 - word count on page)_PageId
  PageId          Numeric page ID (Integer)
  PageTitle       Title of Page (String)

  Query on PartitionKey (word)
  Ordered by RowKey (word count on page)

Text Index Table Example

  PartitionKey    RowKey     PageId    PageTitle
  azure           999604_              Capetian Armorial
  azure           999635_              Morphological classification of Czech verbs
  azure           999790_              Armorial of the Communes of Seine-Maritime
  azure           999901_              Azure (color)
  azure           999913_              Windows Azure
  azure           999913_              Ministry of Defence (Spain)
  azure           999920_              Coats of arms of the Holy Roman Empire
  azure           999926_              Armorial of the Communes of Eure
  azure           999930_              Lancia Aurelia
  azure           999935_              Ordinary (heraldry)
  azure           999935_              Characters of The Order of the Stick

Uploading Page Data

  Upload Page Content to Blob Storage
  27 XML Content Files (41.4 GB - 4,356,508 Pages)
  Windows Azure Storage: Blob Storage (4,356,508 Blobs)

Creating Text Index Data

  Parse Page Text
  27 XML Content Files (41.4 GB - 4,356,508 Pages)
  Page IDs and Titles (124 MB)
  Index Entries (19,277 Files GB)

Index Data Files

  typical# ,1| ,1| ,1| ,1| ,1| ,1| ,1
  history# ,1| ,1| ,1| ,1| ,3| ,1| ,2
  renowned# ,1| ,2| ,1| ,1| ,1| ,1| ,1
  line# ,1| ,2| ,2| ,2| ,1| ,1| ,1
  varies# ,1| ,1| ,1| ,1| ,1| ,1| ,1
  moore# ,2| ,2| ,1| ,8| ,1| ,2| ,1
  journal# ,2| ,1| ,2| ,1| ,3| ,6| ,2
  elderly# ,2| ,1| ,1| ,1| ,1| ,1| ,1
  bearing# ,1| ,1| ,1| ,1| ,1| ,1| ,1

  Contains 1,000 lines
  Each line contains 100 entries for a word (1 transaction)

Insert Index Entries

  Windows Azure Storage
    Blob Storage
    Queue
    Table Storage
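
The index design and index data file format above can be sketched in a few lines. This is a minimal illustration in Python, not the speaker's implementation: the function names are hypothetical, the `pageId,count` layout of each entry is an assumption (the page IDs did not survive in the transcript), and the key ceiling is taken from the design slide and left configurable because the example rows appear to use a larger value.

```python
from typing import Dict, Iterator, List, Tuple

KEY_CEILING = 10_000  # value from the design slide; treat it as configurable
BATCH_SIZE = 100      # one index-file line = 100 entries = 1 transaction


def make_row_key(word_count: int, page_id: int, ceiling: int = KEY_CEILING) -> str:
    """Build a RowKey that sorts pages with higher word counts first."""
    inverted = max(0, ceiling - word_count)
    width = len(str(ceiling))  # zero-pad so string order matches numeric order
    return f"{inverted:0{width}d}_{page_id}"


def parse_index_line(line: str) -> Tuple[str, List[Tuple[int, int]]]:
    """Parse 'word# pageId,count| pageId,count| ...' (assumed entry layout)."""
    word, _, rest = line.partition("#")
    entries = []
    for chunk in rest.split("|"):
        chunk = chunk.strip()
        if not chunk:
            continue
        page_id, count = chunk.split(",")
        entries.append((int(page_id), int(count)))
    return word.strip(), entries


def to_entities(word: str, entries: List[Tuple[int, int]],
                titles: Dict[int, str]) -> List[Dict[str, object]]:
    """Turn parsed entries into table entities that share one PartitionKey."""
    return [
        {
            "PartitionKey": word,
            "RowKey": make_row_key(count, page_id),
            "PageId": page_id,
            "PageTitle": titles.get(page_id, ""),
        }
        for page_id, count in entries
    ]


def batches(items: List, size: int = BATCH_SIZE) -> Iterator[List]:
    """Split a list into chunks of at most `size` items, one per transaction."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Because Table Storage returns entities within a partition in ascending RowKey order, subtracting the count from a fixed ceiling puts the pages with the most occurrences of a word first. Grouping 100 entries per line fits the limit of 100 entities per entity group transaction, all of which must share a PartitionKey.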
  Windows Azure Services
    Worker Roles

Insert Index Entries

  Windows Azure
  On-Premise
  Windows Azure Storage: Tables, Blobs, Queues

  Windows Azure
  On-Premise
  Windows Azure Storage: Tables, Blobs, Queues
  Windows Azure Virtual Machines: VM

  ServicePointManager.DefaultConnectionLimit = 100;
  ServicePointManager.UseNagleAlgorithm = false;
  ServicePointManager.Expect100Continue = false;

Block Blob Operations

  Single HTTP request for blob
  Sequential HTTP requests for blocks
  Parallel HTTP requests for blocks

  Blob Upload
  Block Upload
  Block Commit

Tuning Block Blob Operations

  Single HTTP request for blob
  Sequential HTTP requests for blocks
  Parallel HTTP requests for blocks

  SingleBlobUploadThresholdInBytes
  ParallelOperationThreadCount
  StreamWriteSizeInBytes

  Blob Upload
  Block Upload
  Block Commit

Tuning Blob Operations

  SingleBlobUploadThresholdInBytes
    Default: 32 MB
    Range: 1-64 MB
    Maximum size of a blob in bytes that may be uploaded as a single blob.

  ParallelOperationThreadCount
    Default: 1
    Range: 1-64
    Number of blocks that may be simultaneously uploaded.

  StreamWriteSizeInBytes (Block)
    Default: 4 MB
    Range: 1-4 MB
    Block size for writing to a block blob.

  StreamWriteSizeInBytes (Page)
    Range: 512 bytes - 4 MB
    Number of bytes to buffer when writing to a page blob stream.

  StreamMinimumReadSizeInBytes
    Range: 1-4 MB
    Minimum number of bytes to buffer when reading from a blob stream.

  CloudBlobClient
  CloudBlockBlob

Parallel and Asynchronous Uploads

  Parallel Blobs (Files, Blob Container, Blob)
  Parallel Blocks (Files, Blob Container, Blob)
  Parallel Blobs & Blocks (Files, Blob Container, Blob)

Storage Monitoring Tables

  $MetricsCapacityBlob
  $MetricsTransactionsBlob
  $MetricsTransactionsTable
  $MetricsTransactionsQueue

Handling Outages

  29th February 2012
    Major outage due to certificate error
    MVP Summit: February 28th - March 2nd

  22nd February 2013
    Storage outage due to certificate error
    MVP Summit 2013: February 18th - 22nd

  MVP Summit November 2013
    November 18th - 21st

  Correlation does not mean causation!
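
The block blob tuning properties listed earlier decide whether an upload is a single request or a series of block uploads followed by a commit. The sketch below models that decision in Python against a hypothetical in-memory store; it does not call the Windows Azure storage client, and `InMemoryBlobStore`, `upload`, and the parameter names are illustrative stand-ins for the threshold, block size, and thread count settings. The defaults are the ones shown on the slide.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

MB = 1024 * 1024


class InMemoryBlobStore:
    """Hypothetical stand-in for blob storage that counts requests."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.requests = 0
        self._staged: Dict[Tuple[str, str], bytes] = {}
        self._lock = threading.Lock()

    def put_blob(self, name: str, data: bytes) -> None:
        """Blob Upload: the whole blob in one request."""
        with self._lock:
            self.requests += 1
            self.blobs[name] = data

    def put_block(self, name: str, block_id: str, data: bytes) -> None:
        """Block Upload: stage one uncommitted block."""
        with self._lock:
            self.requests += 1
            self._staged[(name, block_id)] = data

    def commit_block_list(self, name: str, block_ids: List[str]) -> None:
        """Block Commit: assemble the staged blocks in the given order."""
        with self._lock:
            self.requests += 1
            parts = [self._staged.pop((name, bid)) for bid in block_ids]
            self.blobs[name] = b"".join(parts)


def upload(store: InMemoryBlobStore, name: str, data: bytes,
           single_upload_threshold: int = 32 * MB,
           block_size: int = 4 * MB,
           parallel_threads: int = 1) -> int:
    """Upload data and return the number of requests the upload needed."""
    if len(data) <= single_upload_threshold:
        store.put_blob(name, data)
        return 1

    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    block_ids = [f"{i:06d}" for i in range(len(blocks))]

    # With parallel_threads == 1 the blocks go up sequentially;
    # larger values upload several blocks at once.
    with ThreadPoolExecutor(max_workers=parallel_threads) as pool:
        list(pool.map(lambda pair: store.put_block(name, *pair),
                      zip(block_ids, blocks)))

    store.commit_block_list(name, block_ids)
    return len(blocks) + 1
```

Lowering the threshold or the block size increases the request count, and raising the thread count only helps once a blob is large enough to be split into blocks, which is why the three settings are tuned together.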
  Consider processing in the cloud
  Modify ServicePointManager settings
  Use parallel and asynchronous actions
  Tune CloudBlobClient and CloudBlockBlob properties
  Fiddler is your friend (especially the Timeline)
  Use the source (Windows Azure SDK on GitHub)
  Understand Storage Emulator limitations
  Understand transient faults
  Understand pricing implications
  Leverage Storage Analytics

Thanks!