The Foundations of Cloud Data Storage


Upload: jan-erik-finlander

Post on 14-Feb-2017


The growing mountain of available data is matched by an equally high desire to access it. The rise of cloud networks makes managing this overwhelming amount of information not only possible, but highly beneficial.

https://cloud.google.com/pricing

Here you’ll find an overview of some specific solutions, including market leaders such as Google Cloud Platform, Amazon Web Services and Microsoft Azure, as well as competitive services from companies such as RackSpace and HP Helion.

Requirements

From a tools perspective, you'll need a code editor and several browsers for testing. You can use whichever code editor you prefer. To explore a set of online services, such as cloud data storage, you'll need an internet connection. Most of the demonstrations and testing can be done with recent versions of standards-based browsers such as Google Chrome. Although many of the cloud data storage platforms offer trial periods, in most cases you'll need to enable billing, which requires either a credit card or a bank account. That's it for tools and real-world requirements.

Cloud Storage Fundamentals 1

https://cloud.google.com/storage/docs/apis

From a knowledge perspective, you should have a general understanding of how server code modules like APIs work. This is a full exploration of cloud data storage platforms and implementations, and you'll get the most benefit if you keep your mind open for other ways that you can apply the same lessons to your own needs. The absolute best thing you can bring to this training is your imagination.


Disclaimer

The information contained in this manual is for general information purposes only and is provided as-is. While I try to keep the information up-to-date and correct, I make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability or availability with respect to the website or the information, products, services, or related graphics contained on the website for any purpose. Any reliance you place on such information is therefore strictly at your own risk.

In no event will I be liable for any loss or damage, including without limitation indirect or consequential loss or damage, or any loss or damage whatsoever arising from loss of data or profits arising out of, or in connection with, the use of this manual.

Throughout this manual you’ll find links to websites that are not under my control. I have no control over the nature, content and availability of those sites. The inclusion of any links does not necessarily imply a recommendation or endorse the views expressed within them.

Every effort is made to keep this manual’s content up-to-date. However, I take no responsibility for, and will not be liable for, the website where this document is shared being temporarily unavailable due to technical issues beyond my control.

Jan-Erik Finlander - 2017

Cloud Solutions Architect


Contents

Requirements

1. Introduction to Cloud Data Storage

1.1. Understanding cloud data storage

Cloud Data Storage Benefits

Cloud Data Storage Risks

Cloud Data Storage Services

1.2. Calculating Costs

Cloud Data Storage Costs

Cloud Pricing Calculators

Google Cloud Platform’s Pricing Calculator

AWS’ Pricing Calculator

Microsoft Azure’s Pricing Calculator

1.3. Cloud Storage Solutions

2. Cloud Storage Options

2.1. Working with Object Storage

2.2 Managing Database Content

Cloud Relational Database Features

Cloud Relational Database Access

Cloud Non-Relational Databases

Non-Relational Database Management

Cloud Database Security

2.3. Targeting Storage Availability

2.4. Assessing API interconnectivity

3. Data Storage Issues

3.1. Understanding Data Storage Issues

Service Level Agreement (SLA)

3.2. Establishing and Maintaining Secure Storage

Cloud Data Storage Security

3.3. Handling Latency

Data Cloud Storage Latency

3.4. Managing Scalability and Replication

Why would you use replication?

4. Data Storage Vendors

4.1. Google Cloud Platform

4.2. Amazon Web Services (AWS)

Amazon Cloud Databases

4.3. The Microsoft Cloud

Azure Blob Command Tools

4.4. HP Helion Cloud

HP Cloud Object Storage Access

Sources


1. Introduction to Cloud Data Storage

1.1. Understanding cloud data storage

Like data itself, cloud data storage is a sprawling and continuously evolving topic. Cloud data storage refers to a repository for digital information on one or more servers, in one or more locations.

Let’s focus on corporate and enterprise-level solutions rather than personal file hosting, although there is some overlap. Cloud data storage is a concept whose time has come, now that cloud-based network infrastructure is widely available on the market. Amazon opened the floodgates in 2006 with the introduction of Amazon Web Services S3. Today, an ever-growing array of companies offer cloud data storage services, including Amazon, Box, Google, HP Helion, Microsoft Azure, Oracle, RackSpace, and Zetta. Cloud data services have taken off largely because they fit a variety of use cases. They're great for application data, regardless of where the user is.

Cloud Data Storage Benefits

● Application data
● Big data
● Archiving and backups
● Long-term storage
● Disaster recovery


They also suit big data (in terms of both file size and number of records), routine archives and backups, long-term storage of all types of records and, in case of emergency, disaster recovery. There are numerous benefits to going the cloud data storage route. The first is access: your data is available from practically anywhere on the planet that has an internet connection. The second is scalability: cloud storage is, for all intents and purposes, infinite and can grow with your needs. The third is security: not only can access to your data be restricted to authorised users, but because cloud storage offers both zonal and geographic redundancy, the possibility of total data loss is severely limited. And perhaps the biggest gorilla in the room is cost. Hosting your data in the cloud means significantly fewer self-maintained servers, which cuts not only the physical footprint but also the man-hours required to maintain those servers.

Cloud Data Storage Risks

● Security
● Privacy

There are also risks to consider. Perhaps paramount in the age of the cyber hacker is security: cloud storage providers must implement strong, continually updated strategies to keep your data from being compromised. Privacy goes hand in hand with security. Since we're talking about data stored in one or more off-site facilities, you must ensure that it's encrypted and accessible only by authorised users. You should also be aware of the privacy laws governing the data centre locations. Network issues should be considered too. While downtime leading to data inaccessibility is perhaps the ultimate worry, backup and restoration speeds are also affected by available bandwidth and demand.

Cloud Data Storage Services

● Online management
● API Access
● Optimisation

The various cloud data storage hosts offer a wide spectrum of services, but almost all provide online management of storage including import, export, and backup operations.


They also offer API access for automated control of data storage, and methods for optimising operations, whether that means establishing access control lists for authorised users or transferring multiple data objects in parallel for greater efficiency. So, that's a cloud's-eye view of cloud data storage. Next, we'll take a closer look at one of the key factors: cost.
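Parallel transfer of multiple objects, one of the optimisations mentioned above, can be sketched with Python's standard `concurrent.futures` module. `upload_object` is a hypothetical stand-in for whatever upload call your provider's SDK offers. Here it only simulates network time, so the sketch runs without credentials.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def upload_object(name: str, data: bytes) -> tuple:
    """Hypothetical stand-in for a provider SDK upload call.

    Sleeps briefly to simulate network time and returns (name, bytes_sent).
    """
    time.sleep(0.01)
    return name, len(data)


def upload_in_parallel(objects: dict, max_workers: int = 8) -> dict:
    """Upload every object concurrently and return the bytes sent per object."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per object; the pool runs up to max_workers transfers at once.
        futures = [pool.submit(upload_object, name, data) for name, data in objects.items()]
        return dict(future.result() for future in futures)


if __name__ == "__main__":
    files = {f"backup/part-{i:03d}.dat": b"x" * (1024 * i) for i in range(1, 21)}
    sent = upload_in_parallel(files)
    print(f"{len(sent)} objects, {sum(sent.values())} bytes")
```

Threads suit this job because uploads spend most of their time waiting on the network rather than using the CPU.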

1.2. Calculating Costs

Calculating costs for cloud data storage can be a daunting task at best. Whether you are trying to make a basic decision as to its cost-effectiveness versus in-house storage or forecasting expenses for multiyear budgets, there are a good number of factors to consider.

Let’s look at the most pertinent of those, as well as some useful tools. Although prices vary, as you would expect, there are a few guiding principles that seem to hold across the board.

Cloud Data Storage Costs

● Pay per resource used vs. flat fee
● Combination of charges
  ○ Storage
  ○ API operations
  ○ Network transfers
● Higher volume, cheaper rate

First, the clear majority of cloud data storage companies set their pricing on a pay-per-resource-used basis rather than a flat monthly or annual fee. The pay-per-use philosophy is applied to most, if not all, aspects of the service.

Second, storage pricing is often a combination of charges. You can expect your bill to include charges for storage, for API operations such as listing and downloading, and for network transfers. Although this might give you pause, the rates in each of these areas are generally very low.

Finally, the higher the volume, the cheaper the rate. While this approach doesn't apply across the board, many companies offer lower prices for greater use of both storage and network transfer.
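The three principles above (pay per use, a combination of charges, and cheaper rates at higher volume) can be combined into a rough monthly estimate. This is a minimal sketch. Every rate and tier boundary in it is a hypothetical placeholder, not any vendor's actual price, so substitute figures from your provider's price list or calculator.

```python
# Every rate below is a hypothetical placeholder, not a real vendor price.
STORAGE_TIERS = [          # (GB in this tier, price per GB-month); None = no upper bound
    (1_000, 0.026),
    (9_000, 0.024),
    (None, 0.022),
]
PRICE_PER_10K_OPERATIONS = 0.05   # API operations such as list and download
PRICE_PER_GB_TRANSFERRED = 0.12   # network transfer out of the provider


def tiered_storage_cost(stored_gb: float) -> float:
    """Bill each slice of storage at its own tier rate: higher volume, cheaper rate."""
    cost, remaining = 0.0, stored_gb
    for tier_size, rate in STORAGE_TIERS:
        if remaining <= 0:
            break
        slice_gb = remaining if tier_size is None else min(remaining, tier_size)
        cost += slice_gb * rate
        remaining -= slice_gb
    return cost


def monthly_estimate(stored_gb: float, operations: int, transferred_gb: float) -> float:
    """Pay-per-use bill combining storage, API operations and network transfer."""
    return (tiered_storage_cost(stored_gb)
            + operations / 10_000 * PRICE_PER_10K_OPERATIONS
            + transferred_gb * PRICE_PER_GB_TRANSFERRED)


print(round(monthly_estimate(stored_gb=2_500, operations=1_000_000, transferred_gb=300), 2))
```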

The first cost comparison you might want to run pits keeping your data on your own servers against moving it to the cloud. On the privately owned side, you have very real upfront capital expenditures, such as hardware purchases, installation and configuration.


Own Storage
● Upfront CapEx
● Ongoing OpEx

Cloud Storage
● Ongoing OpEx
● Basically, a rental

On top of those come the ongoing operating expenses of maintenance and replacement. With the cloud, there are no such capital expenses; it is all operating costs. And while it's true that those operating costs are perpetual, cloud data storage is, at its heart, a rental after all.
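To make the CapEx-versus-OpEx comparison concrete, here is a minimal sketch of cumulative cost over time. It assumes that owned hardware is repurchased at the start of each refresh cycle and that the cloud bill is a flat yearly rental. All the figures are hypothetical.

```python
def own_storage_cost(years: int, capex: float, opex_per_year: float, refresh_every: int) -> float:
    """Own servers: a hardware purchase at the start of each refresh cycle, plus yearly OpEx."""
    purchases = -(-years // refresh_every)  # ceiling division: refresh cycles begun so far
    return purchases * capex + years * opex_per_year


def cloud_storage_cost(years: int, opex_per_year: float) -> float:
    """Cloud storage: no capital outlay, only recurring rental-style OpEx."""
    return years * opex_per_year


# Hypothetical figures for illustration only.
for year in range(1, 7):
    own = own_storage_cost(year, capex=60_000, opex_per_year=15_000, refresh_every=4)
    cloud = cloud_storage_cost(year, opex_per_year=28_000)
    print(f"year {year}: own ${own:,.0f}  cloud ${cloud:,.0f}")
```

Whether and when the two curves cross depends entirely on your own inputs. A hybrid setup would add a share of each.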

Many IT managers have opted to go the hybrid route, combining their own existing servers with those of a cloud storage host. The cost calculations for such an arrangement take on another level of complexity, but it might be the right fit for your organisation.

Cloud Pricing Calculators

To give you a concrete idea of how pricing for cloud data storage works, let's look at some of the handy tools made available by vendors.

Google Cloud Platform’s Pricing Calculator

https://cloud.google.com/products/calculator

Not all data is equal. Data that doesn't need to be accessed as frequently or as quickly can be stored at a lower rate. Backup data, which doesn't need to be as responsive as application data, can be kept for less in Durable Reduced Availability storage.

Resource            Free limit per day   Price above free limit (per unit)   Price unit
Stored data         1 GB                 $0.18                               GB/month
Entity reads        50,000               $0.06                               per 100,000 entities
Entity writes       20,000               $0.18                               per 100,000 entities
Entity deletes      20,000               $0.02                               per 100,000 entities
Small operations    Unlimited            Free                                -

For data that you access even less frequently, such as disaster recovery data, consider Cloud Storage Nearline for an even lower rate.
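The free daily limits and per-unit prices in the table above can be turned into a daily operations charge. This sketch leaves out the stored-data row because that row is billed per month. It also assumes that usage above a free limit is billed pro rata rather than in whole 100,000-entity blocks. Check the provider's billing rules before relying on that assumption.

```python
# (free units per day, price, units covered by that price), copied from the table above
OPERATION_PRICING = {
    "reads": (50_000, 0.06, 100_000),
    "writes": (20_000, 0.18, 100_000),
    "deletes": (20_000, 0.02, 100_000),
}


def daily_operation_charge(counts: dict) -> float:
    """Charge only the part of each day's operations that exceeds the free limit.

    Assumes usage above the limit is billed pro rata, not in whole 100,000 blocks.
    """
    total = 0.0
    for operation, count in counts.items():
        free, price, per_units = OPERATION_PRICING[operation]
        billable = max(0, count - free)
        total += billable / per_units * price
    return total


print(round(daily_operation_charge({"reads": 250_000, "writes": 120_000, "deletes": 5_000}), 2))
```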

AWS’ Pricing Calculator

https://calculator.s3.amazonaws.com/index.html

Make sure that you click on the Amazon S3 link on the left. That's the Simple Storage Service.

This is Amazon's most accessible storage tier. You might also want to look at the pricing for Amazon Glacier, its lower-cost, lower-availability option for archival data.


Microsoft Azure’s Pricing Calculator

Obviously, figuring the cost of going with cloud data storage is only part of making the business case for the move. But now you should have a better understanding of the various facets you'll need to examine.

1.3. Cloud Storage Solutions

If you've considered the cloud data storage market at all, you know that it's a rapidly growing, competitive one with many players across the spectrum. In this lesson, we'll take an overview of five of the top contenders.

Amazon Web Services, also known as AWS, is a full cloud platform with services in computing, databases, analytics, applications and deployment, as well as storage.

Its primary object storage system is the Amazon Simple Storage Service, referred to by another acronym, S3. S3 is a very straightforward but extremely robust object storage service.

There is no limit to the number of objects, and individual objects can be as large as five terabytes. S3 features a high degree of replication across multiple regional data centres. Lower-cost storage is available through Amazon's Reduced Redundancy Storage option or its Amazon Glacier archival service.


https://aws.amazon.com/products/storage

If you’re working with Amazon's compute service, EC2, you can also use its Elastic Block Store, or EBS. EBS volumes behave more like traditional disks holding a file system, while remaining highly scalable. Amazon also offers several database options for structured data storage: Amazon RDS for relational SQL databases and Amazon DynamoDB for non-relational NoSQL.

The Google Cloud Platform provides an ever-growing set of services that leverage Google's global infrastructure. Cloud Storage is Google's primary object storage service and offers limitless storage, automatically replicated across many data centres around the world. Cloud Storage objects can be stored in different types of buckets with varying degrees of accessibility and price points.

Google's relational database service, Cloud SQL, is MySQL-based and allows you to spin up database instances as needed. For data suited to a non-relational platform, you can turn to the schemaless Cloud Datastore. Google recently brought online a new entry in the non-relational space, Cloud Bigtable, which is targeted at massive data sets.

In Microsoft Azure's storage realm, objects such as documents and media files are handled by Azure Blob Storage, which is accessed via a REST interface and client libraries for .NET, C++, Java, Node.js and Android, among others.


https://azure.microsoft.com/en-us/services/storage/tables

The Azure Table Storage service manages non-relational data in a NoSQL fashion, complete with automatic load balancing. A separate service, SQL Database, takes care of relational data.

https://www.rackspace.com/cloud/files

RackSpace puts its cloud data storage service under the infrastructure umbrella with three targeted offerings. Cloud Files is for data objects and boasts triple replication with a simplified pricing structure.

The RackSpace Cloud Backup service automatically employs block-level compression and 256-bit key encryption to keep your data compact and secure.


HP Helion offers HP Cloud Object Storage and HP Cloud Block Storage. Its Cloud Object Storage service supports the OpenStack standard, with both Java and Ruby APIs.

HP Cloud Block Storage volumes pair with HP Cloud compute instances, but the storage persists until it is explicitly deleted.


2. Cloud Storage Options

2.1. Working with Object Storage

Let’s see how objects are typically handled by the various services, the benefits you should expect, and the methodology to look for. What do we mean by an Object? Objects are discrete digital entities: documents, files (both compressed and uncompressed), images, video, audio and all other media.

Because of the nature of cloud storage, neither the number of objects nor their size is a problem: blobs (Binary Large Objects) are easily accommodated. Many services can handle a single put of any file up to 5 gigabytes in size. Larger files should be split into multiple parts, a process also known as segmenting, and uploaded as parts of an overall Object, typically identified with a unique ID.

Structurally, object storage is organised on two levels. The top level is the Container, also called an Asset Group or, more commonly, a Bucket; the Objects themselves live inside it. You can have as many Containers as you need, and each has a unique ID so that it can be accessed globally. Containers are project-specific and, on all the services, cannot be nested. However, you can create folders within a Container to build a hierarchy. For services with multiple data centres around the world, you can specify the region that hosts the Container.
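Segmenting comes down to cutting a large file into byte ranges that are each small enough for a single put. This minimal sketch treats the 5 GB single-put limit cited above as 5 GiB, which is an assumption. Real multipart APIs add their own rules: Amazon S3's multipart upload, for example, also sets a minimum part size. Take the exact limits from your provider's documentation.

```python
# The text cites a 5 GB single-put limit; treating it as 5 GiB here is an assumption.
SINGLE_PUT_LIMIT = 5 * 1024 ** 3


def needs_segmenting(total_size: int) -> bool:
    """True when an object is too large for a single put."""
    return total_size > SINGLE_PUT_LIMIT


def plan_segments(total_size: int, segment_size: int) -> list:
    """Return (offset, length) byte ranges that together cover the whole object."""
    if not 0 < segment_size <= SINGLE_PUT_LIMIT:
        raise ValueError("segment size must be positive and within the single-put limit")
    return [(offset, min(segment_size, total_size - offset))
            for offset in range(0, total_size, segment_size)]


twelve_gib = 12 * 1024 ** 3
parts = plan_segments(twelve_gib, 1024 ** 3)
print(needs_segmenting(twelve_gib), len(parts))
```

Each range would then be read from the file and uploaded as one part under the overall Object's ID.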


Choosing a Container's region is a good way to reduce latency for your targeted markets. Each individual Object is stored within a specific Container; Objects cannot be shared across Buckets, although they can be duplicated. Frequently, you'll find the ability to attach identifying metadata to your Objects as key-value pairs. Once an uploaded Object's integrity has been validated, it becomes available in the Container. Similarly, once you delete it, it's no longer accessible; there is no undo, so backups are essential. Many services offer server-side encryption for security, although you're also free to use client-side encryption before transferring the Object. As mentioned earlier, many Cloud platforms provide several storage classes, with lower costs for data that you don't access as frequently or that needs less redundancy. Some services, like Google Cloud Storage, apply storage classes at the Container, or Bucket, level, while others, such as Amazon S3, allow you to specify the storage class for individual Objects.

Once you've set up your Containers and begun to populate them with Objects, you should ensure that only the people you want to have access can reach them. Object Storage is typically private by default and initially accessible only by the owner or primary administrator. Permissions can, however, be broadened all the way up to publicly available. Most services have mechanisms in place for establishing authenticated users, often via ACLs, or Access Control Lists.

All services provide API libraries for Object management, and many platforms offer a full spectrum of languages to choose from, anything from Java, .NET and PHP to Python. With them, you can get a full list of your Containers, find out what is in each one, and then store, retrieve, copy and delete those Objects. More advanced operations, such as targeting specific versions of an Object, are available on some platforms. Mastering Containers and Objects is essential to productive Cloud Data Storage. Data record storage, in databases and other systems, is another major aspect of the Cloud Data Storage world.

2.2 Managing Database Content

Cloud data storage can handle structured data as well as unstructured blobs. Two major strains of databases are supported: Relational Databases and Non-Relational Databases. Relational Databases are typically SQL databases, while Non-Relational Databases use NoSQL, which is short for "not only SQL".

Some cloud data storage platforms, such as Google and HP, focus on MySQL, while others support a range of MySQL variations: RackSpace supports MySQL, Percona Server and MariaDB. Some services, such as AWS, work with other relational databases, including Oracle, SQL Server and PostgreSQL, as well as MySQL.

A few cloud data storage services have opted for the proprietary route, such as Microsoft Azure with its SQL Database offering.

Category                   AWS                            Google                                 Microsoft
RDBMS                      RDS - all major                MySQL                                  SQL Azure
NoSQL - Buckets            S3 or Glacier                  Cloud Storage                          Azure Blobs
NoSQL - Key-Value          DynamoDB                       Cloud Datastore                        Azure Tables
Streaming / ML             Custom EC2 or Apache Mahout    Prospective Search & Prediction API    StreamInsight
NoSQL - Document or Graph  MongoDB on EC2                 Freebase                               MongoDB on Azure
NoSQL - Column (HBase)     Elastic MapReduce + S3 & EC2   Cloud Dataproc                         HDInsight
Dremel / Warehousing       Redshift                       BigQuery                               SQL Data Warehouse

Cloud Relational Database Features

● Replication across data centres
  ○ Increase data durability
  ○ Decrease latency
● Scale up / down data instances
● Backups created automatically

Many cloud data storage services take SQL servers to the next level by replicating databases across data centres, increasing data durability and decreasing latency. Cloud database servers scale very efficiently, spinning database instances up or down as needed. Backups to multiple locations are often created automatically, allowing point-in-time recovery. Access to cloud-based databases is broad overall, but the specific APIs available for database management vary from service to service.

Cloud Relational Database Access

● HTTP requests supported across the board
● Specific APIs vary by service provider
● New APIs routinely introduced

All hosts support standard HTTP requests for accessing data, but you'll have to check each service to verify that an API for the language of your choice is available. Keep in mind that this is by no means a static situation.

Cloud Non-Relational Databases

● Proprietary NoSQL frameworks for each vendor
  ○ Amazon DynamoDB
  ○ Rackspace ObjectRocket
  ○ Google Cloud Datastore
● Appropriate for massive datasets
  ○ AWS Redshift
  ○ Google Cloud Bigtable

Many services add new APIs on a continuing basis. If your applications lend themselves to non-relational data with relatively straightforward queries, NoSQL is typically the most responsive database system, and it can scale to massive size. Every service that offers a NoSQL alternative provides its own framework, such as Amazon DynamoDB, RackSpace ObjectRocket, or Google Cloud Datastore.

Both key/value and document-based NoSQL systems are available. Amazon DynamoDB works with either, Microsoft DocumentDB is document-focused, and Google Cloud Datastore is key/value oriented.

NoSQL's relatively simple structure opens the door to efficient processing of big data. Several cloud data hosts also offer services aimed at massive datasets, such as the AWS Redshift data warehouse (which is SQL-based) or the recently introduced Google Cloud Bigtable (a NoSQL wide-column store).

Non-Relational Database Management

● Platform specific but robust
  ○ Create, update, and delete tables
  ○ Create, update, and delete content (items or entities)
  ○ Create, update, and delete content attributes

Management access to the NoSQL services is platform-specific but tends to be very robust. With most of them, you'll be able to programmatically create, update and delete tables, and perform similar operations on table contents (which may be called items or entities) and on their attributes.

Cloud Database Security

● Replication
  ○ Automatically initiated
  ○ Geographically separated facilities
  ○ API controlled
  ○ Selectable read consistency and write verification

With both SQL and NoSQL solutions, data security is enhanced by automatic replication, often across geographically separated data centres. Replication can also be controlled via API calls. Numerous services allow you to tune the degree of read consistency and write verification that your data requires.

Cloud database storage is just as robust and vital as its sibling, Object Storage. However, there is more diversity in the feature sets found across the various providers, so you'll need to research carefully to find the right fit for your organisation's database needs.

2.3. Targeting Storage Availability

By default, all containers and objects are initially private and only accessible by the project administrator, often referred to as the owner. Only the owner can grant permissions to others to read or interact with a container and its objects.

Grantees can include people (individuals identified by ID number or email address), groups (such as an email group), or domains, often expressed as a subnet range or an IP address. Collectively, these permissions are called Access Control Lists, or ACLs. Various services support ACLs written in a variety of formats, but the most common are XML and JSON.
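The rules above (private by default, owner-granted permissions, and grantees that can be users, groups, domains or the public) can be modelled in a few lines. This is a sketch, not any provider's API. The JSON ACL schema, the `grantee` prefixes and the assumption that a higher permission implies the lower ones are all hypothetical simplifications, loosely in the spirit of the JSON ACLs some services use.

```python
import json
from dataclasses import dataclass, field

# Hypothetical, simplified ACL document; each provider defines its own real schema.
ACL_JSON = """
{
  "owner": "admin@example.com",
  "grants": [
    {"grantee": "user:alice@example.com", "permission": "READ"},
    {"grantee": "group:analysts@example.com", "permission": "READ"},
    {"grantee": "domain:partner.example.org", "permission": "WRITE"}
  ]
}
"""

# Assumption for this sketch: a higher permission implies the lower ones.
RANK = {"READ": 1, "WRITE": 2, "FULL_CONTROL": 3}


@dataclass
class Requester:
    email: str
    groups: set = field(default_factory=set)


def is_allowed(acl: dict, who: Requester, wanted: str) -> bool:
    """Private by default: only the owner, or a matching grant, gives access."""
    if who.email == acl["owner"]:
        return True  # the owner always has full control
    domain = who.email.rsplit("@", 1)[-1]
    identities = {f"user:{who.email}", f"domain:{domain}", "allUsers"}
    identities |= {f"group:{g}" for g in who.groups}
    return any(grant["grantee"] in identities and RANK[grant["permission"]] >= RANK[wanted]
               for grant in acl.get("grants", []))


acl = json.loads(ACL_JSON)
print(is_allowed(acl, Requester("alice@example.com"), "READ"))   # granted by her user entry
print(is_allowed(acl, Requester("alice@example.com"), "WRITE"))  # no WRITE grant for alice
```

A grant to `allUsers` is how this sketch represents making an object publicly available.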

It's quite common for cloud data services to make APIs available in multiple languages, including Java, C++, PHP, Python, Node.js, Ruby and others. We'll take a closer look at APIs in the next lesson. Once you've established authenticated access for authorised users, cloud data storage gives you full, direct management capabilities, just as you would have over in-house storage.

2.4. Assessing API interconnectivity

A good API (application programming interface) is truly worth its weight in gold, considering the time and effort it saves you in coding, testing, and debugging.

The cloud data storage vendors, and for that matter the developer community, fully embrace APIs across the spectrum, which can create a bit of a problem because there are so many options to choose from.

http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingAWSSDK.html

Because most cloud data storage solutions are part of a larger platform, most of the associated APIs are contained within a series of overall SDKs, each written for a separate language.

The AWS SDKs are available from the URL above for Java, .NET, PHP, Ruby and Python; the Python interface to AWS is called Boto. The Google Cloud Platform has a similar SDK for overall functionality, although it is not broken out by language and requires Python 2.7.


https://cloud.google.com/sdk

The APIs available for Google Cloud Storage include support for .NET, Java, JavaScript, Objective-C, PHP, and Python.

https://cloud.google.com/storage/docs/json_api/
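All of these client libraries ultimately sit on top of plain HTTP requests. As an illustration, this sketch builds request URLs for the Cloud Storage JSON API (v1) without using any SDK. The bucket and object names are hypothetical, and the sketch only constructs URLs. Actually sending the requests would also need an OAuth 2.0 access token for any non-public data.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Base URL of the Cloud Storage JSON API (v1).
JSON_API = "https://storage.googleapis.com/storage/v1"


def list_objects_url(bucket: str, prefix: Optional[str] = None) -> str:
    """URL for listing a bucket's objects, optionally only those under a name prefix."""
    url = f"{JSON_API}/b/{quote(bucket, safe='')}/o"
    return f"{url}?{urlencode({'prefix': prefix})}" if prefix else url


def object_metadata_url(bucket: str, name: str) -> str:
    """URL for one object's metadata; the object name is fully percent-encoded, '/' included."""
    return f"{JSON_API}/b/{quote(bucket, safe='')}/o/{quote(name, safe='')}"


def object_download_url(bucket: str, name: str) -> str:
    """Adding alt=media asks for the object's contents instead of its metadata."""
    return object_metadata_url(bucket, name) + "?alt=media"


print(object_download_url("example-bucket", "reports/2017/q1.csv"))
```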

Leveraging the available APIs such as those found in AWS and Google Cloud Storage is a crucial strategy towards efficient storage and IT management.


3. Data Storage Issues

3.1. Understanding Data Storage Issues

The benefits of cloud data storage are undeniable: global access, no upfront capital expense, and virtually unlimited capacity, to mention just a few. But cloud data storage is not without its dark side. The number one issue is security. The flip side of being able to access your data from anywhere is that people from anywhere can potentially access your data.

First, there's the vulnerability of transferring your data to and from the cloud. Industrial strength encryption is typically the answer, but you must have a robust encryption key management system in place to maintain long term accessibility.

Next, you need to be confident in the cloud service provider's own security when your data is at rest to guard against data breaches. This protection needs to be up-to-date and evolving because the threat is certainly ongoing. To this end, you want to make sure that access logs are complete and monitored routinely.

Service Level Agreement (SLA)

● Tool for reducing downtime

As with any network, there's always the problem of downtime. Reducing network inaccessibility to an absolute minimum is a key requirement, and one that service providers work diligently to address. While maintaining the external network is beyond your control, you do have a key tool for dealing with any such problems: the Service Level Agreement, or SLA.
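SLAs are commonly expressed as a monthly uptime percentage, with service credits owed when that percentage is missed. This sketch checks a month's downtime against such a schedule. The thresholds and credit percentages are hypothetical. Real SLAs define their own thresholds, credits and rules for what counts as downtime.

```python
# Hypothetical credit schedule: (minimum monthly uptime %, credit as % of the monthly bill).
# Real SLAs define their own thresholds, credits and what counts as downtime.
CREDIT_SCHEDULE = [(99.9, 0), (99.0, 10), (0.0, 25)]


def monthly_uptime(total_minutes: int, downtime_minutes: float) -> float:
    """Percentage of the month during which the service was available."""
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def service_credit(uptime_percent: float) -> int:
    """Return the credit percentage for the first threshold the uptime meets."""
    for threshold, credit in CREDIT_SCHEDULE:
        if uptime_percent >= threshold:
            return credit
    return CREDIT_SCHEDULE[-1][1]


minutes_in_30_days = 30 * 24 * 60  # 43,200
for outage in (20, 90, 600):
    uptime = monthly_uptime(minutes_in_30_days, outage)
    print(f"{outage} min down -> {uptime:.3f}% uptime, {service_credit(uptime)}% credit")
```

For scale, a 99.9% target over a 30-day month still allows about 43 minutes of downtime.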

A solid SLA details uptime expectations and the consequences, typically service credits, if things go south. Not only do you want to make sure your data is accessible; quite often you also want to ensure that it's delivered as quickly and efficiently as possible, especially if you're working with application data. With global networks, latency can be a real concern.

If you have a worldwide audience, you probably want to take advantage of a cloud data storage host with worldwide reach, whose multiple data centres are located geographically close to your own markets.

One of the major selling points for cloud data storage is scalability. When your traffic increases, cloud storage hosts are designed to share the load among multiple servers. If your traffic lessens, the number of servers in play shrinks as well. This directly affects your bottom line, because cloud data storage is a pay-for-what-you-use service.


While the various providers are designed to scale object storage, there are several techniques you can apply to optimise the practice. Like any service, cloud data storage is not without its problems. As always, the first step in addressing them is to identify the issues.

3.2. Establishing and Maintaining Secure Storage

Securing organisational data is a top-ranked, if not the number one, task for IT departments. Storing your data in a remote, off-site facility requires a robust strategy and ongoing participation by both client and host, a fact that cloud data storage providers are well aware of.

Cloud Data Storage Security

● In transit: transferring to and from the storage host
● At rest: stored at the remote facility

You can break hosted security concerns into two main areas. First, in transit, when the data is being transferred to or from your system. And second, at rest, when the data is on the remote storage server.

In transit, data can be protected in several ways, none of which are mutually exclusive. SSL/TLS and HTTPS should always be used to secure the data's travels. Extremely sensitive data can also be encrypted on the client side before transfer. Naturally, this means that you'll need a solid encryption key management system in place.

If you choose not to transfer encrypted data, the cloud data storage host can encrypt it for you so that it's secure while at rest. Services like Amazon S3 allow you to establish bucket policies that reject an upload unless its request header asks for the data to be encrypted server-side.
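Here is what such a bucket policy can look like, built as a Python dictionary and printed as JSON. It is modelled on the pattern used in Amazon's S3 documentation. One statement denies uploads whose `x-amz-server-side-encryption` header asks for anything other than `AES256`. The other denies uploads that omit the header altogether. The bucket name is hypothetical. To require KMS-managed keys instead, the expected header value would be `aws:kms`. The printed JSON could then be applied with `aws s3api put-bucket-policy`.

```python
import json


def require_sse_policy(bucket: str) -> dict:
    """Bucket policy that rejects uploads not requesting S3-managed (AES256) encryption."""
    objects = f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Id": "PutObjPolicy",
        "Statement": [
            {   # Deny uploads whose encryption header asks for anything other than AES256
                "Sid": "DenyIncorrectEncryptionHeader",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects,
                "Condition": {"StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}},
            },
            {   # Deny uploads that omit the encryption header entirely
                "Sid": "DenyUnEncryptedObjectUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects,
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            },
        ],
    }


print(json.dumps(require_sse_policy("example-bucket"), indent=2))
```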

AWS also supports another layer of protection for server-side encryption: the AWS Key Management Service, also called KMS.


KMS gives you control over server-side encryption keys, preventing those keys from ever being exported and providing a full audit trail of their use. There is, however, an additional charge for using KMS-managed keys.

Another way to secure your data is by versioning, some variation of which is offered by several cloud data storage services, including S3. Once versioning is enabled, your data is protected from accidental deletion or overwrite. Versioning is typically enabled at the container, or bucket, level. From a security standpoint, it's a good idea to enable logging.
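Versioning protects against accidental deletion or overwrite because every write adds a new version instead of replacing the old one. This toy in-memory model is hypothetical and not a provider SDK. It illustrates the idea using an S3-style delete marker: deleting hides the object, but earlier versions remain retrievable.

```python
from typing import Dict, List, Optional


class VersionedBucket:
    """Toy in-memory model of a container with versioning enabled (illustration only)."""

    def __init__(self) -> None:
        # key -> every version ever written, oldest first; None marks a deletion
        self._versions: Dict[str, List[Optional[bytes]]] = {}

    def put(self, key: str, data: bytes) -> int:
        """Store a new version instead of overwriting; return its version number."""
        history = self._versions.setdefault(key, [])
        history.append(data)
        return len(history) - 1

    def delete(self, key: str) -> None:
        """Add a delete marker rather than erasing; older versions stay recoverable."""
        self._versions.setdefault(key, []).append(None)

    def get(self, key: str, version: Optional[int] = None) -> Optional[bytes]:
        """Return the latest version (None if deleted or missing) or a specific one."""
        history = self._versions.get(key, [])
        if not history:
            return None
        return history[-1] if version is None else history[version]


bucket = VersionedBucket()
first = bucket.put("reports/q1.csv", b"draft")
bucket.put("reports/q1.csv", b"final")      # an overwrite becomes a new version
bucket.delete("reports/q1.csv")             # an accidental delete...
print(bucket.get("reports/q1.csv"))         # None: the latest version is a delete marker
print(bucket.get("reports/q1.csv", first))  # b'draft': the first version is still there
```

Because every version is kept, storage (and therefore cost) grows with each write. This is one reason providers pair versioning with lifecycle rules.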

Logging is disabled by default on most services. Once it is set up, all requests for server access are tracked, and each record typically includes requester details, container name, object name, request time, request action, response status, and the error code, if any.


Logs are stored in a designated container on the cloud data storage host, and can be retrieved and examined at any time. Because they are treated like any other storage object, they will incur a charge, and you should set up a policy for archiving or deleting them after a set period.
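An archiving-and-deletion policy for log objects comes down to sorting them by age. This is a minimal sketch with hypothetical thresholds of 30 and 365 days. In practice you would usually express the same rule as a server-side lifecycle configuration, such as Amazon S3 lifecycle rules or Google Cloud Storage Object Lifecycle Management, rather than running it yourself.

```python
from datetime import datetime, timedelta


def plan_log_lifecycle(logs, now, archive_after_days=30, delete_after_days=365):
    """Sort (name, created) log objects into keep, archive and delete lists by age."""
    plan = {"keep": [], "archive": [], "delete": []}
    for name, created in logs:
        age = now - created
        if age >= timedelta(days=delete_after_days):
            plan["delete"].append(name)
        elif age >= timedelta(days=archive_after_days):
            plan["archive"].append(name)
        else:
            plan["keep"].append(name)
    return plan


now = datetime(2017, 2, 14)
logs = [
    ("logs/2017-02-10.log", now - timedelta(days=4)),
    ("logs/2016-12-01.log", now - timedelta(days=75)),
    ("logs/2015-11-30.log", now - timedelta(days=442)),
]
print(plan_log_lifecycle(logs, now))
```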

Although storing your data remotely is undeniably a risk, with heightened awareness and full use of the available cloud data storage tools, you can minimise that risk as much as possible.

3.3. Handling Latency

Speed matters, especially the speed at which your data travels from where it is stored to where it needs to go. Latency is a real factor in cloud data storage, so it pays to understand what options you have for optimising it.

Data Cloud Storage Latency

● Location is important
● Store data closest to user base
● Specify container’s region
  ○ US, Europe and Asia

Latency can be defined as the amount of time it takes one packet of data to get from one location to another. In terms of cloud data storage, we're talking about the length of time from when the request is received by the data hosting server to when the response is received by the requesting client. Latency is a key defining characteristic of the various storage classes. When it comes to optimising latency further, the most important factor is location. Whenever possible, it's best to house your data closest to the folks who want it.

https://cloud.google.com/storage/docs/bucket-locations

Most cloud data storage vendors allow you to specify the region when creating a container for objects. Typically, the available regions are large in scope, such as the US, Europe or Asia, and you should place your storage nearest your market.
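One practical way to decide which region is "nearest your market" is to measure it. This sketch times a probe function for each candidate region and picks the fastest, using the median of several samples to damp out noise. The region names and simulated probes are hypothetical. A real probe would make a small request, for example to a test object in each region, and should run from where your users are rather than from your office.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(probe: Callable[[], None], samples: int = 5) -> float:
    """Median wall-clock time of several probe calls, in milliseconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        probe()
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)


def closest_region(probes: Dict[str, Callable[[], None]]) -> str:
    """Pick the region whose probe answers fastest."""
    return min(probes, key=lambda region: measure_latency_ms(probes[region]))


# Simulated probes standing in for real requests to each region.
simulated = {
    "us": lambda: time.sleep(0.002),
    "europe": lambda: time.sleep(0.030),
    "asia": lambda: time.sleep(0.060),
}
print(closest_region(simulated))
```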


https://aws.amazon.com/about-aws/global-infrastructure

There is a trend toward breaking up the large regions to allow finer-grained container placement. Google Cloud Platform's regional bucket locations, for example, can be used with its Durable Reduced Availability storage class.

You can specify that your objects be housed in the eastern US, the western US, the central US, or any combination thereof, as well as in any other available region.

What else can you do to lessen latency and improve performance? Believe it or not, the actual naming of an object and/or its container can have a serious impact on response time. Most cloud data storage services index objects alphabetically by key name.

It's common practice to incorporate a timestamp into that key name. Because keys are indexed alphabetically, this groups objects that were transferred at about the same time on the same server partition, which can turn that partition into a bottleneck. It's therefore recommended to prefix your object and container names with a random hash string, which spreads them across different partitions.
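The prefixing technique can be sketched in a few lines of Python. This version derives the prefix from a SHA-256 digest of the name rather than from a truly random string, an assumption made so that the full key can always be recomputed from the original name; the four-character prefix length and the function name are also illustrative choices. Providers have changed how they partition keys over time (Amazon, for one, later stated that S3 no longer needs randomised prefixes), so check current guidance before adopting the pattern.

```python
import hashlib

def prefixed_key(name: str, prefix_len: int = 4) -> str:
    """Prepend a short hex hash of `name` so that keys sharing a timestamp
    prefix land at scattered points in the alphabetical index."""
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{name}"

# Without prefixes, these names would all sort next to each other.
names = [f"2016-02-14T10-00-{second:02d}-upload.jpg" for second in range(5)]
for name in names:
    print(prefixed_key(name))
```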

With structured data, as opposed to unstructured blobs, latency is tied to data consistency. Because database entries can be modified at any time, reads and writes have to be coordinated: the more a system guarantees that every read reflects the latest write (that is, the stronger its consistency), the greater the latency.


https://azure.microsoft.com/en-us/blog/azure-documentdb-is-now-available-in-central-us

Microsoft Azure DocumentDB has identified this as a key area for their service, and now offers four distinct levels of consistency: Strong, Bounded Staleness, Session, and Eventual.

The Strong level of consistency results in the highest latency, while the Eventual level results in the lowest. Understanding how latency works, and the associated options, is a pivotal step in positioning your cloud data storage properly.
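The trade-off between consistency and latency can be illustrated with a deliberately simplified Python model: one distant primary copy and one nearby replica that is only updated when replication runs. A strong read goes to the primary and always sees the latest write but pays the longer round trip; an eventual read goes to the nearby replica, which is fast but may be stale. This is not how DocumentDB implements its levels, it models only the two extremes (Strong and Eventual), and the round-trip figures are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple

class Consistency(Enum):
    STRONG = "strong"
    EVENTUAL = "eventual"

@dataclass
class ToyReplicatedStore:
    primary_rtt_ms: float = 80.0  # hypothetical: primary copy is far away
    replica_rtt_ms: float = 10.0  # hypothetical: replica is nearby
    primary: Dict[str, str] = field(default_factory=dict)
    replica: Dict[str, str] = field(default_factory=dict)
    pending: List[Tuple[str, str]] = field(default_factory=list)

    def write(self, key: str, value: str) -> None:
        """Writes land on the primary and are queued for the replica."""
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self) -> None:
        """Apply queued writes to the replica, in order."""
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def read(self, key: str, level: Consistency) -> Tuple[Optional[str], float]:
        """Return (value, simulated round-trip time in ms)."""
        if level is Consistency.STRONG:
            return self.primary.get(key), self.primary_rtt_ms
        return self.replica.get(key), self.replica_rtt_ms

store = ToyReplicatedStore()
store.write("cart", "3 items")
print(store.read("cart", Consistency.STRONG))    # ('3 items', 80.0)
print(store.read("cart", Consistency.EVENTUAL))  # (None, 10.0): not replicated yet
store.replicate()
print(store.read("cart", Consistency.EVENTUAL))  # ('3 items', 10.0)
```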


3.4. Managing Scalability and Replication

The raw power of today's cloud data storage industry is really apparent when you consider two defining characteristics: scalability and replication.

Scalability is the ability of a system to efficiently adapt to handle the current workload. The vastness of the networks now available for cloud data storage means that there's virtually no limit to the number of objects or the amount of data that you can store online.

This scalability is, for the most part, effortless for customers of these services, because the infrastructure is already in place and being maintained by the service providers.

On most cloud data storage hosts, the number of containers available, and the size of each container, are for practical purposes unlimited. When you store more objects in a container than a single drive can physically hold, the data is written to other systems while still existing within the same virtual bucket.


When you say scalability, the image that most often comes to mind is a service adding capacity to meet surging demand (scaling up), but the ability to release capacity you no longer need (scaling down) is just as important. Because cloud data storage runs on a pay-for-what-you-use model, most services calculate their storage charge on your average usage over the month, so if your average goes down, the charge goes down too.
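Billing on average usage means that scaling down pays off within the same billing period. The Python sketch below computes a month's storage charge from daily usage samples; the daily-sampling approach and the price of $0.02 per gigabyte-month are hypothetical placeholders, since each provider measures usage and sets prices in its own way.

```python
from typing import Sequence

def monthly_storage_charge(daily_gb: Sequence[float], price_per_gb_month: float) -> float:
    """Bill a month of storage on the average number of gigabytes stored."""
    if not daily_gb:
        return 0.0
    average_gb = sum(daily_gb) / len(daily_gb)
    return average_gb * price_per_gb_month

steady = [100.0] * 30                     # 100 GB all month
scaled_down = [100.0] * 15 + [50.0] * 15  # half the data deleted mid-month
print(monthly_storage_charge(steady, 0.02))       # 2.0
print(monthly_storage_charge(scaled_down, 0.02))  # 1.5
```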

Replication is the duplication of data in real time over a network. It's common practice among cloud data storage platforms to automatically replicate your objects when they're added to your containers, and to store the redundant copies on multiple devices, usually in the same region. When an object is replicated, everything stays the same: the key name, the metadata, the container, everything. The primary goal of replication is data protection, or durability: making sure that your data objects remain available.

Durability is the probability that an object will still be the same as when you transferred it, one year later. The greater the likelihood that your data will be available, the higher the durability. 100% durability would mean that an object could not be lost; 90% durability means that there's a one in ten chance of losing the object within a year.

AWS rates its S3 Standard storage class at 99.999999999% durability. This means that if you store, say, 10,000 objects with them, you might lose one every 10 million years or so. This automatic replication is to other devices within the same region, but you can also replicate your data to a different region.
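The arithmetic behind that figure is straightforward if you assume, as a simplification, that each object's annual loss probability is 1 minus the durability and that losses are independent. Expected losses per year are then the number of objects times that probability, and the average wait for one loss is its reciprocal. The Python below uses the decimal module so that eleven nines are represented exactly; the function name is illustrative.

```python
from decimal import Decimal

def years_per_expected_loss(durability_percent: str, objects: int) -> Decimal:
    """Average years between single-object losses, assuming each object
    independently has an annual loss probability of (1 - durability)."""
    annual_loss_probability = 1 - Decimal(durability_percent) / 100
    expected_losses_per_year = objects * annual_loss_probability
    return 1 / expected_losses_per_year

print(years_per_expected_loss("99.999999999", 10_000))  # 1E+7, i.e. 10 million years
print(years_per_expected_loss("90", 1))                 # 1E+1, i.e. 10 years
```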

Why would you use replication?
1. You can reduce latency by housing your objects as close as possible to your markets.
2. Regulatory compliance may mandate that your data be stored redundantly in remote locations.
3. Your internal infrastructure may have remote offices that require access to the same data.

4. Data Storage Vendors

4.1. Google Cloud Platform

Google Cloud Platform is one of the most all-encompassing online services, with major entries in the data storage field backed by an extremely robust global infrastructure.

https://cloud.google.com

Google Cloud integrates a full, constantly evolving spectrum of products, and most, if not all, of the product line works smoothly with the other products.

Applications built with Compute Engine can easily pull assets from Cloud Storage. You can, of course, use the storage products independently of any other service in the platform.

For object storage on Google Cloud Platform, you'd use Cloud Storage. With unlimited capacity and data centres worldwide, it can house your data objects in any of its storage tiers.

In order of decreasing cost, those tiers are: Standard Storage, for objects that require the highest degree of durability and availability; Durable Reduced Availability, or DRA, well suited to data backups and other objects that do not require the highest degree of availability; and Cloud Storage Nearline, intended for backups, archives, disaster recovery, and other data where increased latency is acceptable.

https://cloud.google.com/products

The actual storage in Cloud Storage is based on buckets and objects. You create a bucket that holds one or more objects. Access to the buckets and objects is handled in a variety of ways.

Cloud Storage offers an XML API, with client libraries in Java and Python, and a JSON API, with client libraries in Java, JavaScript, Python, Go and PHP. Relational data is handled by Google Cloud SQL, which supports MySQL.
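To see how the bucket-and-object model shows up in those APIs, the Python below builds the request URLs for a single object. It follows the documented URL patterns of the JSON API (the object name URL-encoded as a single path segment, with alt=media to download the content rather than the metadata) and of the XML API. It only constructs strings, sends no requests, and leaves out the authentication that private objects require; the bucket and object names are hypothetical.

```python
from urllib.parse import quote

JSON_API_BASE = "https://storage.googleapis.com/storage/v1"
XML_API_BASE = "https://storage.googleapis.com"

def json_api_object_url(bucket: str, object_name: str, download: bool = False) -> str:
    """JSON API URL for an object's metadata, or its content if download=True."""
    # The object name is one path segment here, so "/" must be percent-encoded.
    url = f"{JSON_API_BASE}/b/{quote(bucket, safe='')}/o/{quote(object_name, safe='')}"
    return url + "?alt=media" if download else url

def xml_api_object_url(bucket: str, object_name: str) -> str:
    """XML API URL for an object; "/" in the name stays a path separator."""
    return f"{XML_API_BASE}/{quote(bucket, safe='')}/{quote(object_name, safe='/')}"

print(json_api_object_url("example-bucket", "photos/cat.jpg"))
# https://storage.googleapis.com/storage/v1/b/example-bucket/o/photos%2Fcat.jpg
print(xml_api_object_url("example-bucket", "photos/cat.jpg"))
# https://storage.googleapis.com/example-bucket/photos/cat.jpg
```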

With Cloud SQL, you can choose a hosting region (US, Europe, or Asia), with 100 gigabytes of storage and up to 16 gigabytes of RAM per database instance. You get all the power of MySQL, with automatic replication of your data across multiple data centres.

Additional peace of mind comes from the point-in-time backup and recovery services. Importing and exporting your existing data is supported by commonly used tools such as mysqldump, the MySQL wire protocol, and JDBC.

Much of the power of Cloud SQL stems from the fact that an application can spin up database instances on an as-needed basis. These instances can be accessed in several ways, including the Google Cloud Console.

Additionally, you're free to use the MySQL client through the command line or the JSON API.

Non-relational data is addressed by Cloud DataStore which uses schemaless NoSQL. Cloud DataStore features built in redundancy with automatic replication across data centres as well.

Despite being a NoSQL store, Cloud DataStore supports ACID transactions for reliable processing. Access to Cloud DataStore is available through the Google Cloud Console interface, a command line tool called GCD, and a full-featured JSON API. Google Cloud's latest offspring in the data storage space is Bigtable.

Also NoSQL-based, Bigtable is optimised to handle enormous amounts of data, from terabytes to petabytes, with single-digit millisecond latency. The engine that drives Bigtable is the same one Google uses for its top-of-the-line applications, including Gmail, Google Maps, and Google Analytics.

Accessible through the open-source HBase API, which integrates nicely with Hadoop, Bigtable encrypts data in transit as well as at rest.

4.2. Amazon Web Services (AWS)

Amazon Web Services, frequently known as AWS, was the first major player to enter the cloud data storage field, and it continues to be a significant force in the market, with products for every corner of the computing realm, including formidable entries in all types of data storage. Amazon's network, too, is rightfully world famous, with a reliable, secure infrastructure capable of serving everyone from entrepreneur to enterprise.

Object storage on AWS falls to S3, short for Simple Storage Service. S3 is straightforward and easy to use, while remaining extremely flexible and powerful. Boasting automatic redundancy, S3 is highly scalable and secure.

Choose among three service levels to find the right fit for your data: Standard Storage, with the highest degree of durability; Reduced Redundancy Storage, which, at a lower cost, is well suited to non-critical data; and Amazon Glacier, intended for infrequently accessed data, such as archives and disaster recovery files.

Amazon Web Services has a wide range of products, all of which are integrated with each other.

https://aws.amazon.com/solutions

AWS supports both relational and non-relational databases. Their primary SQL solution is Amazon RDS, Relational Database Service, which supports MySQL, Oracle, SQL Server, and PostgreSQL.

The exact feature set of RDS depends on which database engine is used, although automatic backups are enabled by default across the board. Fully scalable, RDS spins up database instances as needed. You can configure your instances to use from one to 32 virtual CPUs, with one to 244 gigabytes of memory.

Amazon Cloud Databases

If you need..., consider using... (product type):
● A managed relational database in the cloud that you can launch in minutes with just a few clicks: Amazon RDS (Relational Database)
● A fully managed, MySQL-compatible relational database with 5X performance and enterprise-level features: Amazon Aurora (Relational Database)
● A managed NoSQL database that offers extremely fast performance, seamless scalability and reliability: Amazon DynamoDB (NoSQL Database)
● A fast, fully managed, petabyte-scale data warehouse at less than a tenth the cost of traditional solutions: Amazon Redshift (Data Warehouse)
● To deploy, operate, and scale an in-memory cache based on Memcached or Redis in the cloud: Amazon ElastiCache (In-Memory Cache)
● Help migrating your databases to AWS easily and inexpensively with zero downtime: AWS Database Migration Service (Database Migration)

https://aws.amazon.com/products/databases

Data is automatically replicated across three regional AWS data centres, and you can optionally take advantage of Amazon's new cross-region replication service to spread your data further around the globe.

4.3. The Microsoft Cloud

Microsoft's full range of cloud services, from computing to analytics to Internet of Things integration, is as robust and compelling as any in the market. The Microsoft brand also brings a distinct familiarity and a collection of compatible services and tools, such as Active Directory and Visual Studio.

Unstructured data is fully supported with Azure Blobs. Boasting over 40 trillion stored objects and an average of 3.5 million requests per second, Azure Blobs provides high durability and accessibility.

Azure Blob Command Tools
● AzCopy
● PowerShell
● Azure cross-platform CLI

Azure Blobs supports two abstractions: Page Blobs for disks and Block Blobs for discrete files. Accessible via REST interfaces, API client libraries, and a set of powerful command tools such as AzCopy, PowerShell and the Azure cross-platform CLI, Azure Blobs gives you a great many options for object management.

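Where other services speak of buckets and objects, Azure Blobs uses the terms container and blob. Each Azure Storage account can hold any number of containers, each of which holds blobs. Containers cannot be nested inside one another, although an account can have a special root container whose blobs are addressed directly under the account.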
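In Azure, a blob's address combines the storage account, the container, and the blob name, and a folder-like structure inside a container is typically simulated by putting "/" in blob names. The Python sketch below builds a blob URL using the standard public endpoint pattern and lists the first-level "virtual directories" under a prefix. The account, container, and blob names are hypothetical, and the helper works on a list of names you already have rather than calling the service.

```python
from typing import Iterable, List
from urllib.parse import quote

def blob_url(account: str, container: str, blob_name: str) -> str:
    """Public endpoint URL for a blob; "/" in the name is kept as a separator."""
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name, safe='/')}"

def virtual_directories(blob_names: Iterable[str], prefix: str = "") -> List[str]:
    """First-level 'folders' under `prefix`, derived from "/" in blob names."""
    folders = set()
    for name in blob_names:
        if name.startswith(prefix):
            rest = name[len(prefix):]
            if "/" in rest:
                folders.add(prefix + rest.split("/", 1)[0] + "/")
    return sorted(folders)

names = ["logs/2016/jan.txt", "logs/summary.txt", "images/cat photo.jpg", "readme.txt"]
print(blob_url("myaccount", "data", names[2]))
# https://myaccount.blob.core.windows.net/data/images/cat%20photo.jpg
print(virtual_directories(names))           # ['images/', 'logs/']
print(virtual_directories(names, "logs/"))  # ['logs/2016/']
```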

https://azure.microsoft.com/en-us/solutions

Azure offers two solutions for NoSQL, non-relational data: Azure Tables and DocumentDB. Use Azure Tables for key-value data structures and DocumentDB for document data models. DocumentDB is a database as a service with a very full-featured, SQL-compatible query environment that is continually evolving.

Additionally, DocumentDB is schema-less, which allows your data structures to evolve efficiently over time as well. As you might expect from the developers of SQL Server, Azure's relational database service, SQL Database, is top of the line, with full support for existing SQL Server tools, APIs and libraries.

SQL Database is cloud-migration friendly and offers three service tiers for a range of workloads: Basic, Standard and Premium. SQL Database can handle databases up to 500 gigabytes and provides point-in-time restore, geo-restore, and geo-replication features.

Microsoft Azure currently offers a free one-month trial with a $200 credit, the perfect way to give this highly competitive service a run for your money.

4.4. HP Helion Cloud

HP Helion combines a solid set of products for computing and storage applications, including ones that handle both object and database storage. If you're using HP Cloud Compute, you'll want to tie into their block storage module: its volumes persist even beyond the life of the associated compute instance, so your data is kept for as long as you need it.

Object storage comes under the aegis of HP Cloud Object Storage, naturally. Like most other similar services, Cloud Object Storage utilises a container and object structure.

HP Cloud Object Storage Access
● Online console
● Command line interface
● REST API
● Language bindings
○ Java, PHP, .NET, Node.js, or Fog (Ruby Cloud Services Library)

Access is available via an online console, a command line interface, a complete REST API, or one of the many language bindings, including Java, PHP, .NET, Node.js, and the Ruby library Fog.

A common use for object storage is to serve as the origin for a content delivery network, or CDN. HP Cloud CDN optimises your cloud object storage to deliver static files with minimal latency, powered by Akamai's global network of edge servers.

Charges are calculated monthly on the amount of storage used, the amount of data transferred out of the system, and the number of get, put, post, copy, or list requests made.
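Those three charge components add up in a simple way. The Python sketch below totals a month's bill from average storage, outbound transfer, and request count; the rates are hypothetical placeholders rather than HP's published prices, and real price lists often tier rates by volume and by request type.

```python
from dataclasses import dataclass

@dataclass
class Rates:
    # Hypothetical placeholder rates, not any provider's actual prices.
    storage_per_gb_month: float
    transfer_out_per_gb: float
    per_1000_requests: float

def monthly_bill(avg_gb_stored: float, gb_transferred_out: float,
                 requests: int, rates: Rates) -> float:
    """Sum the storage, outbound-transfer, and request components."""
    storage = avg_gb_stored * rates.storage_per_gb_month
    transfer = gb_transferred_out * rates.transfer_out_per_gb
    request_cost = requests / 1000 * rates.per_1000_requests
    return storage + transfer + request_cost

rates = Rates(storage_per_gb_month=0.03, transfer_out_per_gb=0.10, per_1000_requests=0.01)
print(round(monthly_bill(100, 50, 200_000, rates), 2))  # 10.0
```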

As of the time of this recording, HP offers a no-charge trial period with a substantial credit if you'd like to investigate their services further.


Sources

Google Cloud Platform Products and Services https://cloud.google.com/products

Google Cloud Platform Pricing Calculator https://cloud.google.com/products/calculator

Google Cloud Platform APIs & Reference https://cloud.google.com/storage/docs/apis

Amazon Web Services S3 https://aws.amazon.com/s3

Cloud Storage Nearline https://cloud.google.com/storage-nearline

AWS’ Pricing Calculator https://calculator.s3.amazonaws.com/index.html

Cloud Storage with AWS https://aws.amazon.com/products/storage

Google Cloud SQL https://cloud.google.com/sql

Google Schemaless Cloud DataStore https://cloud.google.com/datastore

Google Cloud BigTable https://cloud.google.com/bigtable

Azure Blob Storage https://azure.microsoft.com/en-us/services/storage/blobs

Azure Table Storage https://azure.microsoft.com/en-us/services/storage/tables

Rackspace Scalable Cloud Object Storage https://www.rackspace.com/cloud/files

Using the AWS SDKs, CLI, and Explorers http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingAWSSDK.html

AWS SDK for Python (Boto3) https://aws.amazon.com/sdk-for-python

Google Cloud Platform SDK https://cloud.google.com/sdk

Google Cloud Storage JSON API Overview https://cloud.google.com/storage/docs/json_api

AWS Key Management Service (KMS) https://aws.amazon.com/kms


AWS security-logging https://aws.amazon.com/answers/logging

Google Cloud Platform Bucket Locations https://cloud.google.com/storage/docs/bucket-locations

AWS Global Infrastructure https://aws.amazon.com/about-aws/global-infrastructure

Azure Regions https://azure.microsoft.com/en-us/regions

Azure DocumentDB https://azure.microsoft.com/en-us/services/documentdb

Google Cloud Console https://console.cloud.google.com

Google Cloud DataStore https://cloud.google.com/datastore

Apache HBase https://hbase.apache.org

Cloud Databases with AWS https://aws.amazon.com/products/databases

Azure solutions https://azure.microsoft.com/en-us/solutions

Create your free Azure account today https://azure.microsoft.com/en-us/free

Akamai Cloud Networking https://www.akamai.com/us/en/solutions/products/cloud-networking

Akamai Ion for Free https://content.akamai.com/PG5155-Online-Trials-Ion-Standard.html
