
Simply NetApp, Version 4
Rusty Dehorn, NetApp
May 2008 | Revision 4

NetApp is typically classified as a "data storage company," but what exactly does that mean? There are many ways to store data today: USB flash drives, hard disks, and DVDs are some of the most popular. However, NetApp doesn't build any of these things, and none of them alone keeps data very safe. Hard drives spin fast and hot, and they eventually break. Storing DVDs requires huge amounts of physical space. And flash drives hold far less data than hard disk drives.

To keep data safe and easily accessible, more consumers are relying on "the network" to store (or at least back up) their most important data. Yahoo! offers consumer storage for pictures, emails, and even financial data. Wells Fargo customers rely on Internet banking to store their account information. Apple offers iDisk for consumer storage. Countless companies are storing your data in some form. So, how do they store it? Surprisingly, they typically store data on disk drives that are not much different from what's in your home PC. The key lies in how they "manage" their data storage.

Data management encompasses a few factors. Of course, sophisticated hardware and software are important, along with knowledgeable staff. But it all starts with business decisions to determine the value of data. Based on that analysis, decisions can be made about how data should be managed. So data storage for NetApp customers involves much more than putting bytes on a disk drive; it means managing data. In version 4 of Simply NetApp, we will look at some of the considerations companies face as they decide how to properly manage their data. We will also cover how NetApp solutions can help customers achieve their goals for data management. But first, let's review how the computing landscape arrived at where it is today.

A Brief History of Business Computing

In the 1970s, computers started being used extensively by the business community. The type of computer used was called a mainframe: a very large computer (picture many side-by-side refrigerators) that many people could use at the same time. Luckily, people did not have to sit around the mainframe to use it; they could access it from their offices through "dumb" terminals that sat on their desks. A dumb terminal is essentially a keyboard and a screen wired to the mainframe; it contains no disk drive of any kind and little or no memory. All programs were accessed, run, and stored at the mainframe. This is vastly different from the computers that sit on desks today; your computer has memory and disk drives of various types, and it runs programs such as Word and Excel right at your desk.

The PCs on our desks today can do many things by themselves. Why weren't there computers on every desk in the early computing days of the 1970s? The answer is simple: computers were too big and too expensive. Only Fortune 500 companies could afford computers at all, so they invested in one computer that could be used by many employees: a mainframe. When companies bought mainframes, they bought all of the parts and software from one vendor, usually IBM, Hewlett-Packard, or Digital Equipment Corporation (DEC). You could not use one vendor's software on another company's mainframe; for example, only IBM software would work on an IBM mainframe. There was also a major services component provided by mainframe vendors; customers typically did very little to manage the mainframe themselves.

In the 1980s, this mainframe model started changing. Apple, IBM, and Compaq introduced their desktop computers. Individuals worked on their own computer and could save their work on a floppy disk that could be used on another computer. This became known as the PC revolution. At nearly the same time, companies were gaining prominence that "specialized" in certain aspects of computing and offered products better than those of IBM, HP, and DEC: Intel could build a better microprocessor, Oracle had better database software, EMC offered disk storage for mainframes at less cost, and Microsoft offered all kinds of software that could run on desktop computers. Consumers now had a choice when buying hardware and software for their computers. Microsoft, Oracle, and EMC were very successful in their respective businesses and grew quickly, while IBM, HP, and DEC found their market share shrinking and had to reinvent themselves; they could not compete in every specialized market.

As the 1980s continued, the computer landscape evolved further. People realized that they wanted to network their PCs to share files and printers.

Apple and Novell were early leaders in networking PCs. But companies soon realized that they needed a computer on the network that was more powerful than the rest: one that would allow many users access to one database and could centrally store files to be shared by other people. This bigger, more powerful computer became known as the server. This was the beginning of today's "client/server" computing environments; the client is the machine on people's desks, and the server is the computer storing the data and running the applications that everyone uses.

A computer's operating system (OS) is the software that makes the computer work. It is what "activates" the computer when it is turned on. If you think of the memory and processor as the computer's brain, then the OS is the "education." During the mid-1980s, two operating systems started emerging as the winners in the business landscape. One was Microsoft® DOS, which evolved into Windows®, and the other was UNIX. Microsoft's OS ran on PCs based on the Intel® processor. PCs (such as Compaq's) running Microsoft Windows were relatively cheap and easy to use; they were used for word processing and spreadsheets in offices. UNIX is an operating system that had its origins in the academic world at universities. Vendors (such as IBM, Sun, HP, and DEC) made their own commercialized versions of UNIX that ran on their own hardware. UNIX was relatively complicated to use but very powerful and flexible; it was the chosen OS for engineering environments. It was also popular for large servers used by many people at big corporations, either running databases or storing large amounts of data. Often, these UNIX servers were replacing more expensive mainframes. Through the years, Microsoft has tried to enter the enterprise and engineering markets by making Windows "beefier," that is, more reliable and able to run on larger, more powerful computers, while UNIX vendors have tried to make UNIX easier to use and less expensive for the mainstream. By the late 1990s, UNIX vs. Windows had become an intense rivalry for share of server operating systems.

As we entered the 21st century, an operating system called Linux started gaining prominence. Linux is a flavor of UNIX that runs on a wide variety of industry-standard hardware, which makes it cost effective. But the most interesting thing about Linux is that it's "open source": essentially, it's free if you agree to certain rules when using it, and it is created, maintained, and enhanced by a developer community. Companies such as Red Hat and SUSE (now owned by Novell) provide enhancements and support that enable enterprises to use this open source software. Beyond open source operating systems, there are now open source databases and even open source software for creating storage devices. This creates an interesting business decision for IT managers: is the software I pay for better than the "free" version?

Where does that leave the computer industry today? Again, economics is driving some radical changes; however, we are seeing a bit of the cliché "history repeating itself." The proliferation of networks and the web has again shifted computing functionality back to centralized servers, similar to the days of the mainframe, but now the users can be all over the world.

Consumers use online sites to track finances and share pictures. In large enterprises, employees scattered all over the globe access databases hosted in one spot. However, individual PCs still retain powerful hardware and software capabilities, since hardware is relatively cheap and some processes are best handled on local PCs rather than on servers thousands of miles away. Enterprises typically want to leverage the power of individual PCs but keep the management-intensive parts of the environment as centralized as possible.

In the past, enterprises that required large amounts of computing power would use large, expensive mainframe-type servers. Now, these enterprises are moving toward "farms" of small, cheap servers that are networked together. This environment is very flexible; small servers can easily be added or removed as required by the application or business. Very typically, these small servers run Linux or other open source software, making them very cost effective. In a different scenario, a customer needs a server to run an application but does not require the power of a full hardware server. In these cases, virtual servers are used instead. Virtual servers allow a single hardware server to be used as multiple virtual servers, saving costs in the areas of hardware purchases, power consumption, and administration.

As computing landscapes change, one need remains relatively constant: the need for storage. Whether the servers are large or small, hardware or virtual, running expensive software or open source software, they all require storage that can be managed appropriately. We'll examine these qualities after we cover a few NetApp basics.

Introduction to the NetApp Storage System

The first version of this Simply NetApp booklet, written in 1998, described the NetApp data storage appliance called the "filer." As NetApp has quickly emerged as a top enterprise storage provider, the product line has evolved as well. In this section of Simply NetApp, we'll walk through some NetApp core technology. The FAS storage system is the center of the NetApp product lines. FAS stands for "Fabric Attached Storage," a name coined by NetApp to convey that this device works in different storage networks. These machines were previously called "filers," but that term is no longer very accurate, for reasons we'll cover later.

Basic anatomy of a NetApp Storage System

FAS storage systems are by definition computers: they have processors, memory, and disk drives, similar to your PC at home. However, the hardware used in FAS machines is specialized for handling large amounts of data going in and out (I/O, or input/output, is the techy term), which makes them ideal for storage. And speaking of storage, these machines typically have lots of disk drives attached to them. NetApp currently ships a single FAS system that can have over 1,100 disk drives attached to it, probably a bit more than your PC at home.

[Figure: basic anatomy of a NetApp storage system. The controller contains the CPU, memory, and networking connections; shelves of disks attach to the controller.]

As impressive as the hardware is, the most important part of a FAS system is the software that runs it: its operating system. Instead of a Windows or UNIX type of operating system like most other computers, FAS uses a specialized operating system created by NetApp called Data ONTAP. Data ONTAP is only for storing and managing data; you cannot run any additional programs on it, and it runs only on NetApp storage systems.

At the heart of Data ONTAP is WAFL, which stands for Write Anywhere File Layout (pronounced "waffle"). Very simply, this is the part of Data ONTAP that manages how data goes in and out of the storage system and how the data is stored on disks. When a request is made to a Data ONTAP-based storage system to retrieve data, WAFL knows which disks hold that data and retrieves it. Conversely, when you save data to a Data ONTAP-based storage system, WAFL determines where within the system to store that data and keeps track of it. WAFL is the core technology on which NetApp storage is built; as we discuss the features customers want in storage, WAFL is what enables NetApp to deliver so many of them.
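To make the "write anywhere" idea concrete, here is a minimal Python sketch (ours, not NetApp code; the class and names are invented for illustration). The key behavior is that an update never overwrites the old block in place; it lands in a free block and the map is repointed:

```python
# A toy illustration (not NetApp code) of the "write anywhere" idea:
# updated data is written to a free block and a map is repointed,
# rather than overwriting data in place.

class ToyWafl:
    def __init__(self, num_blocks):
        self.disk = [None] * num_blocks          # physical blocks
        self.free = list(range(num_blocks))      # free physical blocks
        self.block_map = {}                      # logical block -> physical block

    def write(self, logical_block, data):
        physical = self.free.pop(0)              # always pick a fresh block
        self.disk[physical] = data
        old = self.block_map.get(logical_block)
        self.block_map[logical_block] = physical # map now points at the new copy
        if old is not None:
            self.free.append(old)                # old copy can be reclaimed later

    def read(self, logical_block):
        return self.disk[self.block_map[logical_block]]

wafl = ToyWafl(num_blocks=8)
wafl.write(0, b"v1")
wafl.write(0, b"v2")                             # lands in a different physical block
print(wafl.read(0))                              # b'v2'
```

As we'll see, leaving old blocks intact instead of overwriting them is also what makes features like Snapshot cheap to implement.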

While the core business of NetApp is still the hardware and software associated with FAS storage systems, the company has branched out into other aspects of managing data. One example is the V-Series system, which allows customers to take advantage of valuable Data ONTAP features while utilizing disks from other storage vendors. In other words, a NetApp controller running Data ONTAP will manage (or virtualize) disk storage from other storage vendors. There are other NetApp products that solve customer challenges related to storage, such as providing backups, securing data, and migrating data between different storage systems. Also, as NetApp has become a major vendor at many customer sites, various software tools from NetApp and NetApp partners become ever more important for managing data. We'll cover these products as we explore customer solutions throughout this booklet.

So You Need to Store Some Data…

Let's pretend you've just been hired as a storage manager at a new but fast-growing and successful company. What are some of the decisions you have to make and things you have to do? Let's walk through that process and determine how NetApp could help you achieve your goals. First of all, what are your goals? Your initial answer might be "to purchase the best storage for your enterprise at the best price." Okay, wanting the lowest price is a given for most customers. However, what makes up the best storage? It may not be the same for every customer, and it won't even be the same for every application within a given customer. Let's examine two examples.

Suppose your business is a bank and a customer makes an electronic transfer into his or her account. For whatever reason, that transaction is lost and there is no record of it. What happens? You might have to pay out huge sums of money to customers who claim their money was lost. You might be fined. But your biggest loss might be in keeping customers, because you now have a reputation for losing customers' money. It could be disastrous for your business. Now, let's say your business is a social networking company. A user uploads a video to your site and that video is lost. Some users might be a bit annoyed that their video is not actually posted, but one incident would not have a material effect on the business. However, your main storage challenge is scaling large enough for millions of users to store their content at a reasonable cost.

Which business will probably be more concerned about protecting data? Clearly the bank has more to lose. But even within the bank, there are applications where data loss would not be as serious. And the social networking company has financial data that can't be lost or stolen without serious adverse effect. So why would anyone implement systems where there is any possibility of data loss? The answer is easy: cost. Data can be stored very safely and be available most of the time for a reasonable cost. However, as you move closer to storing data so that it will "never" be lost and will always be available in any circumstance, the cost rises exponentially. So a balance has to be struck between cost and avoiding data loss. In fact, this balance has to be struck with regard to all aspects of managing data listed below:

- Application
- Data Protection
- Data Availability
- Data Security
- Scalability
- Performance
- Cost

Customers often choose NetApp solutions because they are better in the above categories than any competitor. However, customers also make decisions based on the quality of pre-sales and post-sales support, analyst recommendations, and prior experience with certain vendors. For the purposes of this booklet, we will focus on the product-specific areas listed above. Let's take a closer look at each of these categories and examine how customers evaluate them.

Application

At first glance, "application" might seem a bit out of place on the list above; it is not really a trait you evaluate in storage. However, it is arguably the most important. There is always some business need that requires an application, which in turn requires data storage; the application is always the driver for the storage. It could be storage for your employees' email, your customer database, pictures for your website, or just space for employees to store documents of all types. So why should this matter? Data is data regardless of the application, right? Well, it turns out that some kinds of storage work much better for certain applications. In fact, some application vendors actually require certain kinds of storage for their application to work as advertised. Clearly, the application should be considered.

When you start looking for storage for an application, the conversation about SAN and NAS comes into play quickly. SAN stands for Storage Area Network, while NAS stands for Network Attached Storage. At first pass, both of these technologies provide the same thing: a way of attaching storage to some sort of network. This differs from traditional direct-attached storage, where each server owned its own storage and there was no storage network. The difference between SAN and NAS lies in how the server sees and accesses its storage.

The first thing to understand is that all disk drives store data in a form called blocks. With a SAN, these blocks are what goes in and out of the storage system. The application server reads and writes various blocks of data just as it would from any direct-attached disk, and it tracks those blocks itself. SAN can run over a network built on a technology called Fibre Channel, or it can use iSCSI, which simply requires an Ethernet network (the network commonly available in all office buildings). With NAS, the application server sees the storage as another computer that contains the files it requires. The NAS system knows how the blocks of data on its disks belong to certain files and presents those files to other computers that need them. The other computers read and write the files as needed, and the NAS system keeps track of the changes. NAS always uses Ethernet networking technology.
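The block-versus-file distinction is easy to see in code. Here is a minimal Python sketch (invented for illustration, not any vendor's API): a SAN-style device only understands numbered blocks, so the client must remember which blocks make up its data, while a NAS-style server hands out whole named files and does that bookkeeping itself:

```python
# SAN-style access: the device serves fixed-size numbered blocks;
# the client is responsible for knowing which blocks hold what.
class BlockDevice:
    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]

    def write_block(self, lba, data):
        self.blocks[lba] = data.ljust(self.block_size, b"\x00")

    def read_block(self, lba):
        return self.blocks[lba]

# NAS-style access: the server tracks which blocks belong to which
# file and presents whole files by name.
class FileServer:
    def __init__(self):
        self.files = {}                     # filename -> file contents

    def write_file(self, name, data):
        self.files[name] = data

    def read_file(self, name):
        return self.files[name]

san = BlockDevice(num_blocks=16)
san.write_block(3, b"payroll record")      # client must remember: block 3
print(san.read_block(3)[:14])

nas = FileServer()
nas.write_file("payroll.txt", b"payroll record")
print(nas.read_file("payroll.txt"))        # server resolves the name itself
```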

So there is a question that always comes up in any SAN and NAS discussion: which is better? The best answer is both. Each technology has areas where it holds a clear advantage over the other once the application is considered. If your application vendor says they will only support you if you use SAN, the decision is usually fairly easy. However, many applications will work with both SAN and NAS. That requires looking at other factors, such as cost and performance, which we will cover later in this booklet. It should be noted that whether an enterprise uses SAN or NAS, both are typically much better than direct-attached storage, which can't be shared among many servers.

While NetApp started as a NAS storage company (remember, the systems used to be called filers), SAN storage has been offered by NetApp for many years now and has become very successful. Many of the features we'll cover throughout this book can be used with either NetApp SAN or NetApp NAS. However, one advantage NetApp brings trumps all others in this discussion: SAN and NAS from NetApp is the same box. When you buy a storage system from NetApp, it can be used as either, or both. There is no additional system to buy or learn about. This is very important for customers: as needs evolve and new applications are used, NetApp customers inherently have both SAN and NAS available in their storage system.

Integration between storage and application is another factor to consider when choosing storage. When storage integrates well with certain applications, it can often make backups or storage management easier and less costly. This can take the form of a software product that integrates with a particular application, or a technical report that describes how NetApp can help manage data for a certain application. We'll discuss this more in the data protection section.

Data Protection

While it is not typical for data to be lost from a storage system, there are many scenarios where data loss can happen. Disk drives do fail. The connection between a storage system and its disk drives can fail. It could be something more serious, such as a fire or flood in the data center. In any of these scenarios, will your storage system protect the data, and how much will this protection cost? Let's walk through how NetApp can protect customers from data loss.

RTO and RPO are acronyms often discussed with regard to data protection solutions. RTO, or Recovery Time Objective, refers to how fast the data can be recovered and brought back online. RPO, or Recovery Point Objective, refers to the age of the data being recovered; the more often data is backed up, the shorter the RPO will be. To summarize, the goal of data protection solutions is to achieve the shortest RTO and RPO at a desired cost.
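As a back-of-the-envelope illustration (our example, not from the booklet): if you back up on a schedule, data written just after a backup is unprotected until the next one, so the worst-case RPO is simply the backup interval. A few lines of Python make the relationship explicit:

```python
# Worst-case RPO is the interval between backups: a failure just
# before the next backup loses everything written since the last one.
def worst_case_rpo_hours(backups_per_day):
    return 24 / backups_per_day

for n in (1, 4, 24):
    print(f"{n} backup(s)/day -> worst-case RPO {worst_case_rpo_hours(n):.1f} hours")
# 1 backup/day   -> 24.0 hours (the nightly-backup example in the text)
# 4 backups/day  ->  6.0 hours
# 24 backups/day ->  1.0 hours
```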

Another consideration for data protection is on-box vs. off-box. On-box data protection gives you the ability to make backups and recover data on the same storage device where the data is actually stored. While this typically provides very fast recovery from issues like an application error or human error, it won't help much if that one box is physically destroyed. For this reason, data protection strategies typically have both an on-box component and an off-box component, which lets administrators recover from whichever source makes the most sense in a given situation.

Let's start with basic on-box data protection features. Besides enhancing performance, WAFL has very important data protection features built in. For starters, disk drives are mechanical (small parts moving fast and generating heat), and they do break periodically. RAID is the feature that protects data in the event of a broken disk on a storage system. There are different versions of RAID available in the marketplace. The most common RAID used by enterprise storage vendors is "mirroring" (also called RAID 1 or RAID 0+1 by techies). This RAID type simply copies everything onto a second set of disks. If a disk breaks, you have another disk with the same data, so nothing is lost. While this type of RAID works fine, it can be very expensive, since you need twice the disk capacity of all the data you intend to store. For example, if you need 30TB of capacity, you must buy 60TB of disk storage.

NetApp offers a different, more efficient approach to RAID protection. NetApp RAID makes one disk in your group of disks a "parity" disk. A parity disk does not store data like the other disks; it stores "parity" information about the actual data being stored. Think of parity data as index information about the data. Parity information can be combined with the remaining data on healthy disks to reconstruct the missing data from a broken disk, and the reconstructed data is typically written onto a spare disk in the system. This RAID avoids data loss in the event of a disk failure without requiring twice as many disks. And best of all, data access remains available while the system reconstructs the data from the broken disk.

The type of parity RAID now standard on NetApp FAS systems is called RAID-DP. (For RAID experts, this is a unique form of RAID 6.) It requires two parity disks for every 14 data disks, but it prevents data loss even when any two disks fail at the same time. As systems and disk drives continue to get larger, the chance of two disk errors occurring at the same time increases greatly. So even as larger disks are used, NetApp RAID-DP provides very robust data protection against disk failures without requiring a lot of extra disk. In addition to RAID-DP, NetApp does offer mirroring; however, it is typically used to protect data against more serious failures than a disk drive failure. We'll cover more about this later.

Perhaps the greatest value of NetApp RAID is its integration with Data ONTAP and WAFL. This means that enabling RAID requires no extra setup time; from the moment you power up a NetApp device, RAID is working. This is a major reason NetApp requires much less configuration than most other storage. Secondly, there is no performance penalty for using NetApp RAID. NetApp systems were designed with RAID integrated; it is not a "bolt-on" feature that slows performance when enabled.
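To see how parity lets you rebuild a lost disk, here is a minimal single-parity sketch in Python (our simplification: RAID-DP itself adds a second parity disk to survive two simultaneous failures; this example shows the basic mechanism with one):

```python
# Single-parity illustration: the parity disk holds the XOR of the
# data disks, so any one lost disk equals the XOR of all survivors.
def xor_bytes(chunks):
    result = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            result[i] ^= b
    return bytes(result)

data_disks = [b"AAAA", b"BBBB", b"CCCC"]      # same-size stripes
parity_disk = xor_bytes(data_disks)

# Disk 1 dies. Rebuild its contents from the survivors plus parity.
survivors = [data_disks[0], data_disks[2], parity_disk]
rebuilt = xor_bytes(survivors)
assert rebuilt == b"BBBB"
print("reconstructed:", rebuilt)
```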

Also built into WAFL is perhaps the single most valuable feature across all NetApp product lines: Snapshot™. Snapshot is a function that takes a nearly instantaneous picture of a dataset at a desired point in time, and it forms the basis for features that have become critical to customer environments all over the world. Once a Snapshot copy is created, you can recover data from it at a later time. But the real beauty of Snapshot copies is that they are not complete physical copies of all the data: data that hasn't changed between Snapshot copies is stored only once, no matter how many Snapshot copies point to it. In practice, this allows storage administrators to keep many copies of a dataset while requiring very little additional storage.

Here is an example of how Snapshot copies can be used for recovery. Imagine you're writing a document called "Simply_NetApp.doc," and suddenly the screen looks weird and you receive that ugly error message, "This font is not recognized." You close the file and open it again, and the same message appears. This file is officially corrupted, a technical term for "messed up." Luckily, it was saved on NetApp storage. A slightly older version of the file is saved in a Snapshot folder, so you can just go into the Snapshot folder and retrieve a copy of the file from before it became corrupt. Yes, the most recent edits might be missing, but that's better than losing the whole thing. Believe it or not, this is a true story that happened while a previous version of this booklet was being written.
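Here is a small Python sketch of the block-sharing idea behind Snapshot copies (ours, for illustration). A snapshot is just a frozen copy of the block map; because WAFL writes changed data to new blocks, the snapshot's map keeps pointing at the old, untouched blocks, so unchanged data is stored once no matter how many snapshots reference it:

```python
# A snapshot is a saved copy of the block map, not of the data.
# New writes go to new blocks (write anywhere), so blocks referenced
# by a snapshot are simply left in place and shared.
class ToyVolume:
    def __init__(self):
        self.blocks = {}        # physical block id -> data
        self.active_map = {}    # filename -> block id
        self.snapshots = {}     # snapshot name -> frozen map
        self.next_id = 0

    def write(self, name, data):
        self.blocks[self.next_id] = data     # fresh block, never overwrite
        self.active_map[name] = self.next_id
        self.next_id += 1

    def snapshot(self, snap_name):
        self.snapshots[snap_name] = dict(self.active_map)  # copy the map only

    def read(self, name, snap_name=None):
        block_map = self.snapshots[snap_name] if snap_name else self.active_map
        return self.blocks[block_map[name]]

vol = ToyVolume()
vol.write("Simply_NetApp.doc", b"good draft")
vol.snapshot("hourly.0")
vol.write("Simply_NetApp.doc", b"corrupted!")
print(vol.read("Simply_NetApp.doc"))              # b'corrupted!'
print(vol.read("Simply_NetApp.doc", "hourly.0"))  # b'good draft' - recovery
```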

While other storage vendors now have snapshot technology, NetApp Snapshot still offers two major advantages. First, because NetApp Snapshot technology is highly integrated with WAFL, the system does not suffer a performance loss when Snapshot copies are used. The second advantage follows from the first: since there is no performance loss, you can keep more of them online, giving you a long history of what your data looked like at various points in time.

RAID and Snapshot technology are great for on-box data protection. As mentioned previously, some data loss scenarios require off-box recovery. The minimum off-box backup and recovery solution is tape: backups to tape are made periodically and sent to a safe location remote from the main business. However, recovering from such tapes is not a good solution for critical systems. First of all, RTO can be very long with tape, especially when large amounts of data need to be recovered. It can take a few hours to find the tape, then a few more hours to configure disk storage (if you have any available). Then you finally start to stream the data from the tape onto the disk storage device, which can take up to a few days, depending on the amount of data. Secondly, there are typically changes to the data since the last tape backup was made. For example, if you back up data every night at midnight and the disaster occurs at noon, you will be missing 12 hours' worth of data on your tape. In fact, nightly backups mean your RPO is up to 24 hours!

As mentioned before, many valuable NetApp offerings are based on Snapshot technology. One of the most popular for off-box backup and recovery is SnapVault. With SnapVault, the system takes a snapshot on one storage system (system 1) and replicates the data to another storage system (system 2). At a specified time, another snapshot occurs on system 1. The system compares the two Snapshot copies to figure out which data has changed, and only the changed data blocks are sent to system 2. Each time new changes arrive at system 2, a snapshot is also taken on that system. This gives you a history of Snapshot copies on two systems in two locations in case disaster strikes at one site. Typically, customers can use less expensive storage on the destination (system 2 in our example), since that data is accessed only in an emergency, and they will often keep a longer history of backups on that less expensive destination system. SnapVault can also be implemented when the primary (source) system is not NetApp storage; this is referred to as Open Systems SnapVault (OSSV). Either way, SnapVault avoids having to make full tape backups and archive them offsite, and the data is always accessible online.
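The core of the SnapVault transfer described above is a comparison of two snapshots. A rough Python sketch of that step (our simplification, not the actual protocol): diff the block maps of the two snapshots and ship only the blocks that changed:

```python
# Compare two snapshots of a volume (block id -> data) and send only
# the blocks that are new or changed since the last transfer.
def changed_blocks(previous_snap, current_snap):
    return {
        block_id: data
        for block_id, data in current_snap.items()
        if previous_snap.get(block_id) != data
    }

snap_monday  = {0: b"aaaa", 1: b"bbbb", 2: b"cccc"}
snap_tuesday = {0: b"aaaa", 1: b"BBBB", 2: b"cccc", 3: b"dddd"}

delta = changed_blocks(snap_monday, snap_tuesday)
print(delta)   # {1: b'BBBB', 3: b'dddd'} - only these cross the wire

destination = dict(snap_monday)   # system 2 already holds Monday's copy
destination.update(delta)         # applying the delta recreates Tuesday
assert destination == snap_tuesday
```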

While most customers agree that backup and recovery from disk is ideal, many don't know where to start. Their backup and recovery process is built around backup software designed to work with tape, and moving to a technology like SnapVault can't happen immediately. For these environments, NetApp offers a VTL, or virtual tape library. A VTL provides disk-based backup that integrates easily with enterprise backup software such as Symantec NetBackup or Tivoli Storage Manager: instead of writing data to a tape library, the backup software writes it to the VTL. VTLs typically provide better performance than tape libraries, especially when reading data during a recovery. The most recent backups can be kept on the VTL for fast recovery; as backups age, they can still be migrated to tape for longer-term archive.

Data Availability

Data availability often gets confused with data protection. While data might be protected in a disaster, it might take a while to make it available again. Availability can also be related to performance: for example, if a disk drive has failed, is the storage system still performing at a level that keeps applications usable?

There are some data availability features of a storage system which are considered basic. Hardware redundancy is one of them: having at least two of everything. Most NetApp storage systems are sold as a pair of storage systems to ensure data remains available if an entire system fails. This configuration is typically called a "high availability" configuration, or HA for short. Historically, NetApp called this type of system a "cluster," but that name is not ideal anymore, since the term cluster can now mean other things.

Human error can be a fairly common cause of data or an application being unavailable. Part of the NetApp solution to help avoid human error is pretty straightforward: make it simpler! Keeping NetApp products simple is a task our engineers take very seriously. Enterprise storage is by nature complex, and making it simple takes work. Luckily, even though the capabilities of NetApp storage appliances have grown over the years, the devices are still appliances and tend to be much simpler than competing storage offerings.

While data availability of storage is a big concern for storage managers, most people at a company just care about application availability. In other words, can they get to their email or customer database? Even if the storage is working perfectly, these applications might hit an error along the way that corrupts data and requires a recovery process. NetApp has a suite of solutions that give various applications better availability. The product line that provides this integration with various applications is the SnapManager family. An example of a SnapManager product is SnapManager for Microsoft Exchange. Microsoft Exchange is the program that provides email, calendar, and address books for most large companies (including NetApp). Taking frequent backups and enabling fast restores keeps this mission-critical application available to users. Besides SnapManager for Exchange, there are SnapManager solutions available for Oracle databases, SAP, Microsoft SQL Server databases, and Microsoft SharePoint collaboration software. This list is constantly growing, since customers have many different applications that require high availability.
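Here is a hedged sketch of the general pattern behind application-integrated backups (our generic illustration; the real SnapManager products use each application's own interfaces): put the application into a consistent state, take the near-instant snapshot, then release the application, so it pauses only for a moment rather than for a whole backup window:

```python
import contextlib

# Generic application-consistent backup pattern (illustration only):
# quiesce, snapshot, resume. The snapshot is near-instant, so the
# application is paused only briefly.
@contextlib.contextmanager
def quiesced(app):
    app.flush_and_pause_writes()    # flush caches so on-disk state is consistent
    try:
        yield
    finally:
        app.resume_writes()

def backup(app, volume, snap_name):
    with quiesced(app):
        volume.snapshot(snap_name)  # fast, so the pause is short

# Example stand-ins for a mail server and a storage volume:
class FakeApp:
    def flush_and_pause_writes(self): print("app quiesced")
    def resume_writes(self): print("app resumed")

class FakeVolume:
    def snapshot(self, name): print(f"snapshot {name} created")

backup(FakeApp(), FakeVolume(), "exchange_nightly.0")
```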

As noted earlier, human error is a fairly common cause of data or an application being unavailable. Modern IT environments can be extremely complex, with many people working on them all over the world. One IT person might change a setting to fix a problem, with an unexpected consequence somewhere else. While NetApp won't make every IT problem go away, NetApp can help keep the storage manageable. This is one reason many large companies now rely heavily on NetApp storage solutions.

In some businesses, for certain kinds of data, backup and recovery are not options, no matter how fast we make them. These customers require a disaster recovery solution that allows the business to run from a second location in the event of a disaster or serious failure at the primary site. The following section describes NetApp business continuance solutions.

Disaster recovery is an aspect of storage that has received more attention in recent years. As businesses require systems to be available at all times, large natural disasters and terrorist attacks have reminded us how even the best systems can occasionally be defenseless. Here again, different businesses require different types of disaster recovery. For example, a bank's automated teller network might warrant a disaster recovery plan capable of overcoming a whole-site disaster within a few minutes, while simple recovery from tape backup may suffice for the bank's human resources department. A company would easily survive its sales force being denied access to PowerPoint presentations for a few days; losing the accounting systems required to ship products and report revenue for a few days might ruin it. Business decisions must be made to determine which applications require disaster recovery, and of what type.

At the highest end of the disaster recovery spectrum are two identically configured data centers, located far enough apart that a disaster affecting one will likely not affect the other. Every change to the data is made at both locations almost immediately. If a disaster occurs at one data center, all operations move to the remaining data center, so the business can continue to function. This type of system demands "synchronous mirroring," because it maintains up-to-the-minute mirrored sets of data at all times. As you might expect, such a system is very costly, since it requires very sophisticated equipment, a very fast interconnecting network, and two of just about everything. However, some businesses must have, and can cost-justify, this level of protection. NetApp provides this level of protection for storage with a solution called MetroCluster.
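The defining property of synchronous mirroring is easy to state in code: a write is not acknowledged to the application until both sites have it. Here is a minimal Python sketch of that rule (our illustration; MetroCluster's actual implementation is far more involved):

```python
# Synchronous mirroring: acknowledge a write only after BOTH sites
# have stored it, so the second site is never behind the first.
class Site:
    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def store(self, block_id, data):
        self.blocks[block_id] = data

def synchronous_write(primary, secondary, block_id, data):
    primary.store(block_id, data)
    secondary.store(block_id, data)      # must succeed before we ack
    return "ack"                         # the application waits for this

site_a, site_b = Site("datacenter A"), Site("datacenter B")
synchronous_write(site_a, site_b, 42, b"transaction record")
assert site_a.blocks == site_b.blocks    # identical at all times
print("both sites have block 42")
```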

The NetApp SnapMirror® replication solution brings simple and cost-effective disaster recovery that can even be synchronous or "close to synchronous," at costs that make disaster recovery possible for a wider range of data than ever before. It doesn't require a special network; often the preexisting network between the two sites will work fine, and it can even run over a very slow network if fast networks are unavailable or cost-prohibitive. The second site doesn't require an IT staff, and it can often use less expensive storage. The simplicity of this solution cannot be emphasized enough: NetApp can easily teach customers to set up and manage SnapMirror replication themselves. Compare this with many other disaster recovery solutions, where the vendor spends weeks setting up the system and continues to manage it for the customer, a costly proposition. SnapMirror has brought more robust disaster recovery to customers who needed something better than tape recovery but could not justify a synchronous mirroring solution. If NetApp customers do need synchronous mirroring with NetApp storage in both locations, they might choose MetroCluster, which combines mirroring and high availability: it can automatically fail over to the storage in the second data center.

Both SnapMirror and MetroCluster are disaster recovery solutions that require NetApp storage systems running Data ONTAP. But what if one or both of the storage systems are not from NetApp? This scenario was the reason NetApp acquired a company called Topio, and NetApp now offers ReplicatorX software, which can replicate data between two systems even if neither is NetApp storage. This gives customers a good disaster recovery option when non-NetApp storage is in place.

Finally, monitoring the storage environment is a very important piece of data availability. Problems left unfixed can lead to data being unavailable, and if storage is unavailable, bringing it back online quickly is critical. Operations Manager and Protection Manager are software tools from NetApp that allow customers to monitor their NetApp storage systems. They can alert administrators when certain events happen, such as nearing full capacity, slow performance, or failing backups. NetApp storage systems can also automatically send emails to an administrator or to the NetApp support center upon certain kinds of failures; even if the storage system is completely down, these emails can still be sent from a special card within the system. And NetApp support is never closed, no matter where in the world a customer is located.

Data Security

The intuitive question people ask when they hear "data security" is: do the correct people have access to the data? This involves many parts of the IT environment, right down to the person at the helpdesk who grants or denies user access to certain data. Storage systems can play a big part, since seeing sensitive data is often the ultimate goal of somebody breaking into a network. While a lot of attention is paid to protecting the data center from outside intrusion, there are other aspects to consider. A large number of data breaches come from internal employees, maliciously or sometimes unwittingly. There is also data in transit to protect: a backup tape being moved to a secondary site, or data traveling over a network to a second site.

The first aspect of security with NetApp storage is making sure that it integrates well with existing security infrastructures. Over the years, NetApp has made sure its products support new security features implemented by Windows and UNIX operating systems, and it has published best practices for security. NetApp also offers role-based access control on its products, a fancy term for giving each administrator the correct level of access on the systems.

Another security solution allows customers to ensure data cannot be altered or deleted for a certain period of time. This data is typically referred to as WORM data (write once, read many), and solutions that provide this capability are broadly called "compliance" solutions. There are laws and policies that require compliance solutions for some data; some of these laws came about from high-profile corporate scandals in the 1998 to 2003 timeframe. The NetApp compliance solutions are based on a technology called SnapLock. With SnapLock, you can create space on storage systems where data is held in a WORM state for a specified time. The strongest version of SnapLock ensures that nobody (even administrators) can ever modify or delete data stored in a WORM state on NetApp storage. When you combine SnapLock with SnapVault backups, customers get disk-based backups that are stored in WORM state; this is ingeniously named LockVault. While other vendors offer compliance solutions, the biggest advantage NetApp brings is that mainline storage systems can easily become compliance solutions with the simple addition of a software license. Other vendors require dedicated machines that can't be used for other needs and require additional training to manage. With NetApp, a single storage system can be used for many applications, including compliance.
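Here is a toy sketch of WORM retention enforcement in Python (our illustration of the concept; SnapLock enforces this inside Data ONTAP, below any administrator): once a file is committed with a retention time, deletion is refused until that time has passed:

```python
import time

# Toy WORM store: once committed, a file cannot be deleted until its
# retention period expires - not even by an administrator.
class WormStore:
    def __init__(self):
        self.files = {}      # name -> (data, retain_until_epoch_seconds)

    def commit(self, name, data, retention_seconds):
        self.files[name] = (data, time.time() + retention_seconds)

    def delete(self, name):
        _, retain_until = self.files[name]
        if time.time() < retain_until:
            raise PermissionError(f"{name} is under WORM retention")
        del self.files[name]

store = WormStore()
store.commit("audit_2008.log", b"...records...", retention_seconds=3600)
try:
    store.delete("audit_2008.log")
except PermissionError as err:
    print(err)    # audit_2008.log is under WORM retention
```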

Perhaps the most notable security offering from NetApp is the product line that came with the acquisition of Decru. DataFort devices encrypt stored data. In other words, unless you have the right authorization from the DataFort device, the data will always look something like this:

^7hjnnn%&(JJ$%FD$EL(&$ED GF(^8r8r7)(%$JJGF*^R!~><OJ888^^^))) OYO%#Cc%)*KH&*%G$#&L@|}{

(Example of encrypted data.)

In order to view this data in its real format, the DataFort must re-issue a "key" (a secret electronic message) to the authorized computer or user requesting the data. This key was originally created when the data was first stored and encrypted. For administrators to change access permissions on a DataFort, a quorum must be met; a typical quorum for DataFort devices is 2 out of 5, meaning two separate people must insert a special card and enter a password to change access permissions for data. Also, any attempt to access data (successful or not) is logged in a secure format. This may sound like science fiction, but it works very well to protect data from unauthorized access.

So how is this technology valuable to customers? All of us have heard news stories about tapes with very personal data being lost or stolen. If DataFort devices are used when that data is written to tape, only DataFort devices that generate the proper keys can read it back. In other words, anybody who finds an encrypted tape and tries to read it without keys from the DataFort will see only useless data. This saves companies from embarrassment and possible financial loss from fraud or lawsuits. While tape backup is the most commonly cited example for encrypting data, Decru products can encrypt any stored data, including data on disk drives. Remember how we noted that many attacks on data come from inside an organization? This is where encrypting data on disk drives becomes important: you can give a storage administrator the access needed to manage the storage, while the actual data on that storage remains encrypted.
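To illustrate the principle (not Decru's actual cipher or key protocol), here is a short Python example using the third-party cryptography package: without the key, the stored bytes are gibberish like the sample above; with it, the original data comes back:

```python
# pip install cryptography  (third-party library, used here for illustration)
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # created when the data is first encrypted
cipher = Fernet(key)

stored = cipher.encrypt(b"account 1234, balance $5,000")
print(stored)                        # unreadable without the key, e.g. b'gAAAAA...'

# A thief with the tape but no key sees only the gibberish above.
# An authorized reader, issued the key, recovers the data:
print(cipher.decrypt(stored))        # b'account 1234, balance $5,000'
```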

DataFort is not the only product available for encrypting data at rest. However, DataFort enjoys a reputation for being fast and very secure (DataForts are widely used by governments around the world). One challenge faced by all large enterprises that want to encrypt data is managing the keys: while keeping data secure is paramount, nobody will use encryption if the people who truly need the data can't access it. NetApp has a centralized key management system that makes sure keys are always backed up and available across an organization. This product is called the Lifetime Key Management system, or LKM. No matter how data is encrypted in the future, key management systems will remain a very important consideration for customers.

Performance

While performance is obviously important, people buying storage have to consider it in the proper context. If you hook a super-fast storage system up to a server or servers that can't take advantage of the performance, you've probably wasted some money, and you might have passed over other features that would be more beneficial. There are many aspects of performance to consider: do you want to store (write) small chunks of data quickly, or retrieve (read) large datasets? In other words, it is important to consider performance in the context that matters to your business. That said, there are businesses where storage performance directly affects the company's bottom line; getting a digitally animated movie out on time or bringing a new electronic design to market quickly is critical in those markets.

While disks are a big part of a storage system, they are typically the slowest part of the system for moving data. Disks are mechanical, while memory is solid state (electronic, in non-techy terms). As a result, disk performance is usually measured in milliseconds, while memory performance is measured in nanoseconds. For the non-scientists reading this, nanoseconds are much faster. So high-performance storage systems use memory as much as possible to increase the performance of the system. WAFL is very good at utilizing memory to increase the performance of NetApp storage; here is how it works. When a computer sends data to a NetApp appliance for storage (referred to as a "write"), WAFL can take the data directly from the network and store it in memory. As mentioned previously, data can be stored much faster in memory than on disks. As soon as the data enters memory, the storage system acknowledges to the sender that the data is stored safely. In other words, the data is stored very quickly in memory first, before eventually being written to disk. This write data is protected by memory that has a battery attached to it (technically called nonvolatile random-access memory, or NVRAM, in a NetApp device). Consequently, if there is a power outage before the data has been written to the disk drives, the data remains safe because the battery continues to power the NVRAM.
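A minimal sketch of that write path in Python (our illustration, not Data ONTAP internals): log the write in battery-backed memory, acknowledge immediately, and flush to disk later:

```python
# Toy write path (illustration only): log the write in battery-backed
# memory, acknowledge at memory speed, and flush to disk later in batches.
class ToyWriteCache:
    def __init__(self):
        self.nvram_log = []      # survives a power outage (battery-backed)
        self.disk = {}           # slow, but permanent

    def write(self, block_id, data):
        self.nvram_log.append((block_id, data))  # fast, memory-speed
        return "ack"             # sender continues without waiting for disk

    def flush(self):
        for block_id, data in self.nvram_log:    # done later, in batches
            self.disk[block_id] = data
        self.nvram_log.clear()

cache = ToyWriteCache()
print(cache.write(7, b"order #1001"))   # 'ack' returned at memory speed
cache.flush()                           # the disk catches up afterwards
print(cache.disk[7])
```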

Similarly, WAFL uses memory effectively when retrieving data from the NetApp device (referred to as a "read"). WAFL watches which data is being accessed most often and keeps a copy of that data in memory as well, which leads to very fast read performance. When appropriate, WAFL can also anticipate what data might be accessed next and pull it into memory ahead of time (this is called "read-ahead" in techy terms). While other storage vendors use many of the same techniques, the WAFL implementation is unique when you explore the details. As a result, NetApp typically excels on the industry-standard benchmarks that measure storage system performance.

While WAFL and Data ONTAP help achieve performance goals, robust hardware is also important. NetApp offers a large range of hardware platforms to deliver certain levels of performance, each with memory and CPU configurations chosen to achieve the best performance for the money. Additionally, NetApp now offers three types of disk drives (Fibre Channel, SAS, and ATA) in various capacities, which vary in performance and cost. To ensure customers receive the right configuration for their needs, NetApp has developed sizing tools for pre-sales engineers.

So here is the question everyone asks: does NetApp make the fastest storage available in the market? In many cases, yes. But no storage vendor can legitimately claim to be fastest in all cases, since application workloads and measuring methods vary widely. However, NetApp offers a combination of hardware and software that is very competitive and widely used by customers who demand the highest performance from their storage. And given what we'll discuss next in the scalability section, the conversation is moving from a fast storage system to storage grids.
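A minimal sketch of the read side (ours, for illustration): keep recently read blocks in a small memory cache and, on a cache miss, fetch a few following blocks too, betting on sequential access:

```python
from collections import OrderedDict

# Toy read cache with read-ahead: serve hot blocks from memory and,
# on a miss, prefetch the next few blocks expecting sequential reads.
class ToyReadCache:
    def __init__(self, disk, capacity=4, read_ahead=2):
        self.disk = disk
        self.capacity = capacity
        self.read_ahead = read_ahead
        self.cache = OrderedDict()    # block id -> data, oldest first

    def _fill(self, block_id):
        if block_id in self.disk:
            self.cache[block_id] = self.disk[block_id]
            self.cache.move_to_end(block_id)
            while len(self.cache) > self.capacity:
                self.cache.popitem(last=False)   # evict least recently used

    def read(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)     # cache hit, memory speed
            return self.cache[block_id]
        for b in range(block_id, block_id + 1 + self.read_ahead):
            self._fill(b)                        # miss: fetch it, plus read-ahead
        return self.cache[block_id]

disk = {i: f"block-{i}".encode() for i in range(10)}
cache = ToyReadCache(disk)
cache.read(0)                 # miss: loads blocks 0, 1, and 2
print(1 in cache.cache)       # True - block 1 is ready before it's asked for
```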

Scalability

It's pretty easy to see that data storage needs are growing in every business around us. Entire movies are computer-animated. Consumers are posting video and pictures to websites in enormous quantities. And businesses want to warehouse data so they can analyze every aspect of it. All enterprise storage vendors have systems that scale very large. However, scaling one storage system large enough to meet all needs is impractical, so storage is moving toward a "grid," where many systems are attached together to provide the needed scalability. The transformation to grid storage is very similar to the change in the computing market described earlier in this booklet: customers used to purchase one large server to meet their future needs; now they buy small servers and add more as required. It's much more cost effective and flexible.

But what exactly is scalability for storage? For the purposes of this booklet, we'll keep it simple: growing capacity, performance, and manageability to meet the business need. Even these categories have subdivisions; does scaling capacity mean adding storage, or perhaps adding network connectivity (how many "pipes" are there into this storage system)?

The traditional way to scale storage has always been with larger systems: you would buy one system that met your needs in terms of capacity, performance, and connectivity. As mentioned previously, NetApp has always offered a wide range of storage arrays that meet most needs in these three areas. At the time of this writing, the largest single storage system from NetApp scales to 1.1 petabytes with over 1,100 disk drives. It can also read and write data at many hundreds of megabytes per second and has many connectivity options (up to 48 ports).

One feature NetApp has offered for over 10 years is the "disk in place" upgrade: if you need a larger system, only the storage controllers need to be changed; the existing disks continue to be used. With the new storage controller, you can attach more storage to the system and achieve better performance. You may hear this called the "scale up," or vertical scaling, approach to storage. For many NetApp customers, this has been and will continue to be the way they grow their storage.

However, for a small but growing subset of customers, building a storage grid is a much better option for achieving the needed scalability. As you hit a performance or space limit for a certain controller, you simply attach more controllers or disks as needed. Even as the system grows, it continues to be accessed at a single address, as if it were a single system. This approach to scalability is typically called "scale out," or horizontal scaling. NetApp currently sells a scalable system that can grow to 20 or more controllers (or nodes, as they are typically called in this market).
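A rough Python sketch of the scale-out idea (ours, not Data ONTAP GX internals): clients talk to one front door, and a placement function spreads data across however many nodes currently exist, so adding a node adds capacity and performance behind the same address:

```python
# Toy storage grid: one entry point, many nodes. A hash decides which
# node holds each item, so capacity grows by adding nodes.
class ToyGrid:
    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]   # each node's storage

    def _node_for(self, name):
        return hash(name) % len(self.nodes)            # placement function

    def write(self, name, data):
        self.nodes[self._node_for(name)][name] = data

    def read(self, name):
        return self.nodes[self._node_for(name)][name]

grid = ToyGrid(node_count=4)
for i in range(1000):
    grid.write(f"file-{i}", b"...")
print([len(n) for n in grid.nodes])   # the work is spread across 4 nodes
# Note: with this naive modulo placement, adding a node means existing
# data must be rebalanced; real scale-out systems use cleverer schemes.
```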

So how does NetApp implement this technology? The hardware used for the NetApp storage grid is very much like its non-grid FAS systems: it has memory, processors, network connectivity, and disk drives. However, these machines run a different version of Data ONTAP called Data ONTAP GX. Some of the internal technology in Data ONTAP GX came from the NetApp acquisition of Spinnaker Networks in 2004. The trick has been converging all of the great features of the original Data ONTAP with the features from Spinnaker that allow a storage grid to be built.

Over time, the two versions of Data ONTAP will converge further into one operating system, and customers will be able to get all features in that one operating system. In the coming years, the grid storage market and its product offerings will be a heavy focus at NetApp. If you think about some of the most dynamic and valuable NetApp customers, they will all be early adopters of storage grids due to unprecedented scalability needs: Yahoo!, Facebook, and Google, to name a few well-known names. Other customers, such as pharmaceutical and energy exploration companies, have similar scalability needs. NetApp will strive to offer the right technology for them at the right cost for their businesses.

Cost

Cost is typically very misunderstood in the storage industry. People generally define cost as the purchase price of storage. However, multiple third-party studies have shown that the purchase price is actually a relatively small part of overall storage cost in an enterprise environment; the cost of people's time to manage storage is the largest portion by a wide margin. Therefore, buying a big pile of the cheapest disk drives might quickly become expensive as you try to manage them. So how does storage become less expensive to manage? Making the basic tasks of storage simpler and providing good tools to administer it are good starting points. Making the storage reliable and available is also important. And finally, providing value-added services and training matters as well. But even when the cost of management is considered, storage buyers will rarely be able to buy storage whose purchase price is too high. So as a storage manager, you typically look for storage that has a reasonable purchase price for your purchasing department but is also manageable enough for the storage team. NetApp (and all major storage vendors, for that matter) tries to offer unique products that help keep overall storage costs low for customers. Here are some unique NetApp advantages in this area.

Keep things simple. The original tag line for NetApp, when the company was very small, was "fast, simple, reliable." Simplicity is ingrained in the NetApp culture. Even as NetApp storage products grow larger with an expanding list of functionality, keeping them simple remains a goal. To this day, field engineers who come to NetApp from other companies still marvel at the simplicity of tasks on NetApp storage.

Same Data ONTAP, no matter how big or how small. With NetApp, you can buy a storage device with a 16TB maximum capacity or a 1.1PB maximum capacity; what's unique about NetApp is that both will run the same Data ONTAP software. An administrator who knows how to run the smaller machines can also run the largest ones. This is a huge cost savings, since functionality only has to be learned once when working with NetApp.

While not every product line offered by NetApp uses Data ONTAP, all of the core storage products do, which is unlike other large storage vendors. Running the same Data ONTAP across the whole product line brings another advantage in the area of replicating data. Other storage vendors typically have limits on which platforms can replicate data to which other platforms in the product line. With NetApp, all storage platforms work the same and can replicate to each other, so you can take the least expensive NetApp machine and replicate to the most expensive, or vice versa. This can save a customer money, since they can buy a less expensive storage machine for the destination site. Often, that data is backup data that will not be accessed except in extraordinary circumstances, so it makes sense to buy a cheaper system. Many customers have taken advantage of this for cost savings.

Using more storage than you bought. Huh? How does that work? It turns out that NetApp has offered this for many years in the form of Snapshot copies: they offer full copies of data but only actually store the changes from the original dataset. Could you imagine if all of those were actual full copies? The cost of all that storage, plus the facilities and cooling for it, would be huge. While most other storage vendors now have some version of snapshots, NetApp Snapshot technology remains the most usable. Some recent research found one large enterprise NetApp customer with over 10,000 Snapshot copies online. With other vendors' snapshots, using them usually means lower performance, which can be a bad tradeoff; with NetApp snapshots, no tradeoff is required.

In 2005, NetApp introduced FlexClone. With regular snapshots, you can only read data from the Snapshot copy, which is fine for recovery purposes. With FlexClone, you can read and write data to a Snapshot-backed copy. In practice, this means you can make a fully functional copy of a dataset very quickly and without needing storage space for a full copy. FlexClone technology keeps track of which data belongs to the original dataset and which belongs to the new clone; blocks of data common to both are stored once and referenced by both copies. NetApp customers have taken this technology to incredible levels: one customer is getting over 200% utilization of their storage purchase using it. FlexClone is very valuable for customers developing or testing a new application or patch; they can quickly create clones without requiring separate storage space for each copy, which helps them test applications more thoroughly while saving cost.

And finally, we have NetApp's latest way to do more with less physical storage: deduplication. Very simply, Data ONTAP will look at all of your data blocks and find blocks that match. If it finds two or more matching blocks, it keeps only one copy, freeing up the space that held the others. While deduplication is not a new concept, it was typically offered only for backup systems, because it usually slowed storage performance. NetApp deduplication can be used for many types of datasets: there is typically no performance change for reads when accessing deduplicated data, and only a small slowdown for writes. This makes NetApp deduplication much more widely usable for customers. Deduplication is available for NetApp FAS and V-Series storage as well as NetApp Virtual Tape Libraries (VTL). While the technology differs a bit to meet the needs of each platform, the benefit to the customer is largely the same: buying less storage.
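A minimal sketch of block deduplication in Python (our illustration, not Data ONTAP's implementation): identical blocks are detected by hashing their contents and stored only once, with every logical reference pointing at the single physical copy:

```python
import hashlib

# Toy deduplication: hash each block's contents; identical blocks are
# stored once and shared by reference.
class DedupStore:
    def __init__(self):
        self.blocks = {}     # content hash -> block data (stored once)
        self.refs = {}       # logical block address -> content hash

    def write(self, address, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(digest, data)   # stores only if content is new
        self.refs[address] = digest

    def read(self, address):
        return self.blocks[self.refs[address]]

store = DedupStore()
for address in range(100):
    store.write(address, b"identical OS image block")   # 100 logical copies
print(f"{len(store.refs)} logical blocks, {len(store.blocks)} physical copy")
# -> 100 logical blocks, 1 physical copy
```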

With the recent focus on the environment and on cutting back harmful emissions, having less storage equipment is very desirable, not just for cost savings but for social responsibility reasons as well. Fortunately, this is an area where NetApp is already a leader and will continue to develop ideas.

In Summary

Hopefully, Simply NetApp has given you some insight into how enterprise customers evaluate storage and the solutions NetApp offers them. What's most important to understand is that there are many aspects to evaluating storage, and customers are probably making a mistake if they look at only one. While the early days of NetApp were focused on selling to engineering departments at technology companies, NetApp has grown into a major storage supplier with a broad range of solutions. With newer technology such as grid storage and deduplication, NetApp will continue to provide compelling solutions that help customers manage data.

Storage Facts: How Many Bytes?

Bytes are the basic unit used for measuring the size of computer data. A simple one-page document created in Microsoft Word is usually around 25,000 bytes. The units follow a pattern:

1 byte x 1,024 = 1 kilobyte
1 kilobyte x 1,024 = 1 megabyte
1 megabyte x 1,024 = 1 gigabyte
1 gigabyte x 1,024 = 1 terabyte
1 terabyte x 1,024 = 1 petabyte

*There are actually two slightly different ways of multiplying bytes (powers of 1,024 and powers of 1,000). However, the resulting values are similar enough that the chart above is a good summary.

Another unit of measurement for data is the "bit," generally used when speaking about data flowing over a network. Luckily, bits and bytes are related: there are eight bits in one byte.
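As a quick illustration of the chart (our snippet): dividing repeatedly by 1,024 converts a raw byte count into the familiar units:

```python
# Convert a raw byte count to a human-readable unit using the
# 1,024-multiplier chart above.
def human_readable(num_bytes):
    for unit in ("bytes", "KB", "MB", "GB", "TB"):
        if num_bytes < 1024:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.1f} PB"

print(human_readable(25_000))            # 24.4 KB - a one-page Word document
print(human_readable(1.1 * 1000**5))     # about a petabyte (the largest FAS system)
```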

Want to Learn More Storage Speak?

See the NetApp glossary at http://now.netapp.com/NOW/public/glossary/
Storage Networking Industry Association (SNIA) Dictionary: http://www.snia.org/education/dictionary/