h10502 vfcache intro wp



White Paper

Abstract

This white paper is an introduction to EMC VFCache. It describes the implementation details of the product and provides performance, usage considerations, and major customer benefits when using VFCache.

February 2012

INTRODUCTION TO EMC VFCACHE

• VFCache is a server Flash-caching solution
• VFCache accelerates reads and ensures data protection
• VFCache extends EMC FAST Suite to the server



Copyright 2012 EMC Corporation. All Rights Reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

The information in this publication is provided "as is." EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

VMware, ESX, VMware vCenter, and VMware vSphere are registered trademarks or trademarks of VMware, Inc. in the United States and/or other jurisdictions. All other trademarks used herein are the property of their respective owners.

Part Number: H10502.1




Table of Contents

Executive summary
  EMC VFCache solution
Introduction
  Audience
  Terminology
Use cases of Flash technology
    VFCache advantages over DAS
    Cache in the storage array
  Flash cell architecture
VFCache design concepts
  Business benefits
Implementation details
    Read Hit example
    Read Miss example
    Write example
  VMware implementation
  Split-card feature
  VFCache management
Performance Considerations
  Locality of reference
  Warm-up time
  Workload characteristics
  Throughput versus latency
  Other bottlenecks in the environment
  Write performance dependent on back-end array
Usage guidelines and characteristics
  Specifications
  Constraints
  Stale data
Application use case and performance
  Test results
Conclusion
References




Executive summary

Since the first deployment of Flash technology in disk modules (commonly known as solid-state drives or SSDs) by EMC in enterprise arrays, it has been EMC's goal to expand the use of this technology throughout the storage environment.

The combination of the requirement for high performance and the rapidly falling cost-per-gigabyte of Flash technology has led to the concept of a "caching tier." A caching tier is a large-capacity secondary cache using Flash technology that is positioned between the server application and the storage media.

EMC VFCache solution

EMC VFCache is a server Flash-caching solution that reduces latency and accelerates throughput to dramatically improve application performance by using intelligent caching software and PCIe Flash technology.

VFCache accelerates reads and protects data by using a write-through cache to the networked storage to deliver persistent high availability, integrity, and disaster recovery.

VFCache coupled with array-based EMC FAST software provides the most efficient and intelligent I/O path from the application to the data store. The result is a networked infrastructure that is dynamically optimized for performance, intelligence, and protection for both physical and virtual environments.

Major VFCache benefits include:

• Provides performance acceleration for read-intensive workloads

• As a write-through cache, enables accelerated performance with the protection of the back-end, networked storage array

• Provides an intelligent path for the I/O and ensures that the right data is in the right place at the right time

• In split-card mode, enables you to use part of the server Flash for cache and the other part as direct-attached storage (DAS) for temporary data

• By offloading Flash and wear-level management onto the PCIe card, uses minimal CPU and memory resources from the server

• Works in both physical and virtual environments

As VFCache is installed in a greater number of servers in the environment, more processing is offloaded from the storage array to the server. This provides a highly scalable performance model in the storage environment.




Introduction

This white paper provides an introduction to VFCache. Flash technology can improve application performance when it is used in different ways in a customer environment. Topics covered in this white paper include implementation in physical and virtual environments, performance considerations, best practices, usage guidelines and characteristics, and some application-specific use cases.

Audience

This white paper is intended for EMC customers, partners, and employees who are considering the use of VFCache in their storage environment. It assumes a basic understanding of Flash technology and its benefits.

Terminology

Cache page: The smallest unit of allocation inside the cache, typically a few kilobytes in size. The VFCache cache page size is 8 KB.

Cache warm-up: The process of promoting new pages into VFCache after they have been referenced, or after a change in the application access profile that begins to reference an entirely new set of data.

Cache promotion: The process of copying data from the SAN storage in the back end to VFCache.

Hot spot: A busy area in a source volume.

Spatial locality of reference: The concept that different logical blocks located close to each other will be accessed within a certain time interval.

Temporal locality of reference: The concept that logical blocks that have been accessed recently are likely to be accessed again within a certain time interval.

Working set: A collection of information that is accessed frequently by the application over a period of time.
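To make the cache page definition concrete, the following Python sketch shows how an I/O request could be mapped onto 8 KB cache pages. Only the 8 KB page size comes from this paper. The function name `pages_touched` and the assumption that pages are aligned to multiples of 8 KB from the start of the volume are illustrative, not a description of the VFCache driver.

```python
CACHE_PAGE_SIZE = 8 * 1024  # VFCache cache page size stated above: 8 KB


def pages_touched(offset_bytes: int, length_bytes: int) -> range:
    """Return the cache page indices covered by an I/O request.

    Assumes (hypothetically) that pages are aligned to 8 KB boundaries
    counted from the start of the source volume.
    """
    if length_bytes <= 0:
        raise ValueError("length_bytes must be positive")
    first_page = offset_bytes // CACHE_PAGE_SIZE
    last_page = (offset_bytes + length_bytes - 1) // CACHE_PAGE_SIZE
    return range(first_page, last_page + 1)


# A 4 KB read at offset 12 KB stays inside page 1.
print(list(pages_touched(12 * 1024, 4 * 1024)))   # [1]
# A 16 KB read at offset 4 KB straddles pages 0, 1, and 2.
print(list(pages_touched(4 * 1024, 16 * 1024)))   # [0, 1, 2]
```

Under this assumption, a request that is not aligned to a page boundary touches one more page than its size alone would suggest.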




Use cases of Flash technology

There are different ways in which Flash technology can be used in a customer environment depending on the use case, application, and customer requirements. EMC's architectural approach is to use the right technology in the right place at the right time. This includes using Flash:

• In the storage array
• As a cache
• As a tier
• As a singular tier for the entire application

In addition, there are different types of Flash, with different cost structures, different reliability considerations, and different performance characteristics. All of these different types of Flash have a proper place in the broad continuum of use cases. These technologies range from PCIe to SSD, as well as single-level cell (SLC) to multi-level cell (MLC) Flash architecture.

Some of the use cases for Flash in a customer environment, some of which may overlap, are:

• Applications with high performance and protection requirements that may be read heavy are a perfect fit for using PCIe Flash in the server as a cache, for example, VFCache.

• Applications with performance requirements without protection requirements, such as temporary data, may be a good fit for PCIe Flash in the server as a data store (direct-attached storage (DAS)), for example, the split-card feature in VFCache.

• Applications with performance and protection requirements that are read and write heavy may be a good fit for Flash in the array as a cache, for example, EMC FAST Suite on an EMC VNX storage system.

• Applications with mixed workloads and changing data "temperature" are a perfect fit for Flash as part of a tiering strategy, for example, Fully Automated Storage Tiering for Virtual Pools (FAST VP) on an EMC storage system.

• Applications requiring high consistent performance may be a good fit for Flash as the single tier of storage, for example, using SSDs in an EMC storage system.

VFCache advantages over DAS

One option for using PCIe Flash technology in the server is to use it as a DAS device where the application data is stored on the Flash. Advantages of using VFCache over DAS solutions include:

• DAS solutions do not provide performance with protection, since they are not storing the data on an array in the back end. If the server or the Flash card is faulted, you run the risk of data unavailability or even data loss. VFCache, however, provides read acceleration to the application, and at the same time mirrors application writes to the back-end storage array, thereby providing protection.

• DAS solutions are limited by the size of the installed Flash capacity and do not adapt to working sets of larger datasets. In contrast, once the working set of the application has been promoted into VFCache, application performance is accelerated. Then, when the working set of the application changes, VFCache adapts to it and promotes the new working set into Flash over a period of time.

• DAS solutions lead to stranded sets of data in your environment that have to be managed manually. This is in contrast to application deployment on a storage array where data is consolidated and can be centrally managed.

Cache in the storage array

Another way in which some solutions use PCIe Flash technology is to use it as a cache in the storage array. However, VFCache uses PCIe Flash in the server and is much closer to the application in the I/O stack. VFCache does not have the latency associated with the I/O travelling over the network to access the data.

Flash cell architecture

In general there are two major technologies used in all Flash drives:

• Single-level cell (SLC) NAND-based Flash cell
• Multi-level cell (MLC) NAND-based Flash cell

A cell is the smallest unit of storage in any Flash technology and is used to hold a certain amount of electronic charge. The amount of this charge is used to store binary information.

NAND Flash cells have a very compact architecture; their cell size is almost half the size of a comparable NOR Flash cell. This characteristic, when combined with a simpler production process, enables a NAND Flash cell to offer higher densities with more memory on a given semiconductor die size. This results in a lower cost per gigabyte. With smaller, more precise manufacturing processes being used, their price is expected to fall even further.

A NAND Flash cell has faster erase and write times compared to NOR Flash cells, which provides improved performance. It has higher endurance limits for each cell, a feature that provides the reliability required for enterprise-class applications.

Flash storage devices store information in a collection of Flash cells made from floating gate transistors. SLC devices store only one bit of information in each Flash cell (binary), whereas MLC devices store more than one bit per Flash cell by choosing between multiple levels of electrical charge to apply to its floating gates in the transistors (Figure 1).




Figure 1: Comparison between SLC and MLC Flash cell data storage¹

Since each cell in MLC Flash has more information bits, an MLC Flash-based storage device offers increased storage density compared to an SLC Flash-based version; however, MLC NAND has lower performance and endurance because of its inherent architectural tradeoffs. Higher functionality further complicates the use of MLC NAND, which makes it necessary for you to implement more advanced Flash management algorithms and controllers.

SLC NAND and MLC NAND offer capabilities that serve two very different types of applications: those requiring high performance at an attractive cost per bit (MLC), and those that are less cost sensitive and seek even higher performance over time (SLC).

Taking into account the kind of I/O profiles in enterprise applications and their requirements, EMC VFCache uses the SLC NAND Flash architecture.

Table 1 compares the SLC and MLC Flash characteristics (typical values).

¹ Kaplan, Francois, "Flash Memory Moves from Niche to Mainstream," Chip Design Magazine, April/May 2006




Table 1: SLC and MLC Flash comparison

Features                          MLC           SLC
Bits per cell                     2             1
Endurance (erase/write cycles)    about 10 K    about 100 K
Read service time (avg)           129 µs        38 µs
Write service time (avg)          1,375 µs      377 µs
Block erase (avg)                 4,500 µs      1,400 µs

Although SLC NAND Flash offers a lower density, it also provides an enhanced level of performance in the form of faster reads and writes. Because SLC NAND Flash stores only one bit per cell, the need for error correction is reduced. SLC also allows for higher write/erase cycle endurance, making it a better fit for use in applications requiring increased endurance and viability in multi-year product life cycles.
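The size of the SLC advantage can be read directly off Table 1. The short Python sketch below simply divides the typical MLC values by the SLC values; the dictionary layout and the function name `slc_speedup` are illustrative choices, and the inputs are the typical figures from the table rather than measurements of any specific device.

```python
# Typical values from Table 1, in microseconds.
TABLE_1_US = {
    "read service time": {"MLC": 129, "SLC": 38},
    "write service time": {"MLC": 1375, "SLC": 377},
    "block erase": {"MLC": 4500, "SLC": 1400},
}


def slc_speedup(metric: str) -> float:
    """How many times faster SLC is than MLC for one Table 1 metric."""
    row = TABLE_1_US[metric]
    return row["MLC"] / row["SLC"]


for metric in TABLE_1_US:
    print(f"{metric}: SLC is about {slc_speedup(metric):.1f}x faster")
```

With these typical values, SLC is roughly 3.2 to 3.6 times faster on each operation, in addition to the roughly tenfold difference in erase/write endurance (about 100 K versus about 10 K cycles).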

For more details on various Flash cell architectures, refer to the Considerations for Choosing SLC versus MLC Flash technical notes on Powerlink.




VFCache design concepts

Over the past decade, server processing technology has continued to advance along the Moore's Law curve. Every 18 months, memory and processing power have doubled, but disk drive technology has not. Spinning drives continue to spin at the same rate. This has caused a bottleneck in the I/O stack whereby the server and the application have capacity to process more I/O than the disk drives can deliver. This is referred to as the I/O gap, as shown in Figure 2.

Figure 2: I/O gap between the processor and storage sub-systems

Flash drives in the storage system have helped to close this gap, and EMC is a very successful industry leader in Flash drives. Flash is a silicon technology, not mechanical, and therefore can enjoy the same Moore's Law curve.

Flash technology itself can be used in different ways in the storage environment. Figure 3 shows a comparison of different storage technologies based on the I/O per second (IOPS) per gigabyte (GB) of storage that they offer.




Figure 3: Comparison of storage technologies

Mechanical spinning drives provide a great dollar-per-gigabyte economic value to cold datasets, but they do not provide the best performance. Putting Flash drives in the array provides an order of magnitude better performance. Putting Flash in the server on a PCIe card can accelerate performance by even another order of magnitude over Flash drives.

FAST technology on EMC storage arrays can help place the application data in the right storage tier based on the frequency with which data is being accessed. VFCache extends FAST technology from the storage array into the server by identifying the most frequently accessed data and promoting it into a tier that is closest to the application.

EMC VFCache is a hardware and software server caching solution that dramatically improves your application response time and delivers more IOPS. It intelligently determines, through a fully automated tiering (FAST) algorithm, which data is the "hottest" data and would benefit by sitting in the server on PCIe Flash and closer to the application. This avoids the latencies associated with I/O accesses over the network through to the storage array. Once enough data from the application working set has been promoted into VFCache, future accesses to the data will be at very low latencies. This results in an increase of performance by up to 300 percent and a decrease in latency by as much as 50 percent in certain applications.

Since the processing power required for an application's most frequently referenced data is offloaded from the back-end storage to the PCIe card, the storage array can allocate greater processing power to other applications. While one application is accelerated, the performance of other applications is maintained or even slightly accelerated.

EMC VFCache is EMC's newest intelligent software technology which extends EMC FAST into the server. When coupled with FAST, VFCache creates the most efficient and intelligent I/O path from the application to the data store. With both technologies, EMC provides an end-to-end tiering solution to optimize application capacity and performance from the server to the storage. As a result of the VFCache intelligence, a copy of the hottest data automatically resides on the PCIe card in the server for maximum speed. As the data slowly ages and cools, this copy is discarded and FAST automatically moves the data to the appropriate tier of the storage array, from Flash drives to FC/SAS drives and SATA/NL-SAS drives over time.

VFCache ensures the protection of data by making sure that all changes to the data continue to persist down at the storage array, and uses the high availability and end-to-end data integrity check that a networked storage array provides. Figure 4 shows a VFCache deployment in a typical environment.

Figure 4: Typical EMC VFCache deployment

VFCache is designed to follow these basic principles:

• Performance: Reduce latency and increase throughput to dramatically improve application performance

• Intelligence: Add another tier of intelligence by extending FAST into the server

• Protection: Deliver performance with protection by using the high availability and disaster recovery features of EMC networked storage




Business benefits

VFCache provides the following business benefits:

• Because of the way VFCache works, a portion of I/O processing is offloaded from the storage array to the server where VFCache is installed. As VFCache is installed on more servers in the environment, more I/O processing is offloaded from the storage array to the servers. The result is a highly scalable I/O processing storage model: a storage environment with higher performance capability.

• As VFCache helps in offloading workload from the storage array, the disk drives may become less busy and can be reclaimed and used for other applications.

  Note: This should be done only after carefully studying the workload patterns and current utilization of disk drives.

• VFCache increases the performance and reduces the response time of applications. For some businesses, this translates into an ability to do faster transactions or searches, and more of them.

  For example, a financial trading company may be limited in the number of transactions it can process because of the number of IOPS that the storage environment can provide. VFCache increases throughput to allow for more trades, thereby generating more revenue for the company.

  As another example, visitors to an e-commerce website may experience delays because of the speed at which data can be read from the back-end storage. With reduced latencies from VFCache, searches will be faster and web pages will load in less time, which in turn improves the user experience of the site.

• Typical customer environments might have multiple applications accessing the same storage system in the back end. If some of these applications are more important than others, you want to get the best performance for these applications while making sure that the other non-critical applications continue to get "good enough" performance.

  Since VFCache is installed in the server instead of the storage, it provides this flexibility. With multiple applications accessing the same storage, VFCache improves the performance of the application on the server where it is installed, while other applications on other servers continue to get good performance from the storage system. In fact, they might get a small performance boost because part of the back-end storage system's workload gets offloaded to VFCache, and the storage system has more processing power available for these applications.

• VFCache also provides you with the capability to configure VFCache at the server volume level. If there are certain volumes, like application logs, which do not need to be accelerated by VFCache, those specific devices can be excluded from the list of VFCache-accelerated volumes.

• In a virtual environment, VFCache provides the flexibility to choose the virtual machines and their source volumes that you want to accelerate using VFCache.




• VFCache is a server-based cache and therefore completely infrastructure agnostic. It does not require any changes to the application above it, or the storage systems below it. Introducing VFCache in a storage environment does not require you to make any changes to the application or storage system layouts.

• Since VFCache is a caching solution and not a storage solution, you do not have to move the data. Therefore data is not at risk of being inaccessible if the server or the PCIe card fails.

• VFCache does not require any significant memory or CPU footprint, as all Flash and wear-level management is done on the PCIe card and does not use server resources. There is no significant overhead from using VFCache on the server resources, unlike other PCIe solutions.

• Split-card mode in VFCache allows you to use part of the server Flash for cache and the other part as DAS for temporary data.




Implementation details

This section of the white paper provides details about how I/O operations are handled when VFCache is installed on the server. In any implementation of VFCache, the following components need to be installed in your environment:

• Physical VFCache card
• VFCache vendor driver
• VFCache software

In a physical environment (non-virtualized), all the components need to be installed on the server where VFCache is being used to accelerate application performance. For more information about the installation of these components, see the EMC VFCache Installation and Administration Guide for Windows and Linux.

Figure 5 shows a simplified form of VFCache architecture. The server consists of two components: the green section on top shows the application layer, and the blue section on the bottom shows the VFCache components in the server (the SAN HBA shown in the figure is not part of VFCache).

VFCache hardware is inserted in a PCIe Gen2, x8 slot in the server. VFCache software is implemented as an I/O filter driver in the I/O path inside the operating system. One or more back-end storage LUNs or logical volume manager volumes are configured to be accelerated by VFCache. Every I/O from the application to an accelerated LUN or volume is intercepted by this filter driver. The further course of action for the application I/O depends on the particular scenario when the I/O is intercepted.

If the application I/O is for a source volume on which VFCache has not been enabled, the VFCache driver is transparent to the application I/O, which is executed in exactly the same manner as if there were no VFCache driver in the server I/O stack. The following examples assume that the application I/O is meant for a source volume that is being accelerated by VFCache.

Read Hit example

In this example, you can assume that VFCache has been running for some time, and the application working set has already been promoted into VFCache. The application issues a read request, and the data is present in VFCache. This process is called "Read Hit." The sequence of steps is detailed below Figure 5.




Figure 5: Read Hit example with VFCache

1. The application issues a read request that is intercepted by the VFCache driver.

2. Since the application working set has already been promoted into VFCache, the VFCache driver determines that the data being requested by the application already exists in VFCache. The read request is therefore forwarded to the PCIe VFCache card, rather than to the back-end storage.

3. Data is read from VFCache and returned to the application.

This use case provides all the throughput and latency benefits to the application, since the read request is satisfied within the server itself rather than incurring all the latencies of going over the network to the back-end storage.
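The latency benefit of a Read Hit can be estimated with a simple weighted average: reads that hit are served at the card's latency, and reads that miss are served at the back-end latency. The Python sketch below is a simplified model under stated assumptions, not a VFCache measurement. The function name and both latency constants are hypothetical values chosen only for illustration, and the model ignores driver overhead and queuing.

```python
def effective_read_latency_us(hit_ratio: float,
                              cache_latency_us: float,
                              backend_latency_us: float) -> float:
    """Weighted-average read latency for a given cache hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_latency_us + (1.0 - hit_ratio) * backend_latency_us


# Hypothetical latencies, chosen only for illustration (not measured values).
CACHE_US = 100.0      # read served from the PCIe card in the server
BACKEND_US = 5000.0   # read served over the network from the storage array

for hit_ratio in (0.0, 0.5, 0.9):
    latency = effective_read_latency_us(hit_ratio, CACHE_US, BACKEND_US)
    print(f"hit ratio {hit_ratio:.0%}: average read latency {latency:.0f} us")
```

The model shows why the benefit depends on how much of the working set has been promoted: the average latency falls in proportion to the hit ratio.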

Read Miss example

In this example, you can assume that the application issues a read request, and that data is not present in VFCache. This process is called "Read Miss." The data might not be in VFCache because the card has just been installed in the server, or the application working set has changed so that this data has not yet been referenced by the application. The sequence of steps is detailed below Figure 6.




Figure 6: Read Miss example with VFCache

1. The application issues a read request that is intercepted by the VFCache driver.

2. The VFCache driver determines that the requested data is not in VFCache and forwards the request to the back-end storage.

3. The data is read from the back-end storage and returned to the application.

4. Once the application read request is completed, the requested data is written by the VFCache driver to the VFCache card. This process is called "Promotion." This means that when the application reads the same data again in future, it will be a Read Hit for VFCache, as explained previously.

If all cache pages in VFCache are already used, VFCache uses a least-recently-used (LRU) algorithm to make room for new data. If needed, the data that is least likely to be used in future is discarded from VFCache first to create space for the new promotions.
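The Read Hit, Read Miss, promotion, and LRU behavior described above can be sketched as a small toy model. This is a minimal illustration, not the VFCache driver: the class name `ReadCacheSketch`, the dictionary standing in for the back-end storage, and the choice to promote on every miss are all assumptions made for the example. In the sketch the promotion happens just before the data is returned, whereas the steps above describe it as happening after the application read completes.

```python
from collections import OrderedDict


class ReadCacheSketch:
    """Toy model of the Read Hit / Read Miss flow with LRU eviction."""

    def __init__(self, capacity_pages: int, backend: dict):
        if capacity_pages < 1:
            raise ValueError("capacity_pages must be at least 1")
        self.capacity_pages = capacity_pages
        self.backend = backend          # stands in for the back-end storage
        self.pages = OrderedDict()      # page number -> data, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, page: int) -> bytes:
        if page in self.pages:
            # Read Hit: serve from the cache and mark the page most recently used.
            self.hits += 1
            self.pages.move_to_end(page)
            return self.pages[page]
        # Read Miss: fetch from the back end, then promote the page.
        self.misses += 1
        data = self.backend[page]
        self._promote(page, data)
        return data

    def _promote(self, page: int, data: bytes) -> None:
        if len(self.pages) >= self.capacity_pages:
            self.pages.popitem(last=False)  # evict the least recently used page
        self.pages[page] = data


backend = {n: f"block-{n}".encode() for n in range(10)}
cache = ReadCacheSketch(capacity_pages=2, backend=backend)
for page in (0, 1, 0, 2, 1):
    cache.read(page)
print(cache.hits, cache.misses, list(cache.pages))   # 1 4 [2, 1]
```

In the usage example, the second read of page 0 is the only hit. Reading page 2 evicts page 1, the least recently used page at that point, so the final read of page 1 is a miss again.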

Write example

In this example, you can assume that the application has issued a write request. The sequence of steps is detailed below Figure 7.




Figure 7: Write example with VFCache

1. The application issues a write request that is intercepted by the VFCache driver.

2. Since this is a write request, the VFCache driver passes this request to the back-end storage for completion.

3. Once the write operation is completed on the back-end storage, an acknowledgment for the write request is sent back to the application.

4. The data in the write request is written to the VFCache card. If the application is writing to a storage area that has already been promoted into VFCache, the copy of that data in VFCache is overwritten. The application therefore will not receive a stale or old version of data from VFCache. VFCache algorithms ensure that if the application writes some data and then reads the same data later on, the read requests will find the requested data in VFCache.
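The write steps above follow a write-through pattern, which can be sketched as follows. This is a simplified illustration under stated assumptions: the class name `WriteThroughSketch` and the dictionaries standing in for the storage array and the card are hypothetical, cache capacity and eviction are left out, and the acknowledgment is modeled as a return value even though the steps above send it to the application before the cache copy is updated.

```python
class WriteThroughSketch:
    """Toy model of the write flow: back end first, then the cache copy."""

    def __init__(self, backend: dict):
        self.backend = backend   # stands in for the back-end storage array
        self.cache = {}          # page number -> data held on the card

    def write(self, page: int, data: bytes) -> str:
        # Steps 2-3: the write completes on the back end before it is acknowledged.
        self.backend[page] = data
        ack = "ack"
        # Step 4: the cache copy is written or overwritten so it cannot go stale.
        self.cache[page] = data
        return ack

    def read(self, page: int) -> bytes:
        # A read after a write finds the current data in the cache.
        if page in self.cache:
            return self.cache[page]
        return self.backend[page]


array = {7: b"old"}
vfc = WriteThroughSketch(array)
vfc.write(7, b"new")
print(array[7], vfc.read(7))   # b'new' b'new'
```

Because the back end always holds the latest write, losing the cache contents in this model loses no data, which is the protection property the write-through design is meant to provide.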

The process of promoting new data into VFCache as explained in the previous two examples is called "Cache Warmup." Any cache needs to be warmed up with the application working set before the application starts seeing the performance benefits. When the working set of the application changes, the cache will automatically warm up with the new data over a period of time.

VMware implementation

The implementation of VFCache in a VMware environment is slightly different from the implementation in a physical environment. In a virtualized environment, multiple virtual machines on the same server may share the performance advantages of a single VFCache card. This is shown in Figure 8.




Figure 8: VFCache implementation in a VMware environment

VFCache implementation in a VMware environment consists of the following components:

• Physical VFCache card on the VMware ESX server

• VFCache vendor driver on the ESX server

• VFCache software in each virtual machine that needs to be accelerated using VFCache

  In a VMware environment, the VFCache software includes the VFCache driver, CLI package, and VFCache Agent. The VFCache software does not need to be installed on all the virtual machines in the server. Only those virtual machines that need to be accelerated using VFCache need to have VFCache software installed.

• VFCache VSI Plug-in for VFCache management in the VMware vCenter client

  This is usually the laptop that the administrator uses for connecting to the vCenter server.

You have to create a datastore using the VFCache hardware on the ESX server. Once the VFCache datastore has been created, the rest of the setup can be managed using the VFCache VSI plug-in. In order for a virtual machine to use the VFCache datastore, a virtual disk (vDisk) for the virtual machine's cache device must be created within the VFCache datastore. vDisks can be created either through the VFCache VSI plug-in or directly using the vSphere client. This virtual disk needs to be added to the virtual machine.

The cache configuration and management steps from this point on are similar to the steps that you would follow in a physical server environment. These can be done using either the CLI on the virtual machine or the VSI plug-in on the vCenter client. More details on installation of VFCache in VMware environments can be found in VFCache Installation Guide for VMware and VFCache VMware Plug-in Administration Guide, available on Powerlink.

Depending on the cache size required on each virtual machine, an appropriately sized cache vDisk can be created from the VFCache datastore and assigned to the virtual machine. If you want to change the size of VFCache on a particular virtual machine, you need to do the following:

1. Shut down the virtual machine.
2. Increase the size of the cache vDisk assigned to the virtual machine.
3. Restart the virtual machine.

VFCache is a local resource at the virtual machine level in the ESX server. This has the same consequences as any other local resource on a server. For example, you cannot configure an automatic failover for a virtual machine that has VFCache. You cannot use features like VMware vCenter Distributed Resource Scheduler (vCenter DRS) for clusters or VMware vCenter Site Recovery Manager (vCenter SRM) for replication. You cannot use VFCache in a cluster that balances application workloads by automatically performing vMotion from heavily used hosts to less-utilized hosts. If you are planning to use vMotion functionality, you should:

1. Stop VFCache on the source virtual machine.
2. Remove VFCache from the source virtual machine.
3. Perform vMotion from the source virtual machine to the destination virtual machine.
4. Restore caching on the destination virtual machine.

Both RDM and VMFS volumes are supported with VFCache.

Split-card feature

EMC VFCache has a unique "split-card" feature, which allows you to use part of the server Flash as a cache and another part of the server Flash as DAS. When using the DAS portion of this feature, both read and write operations from the application are done directly on the PCIe Flash capacity in the server.

The contents of the DAS portion do not persist to any storage array. Therefore, it is highly recommended that you use the DAS portion only for temporary data, such as operating system swap space and temp file space. This feature provides an option for you to simultaneously use the card as a caching device and as a storage device for temporary data.

When this functionality is used, the same Flash capacity and PCIe resources are shared between the cache and DAS portions. Therefore, the cache performance may be lower than when the PCIe card is used solely as a caching solution.




VFCache management

VFCache does not require sophisticated management software. However, there is a CLI for management of the product. There is also an option of using a VSI plug-in for VFCache management in VMware environments.




Performance Considerations

VFCache is a write-through caching product rather than a Flash storage solution, so there are certain things that need to be considered when evaluating VFCache performance.

Locality of reference

The key to maximizing VFCache performance is the locality of reference in the application workload. Applications that reference a small area of storage with very high frequency will benefit the most from using VFCache. Examples of this are database indices and reference tables. If the locality of reference is low, the application may get less benefit after promoting a data chunk into VFCache. Very low locality will result in few or no promotions and thus no benefit.
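The effect of locality can be illustrated with a small simulation. The sketch below is not the VFCache promotion algorithm, which this paper does not describe in detail; it assumes a plain LRU cache that promotes a page on its first miss, and the workload generator, page counts, and hot-set fraction are illustrative values only.

```python
import random
from collections import OrderedDict

def simulate_hit_rate(accesses, cache_pages):
    """Replay page numbers through a simple LRU cache and return the read-hit rate."""
    cache = OrderedDict()
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)          # mark as most recently used
        else:
            cache[page] = True               # assumed: promote on first miss
            if len(cache) > cache_pages:
                cache.popitem(last=False)    # evict the least recently used page
    return hits / len(accesses)

def make_workload(total_pages, hot_pages, hot_fraction, count, seed=0):
    """Hypothetical workload: hot_fraction of reads go to a small hot set."""
    rng = random.Random(seed)
    accesses = []
    for _ in range(count):
        if rng.random() < hot_fraction:
            accesses.append(rng.randrange(hot_pages))
        else:
            accesses.append(rng.randrange(total_pages))
    return accesses

# High locality: 90 percent of reads hit 1,000 hot pages out of 100,000.
high_locality = simulate_hit_rate(make_workload(100_000, 1_000, 0.9, 50_000), 2_000)
# Low locality: reads are spread uniformly over all 100,000 pages.
low_locality = simulate_hit_rate(make_workload(100_000, 1_000, 0.0, 50_000), 2_000)
print(f"high locality hit rate: {high_locality:.2f}")
print(f"low locality hit rate:  {low_locality:.2f}")
```

With the same cache size, the skewed workload reaches a far higher hit rate than the uniform one, which is the behavior described for database indices and reference tables.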

Warm-up time

VFCache needs some warm-up time before it shows significant performance improvement. Warm-up time consists mostly of promotion operations into VFCache. This happens when VFCache has just been installed and is empty. It also happens when the working data set of the application has changed dramatically, and the current VFCache data is no longer being referenced. During this phase, the VFCache read-hit rate is low, so the response time is more like that of the SAN storage. As the VFCache hit rate increases, the performance starts improving and stabilizes when a large part of the application working set has been promoted into VFCache. In internal tests using a 1.2 TB Oracle database, the throughput increased to more than twice the baseline values in 30 minutes when a TPC-C-like workload was used.

Among other things, warm-up time depends on the number and type of storage media in the back-end SAN storage. For example, a setup of 80 SAS drives will have a shorter warm-up time than a setup with 20 SAS drives. Similarly, a setup with SAS hard-disk drives (HDDs) in the back end will warm up faster than one with NL-SAS HDDs in the back end. This is because NL-SAS drives typically have a higher response time than SAS drives. When you are designing application layouts, it is important to remember that there is a warm-up time before stable VFCache performance is reached.
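The dependence on drive count and drive type can be approximated with simple arithmetic. The following is a rough lower-bound sketch, assuming that every promoted page costs one random read from the back end and that the back end is the only limit; the per-drive IOPS figures and the 100 GB working set are illustrative assumptions, not EMC measurements. The 8 KB page size is the cache page size stated in the Specifications section.

```python
def warmup_seconds(working_set_gb, drive_count, iops_per_drive, page_kb=8):
    """Estimate the time to read every working-set page once from the back end."""
    pages = working_set_gb * 1024 * 1024 / page_kb
    backend_iops = drive_count * iops_per_drive   # assumed to scale linearly
    return pages / backend_iops

# Illustrative values only: 100 GB working set, 180 IOPS per SAS drive,
# 80 IOPS per NL-SAS drive.
sas_80 = warmup_seconds(100, 80, 180)
sas_20 = warmup_seconds(100, 20, 180)
nl_sas_80 = warmup_seconds(100, 80, 80)

print(f"80 SAS drives:    {sas_80 / 60:.1f} min")
print(f"20 SAS drives:    {sas_20 / 60:.1f} min")
print(f"80 NL-SAS drives: {nl_sas_80 / 60:.1f} min")
```

Under these assumptions, quadrupling the drive count cuts the estimate to a quarter, and slower drives lengthen it in proportion to their lower IOPS. Real warm-up takes longer because application reads are not perfectly spread over the working set.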

In a demo or a Proof of Concept, the warm-up time can be shortened by reading sequentially through the test area with a 64 KB I/O size. Once the working set has been promoted, the benchmark test can be run again to compare the numbers with the baseline numbers. CLI commands can be used to find out how many pages have been promoted into VFCache. This gives you an idea of what percentage of the working set has been promoted into the cache.
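Converting a promoted-page count into a percentage of the working set is a one-line calculation. The sketch below assumes the page count has already been read from the CLI (the command itself is not shown here) and uses the 8 KB cache page size given in the Specifications section; the function name and the example numbers are hypothetical.

```python
def promoted_percent(promoted_pages, working_set_gb, page_kb=8):
    """Return the share of the working set already promoted, capped at 100 percent."""
    working_set_pages = working_set_gb * 1024 * 1024 / page_kb
    return min(100.0, 100.0 * promoted_pages / working_set_pages)

# Example: 6,553,600 promoted pages against a 100 GB working set.
percent = promoted_percent(6_553_600, 100)
print(f"{percent:.1f} percent of the working set is in cache")
```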

If you are comparing the performance against PCIe Flash DAS solutions, the initial performance numbers of VFCache will be lower because the cache needs to warm up before stable performance numbers are shown. In the case of DAS solutions, all read and write operations happen from the PCIe Flash and there is no warm-up phase. Therefore, initial performance numbers should not be compared between a caching solution and a DAS solution.

Workload characteristics

The final performance benefit that you can expect from VFCache depends on the application workload characteristics. EMC recommends that you do not enable VFCache for storage volumes that do not have a suitable workload profile. This enables you to have more caching resources available for those volumes that are a good fit for VFCache. For example:

• Read/write ratio: VFCache provides read acceleration, so the higher the read/write ratio of the workload, the more performance benefit you get.

• Working set size: You should have an idea of the working set size of the application relative to the cache size. If the working set is smaller than the cache size, the whole working set will get promoted into the cache and you will see very good performance numbers. However, if the working set is much bigger than the cache, the performance benefit will be less. The maximum performance benefit is for those workloads where the same data is read multiple times or where the application reads the same data multiple times after writing it once.

• Random versus sequential workloads: An EMC storage array is very efficient in processing sequential workloads from your applications. The storage array uses its own cache and other mechanisms like prefetching to accomplish this. However, if there is any randomness in the workload pattern, the performance is lower because of the seek times involved with accessing data on mechanical drives. The storage array cache is also of limited use in this case because different applications using the storage array will compete for the same storage array cache resource. Flash technology does not have any latency associated with seek times to access the data. VFCache will therefore show the maximum performance difference when the application workload has a high degree of random component.

• Concurrency: Mechanical drives in the storage array have only one or two read/write heads, which means that only a limited number of I/Os can be processed at any one point in time from one disk. So when there are multiple threads in the application trying to access data from the storage array, response times tend to go up because the I/Os need to wait in the queue before they are processed. However, storage and caching devices using Flash technology typically have multiple channels internally that can process multiple I/Os at the same time. Therefore, VFCache shows the maximum performance difference when the application workload has a high degree of concurrency, that is, when the application requests multiple I/Os at the same time.




• I/O size: Large I/O sizes tend to be bandwidth-driven and reduce the performance gap between Flash technology and non-Flash technologies. Applications with smaller I/O sizes (for example, 8 KB) show the maximum performance benefit when using VFCache.
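The interaction between read/write ratio and working set size in the list above can be sketched as a simple average-latency model. This is a back-of-the-envelope illustration, not a VFCache sizing tool: it assumes reads are spread uniformly over the working set (so the hit rate is roughly cache size divided by working set size), that every write completes at array latency, and it uses illustrative latencies of 0.1 ms for Flash and 5 ms for the array.

```python
def expected_hit_rate(cache_gb, working_set_gb):
    """Assumed uniform access: hit rate is the cached share of the working set."""
    return min(1.0, cache_gb / working_set_gb)

def average_latency_ms(read_fraction, hit_rate, flash_ms, array_ms):
    """Blend read hits, read misses, and writes into one average response time."""
    read_ms = hit_rate * flash_ms + (1.0 - hit_rate) * array_ms
    return read_fraction * read_ms + (1.0 - read_fraction) * array_ms

# Working set fits in a 300 GB cache versus a working set four times larger.
fits = expected_hit_rate(300, 150)
too_big = expected_hit_rate(300, 1200)

# Illustrative latencies: 0.1 ms Flash, 5 ms array.
read_heavy = average_latency_ms(0.9, fits, 0.1, 5.0)
write_heavy = average_latency_ms(0.5, fits, 0.1, 5.0)
oversized = average_latency_ms(0.9, too_big, 0.1, 5.0)
print(f"90% reads, set fits:   {read_heavy:.2f} ms")
print(f"50% reads, set fits:   {write_heavy:.2f} ms")
print(f"90% reads, set 4x big: {oversized:.2f} ms")
```

In this model the read-heavy workload with a fully cached working set has the lowest average latency, while either a lower read fraction or an oversized working set pulls the average back toward array latency.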

Throughput versus latency

There are some applications that can push the storage environment to the limit to provide as many IOPS as possible. Using VFCache in those application environments will show very high IOPS at very low response times. However, there are also applications that do not require very high IOPS, but they require very low response times.

You can also see the benefit of using VFCache in these latency-sensitive application environments. Even though the application issues relatively few I/Os, whenever the I/Os are issued, they will be serviced with a very low response time. For example, a web application may not have a lot of activity in general, but whenever a user issues a request, the response will be very quick.
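The link between throughput, response time, and concurrency discussed above follows Little's Law: sustained IOPS equals the number of outstanding I/Os divided by the average response time. The sketch below applies that relationship with illustrative latencies (5 ms for an array read, 0.1 ms for a Flash hit); it assumes the latency stays constant as concurrency rises, which real devices only approximate.

```python
def iops(outstanding_ios, latency_ms):
    """Little's Law: throughput = concurrency / response time."""
    return outstanding_ios / (latency_ms / 1000.0)

# A single-threaded application is bounded by latency alone.
array_single = iops(1, 5.0)
flash_single = iops(1, 0.1)
# With more outstanding I/Os, the same latency supports more throughput.
flash_parallel = iops(32, 0.1)

print(f"1 outstanding I/O at 5 ms:     {array_single:,.0f} IOPS")
print(f"1 outstanding I/O at 0.1 ms:   {flash_single:,.0f} IOPS")
print(f"32 outstanding I/Os at 0.1 ms: {flash_parallel:,.0f} IOPS")
```

This is why a low-activity application still benefits from lower response times, and why a highly concurrent application can turn the same latency into much higher IOPS.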

Other bottlenecks in the environment

VFCache helps improve throughput and reduce latencies for the applications. However, any drastic improvement in application throughput may expose new underlying performance bottlenecks and/or anomalies. Addressing these may include application tuning, such as increasing buffer cache sizes or other changes that increase concurrency. For example, in a typical customer deployment, a Microsoft SQL Server administrator should not enable VFCache on the log files. An inefficient storage layout design of the log files may be exposed as a bottleneck when VFCache improves the throughput and latency of the SQL Server database.

Write performance dependent on back-end array

VFCache provides acceleration to read I/Os from the application. Any write operations that the application issues still happen at the best speed that the back-end storage array can provide. At a fixed read/write ratio from an application, this tends to limit the net potential increase in read throughput. For example, if the storage array is overloaded and is processing write operations at a very slow rate, VFCache will not be able to accelerate additional application reads.
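The ceiling that writes place on read throughput can be made concrete. The sketch below assumes the application keeps a fixed read/write ratio and that the array's sustainable write rate is the only limit; it ignores the array work needed to serve read misses, so it is an upper bound. The function name and the 3,000 write IOPS figure are illustrative assumptions.

```python
def max_read_iops(array_write_iops, read_fraction):
    """Upper bound on read IOPS when writes are capped by the back-end array."""
    write_fraction = 1.0 - read_fraction
    if write_fraction <= 0.0:
        return float("inf")   # a pure-read workload is not limited by writes
    # Total IOPS cannot exceed array_write_iops / write_fraction;
    # the read share of that total is read_fraction.
    return array_write_iops * read_fraction / write_fraction

# Example: array sustains 3,000 write IOPS, workload is 70% reads / 30% writes.
ceiling = max_read_iops(3000, 0.7)
print(f"read IOPS cannot exceed about {ceiling:,.0f}")
```

If the array slows to 1,000 write IOPS, the same workload is capped at roughly 2,333 read IOPS regardless of how fast the cache serves hits.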

Once VFCache has been enabled on a particular source volume, every I/O from the application needs to access the VFCache card, whether it is a read or a write operation. In most cases, the processing capability of VFCache will be much greater than what the storage array can provide, so VFCache will not be a performance bottleneck in the data path. However, if a very large number of disks on the storage array are dedicated to a single host application, and they are fully utilized in terms of IOPS, the throughput that the storage array could provide without VFCache might be more than what VFCache can process. In this scenario, VFCache may provide minimal performance benefit to the application.




Usage guidelines and characteristics

This section provides some of the usage guidelines and salient features of VFCache.

• Since VFCache does not store any data that has not already been written on the storage array, the application data is protected and is persisted on the storage array if anything happens to VFCache on the server. However, the cache would need to be warmed up again after the server starts up.

• In a physical environment, you can enable or disable VFCache at the source volume level or LUN level. In a virtual environment, the VFCache capacity needs to be partitioned for individual virtual machines, as applicable. This allocated cache capacity inside the virtual machine can then be configured at vDisk-level granularity. The minimum size for the cache vDisk is 20 GB.

• There is no hard limit on the maximum number of server volumes on which VFCache can be enabled. However, enabling it on a very large number of volumes may create resource starvation for those volumes that could actually benefit from VFCache. EMC recommends that VFCache not be enabled for those volumes that are least likely to gain any performance benefit from VFCache. This allows other volumes that are a good fit for VFCache to get the maximum processing and cache capacity resources.

• PowerPath optimizes the use of multiple data paths between supported servers and storage systems, providing a performance boost by doing load balancing between the paths. VFCache improves the application performance even further by helping to move the most frequently accessed data closer to the application by using PCIe Flash technology for write-through caching.

PowerPath and VFCache are complementary EMC products for scaling mission-critical applications in virtual and physical environments, including cloud deployments. Additionally, since VFCache sits above the multipathing software in the I/O stack, it can work with any multipathing solution on the market. PowerPath and VFCache are purchased separately.

• VFCache is complementary to the FAST VP and FAST Cache features on the storage array. However, it is not required to have FAST VP or FAST Cache on the storage array to use VFCache.

• VFCache only accelerates read operations from the application. The write operations will be limited by the speed with which the back-end array can process the writes.
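The data-protection property in the first guideline above follows directly from write-through caching. The sketch below is a minimal illustration of that technique, not the VFCache driver: Python dictionaries stand in for the storage array and the server Flash, and placing written data into the cache is an assumption made here for simplicity.

```python
class WriteThroughCache:
    """Minimal write-through cache: the back end always holds every write."""

    def __init__(self, backend):
        self.backend = backend   # dict standing in for the storage array
        self.cache = {}          # dict standing in for the server Flash

    def read(self, block):
        if block in self.cache:
            return self.cache[block]    # read hit: served from Flash
        data = self.backend[block]      # read miss: served from the array
        self.cache[block] = data        # promote for later reads
        return data

    def write(self, block, data):
        self.backend[block] = data      # the array is written first
        self.cache[block] = data        # assumed: cached copy kept in step

    def lose_cache(self):
        """Simulate a card failure or server restart."""
        self.cache.clear()

array = {1: "alpha"}
vfc = WriteThroughCache(array)
vfc.write(2, "beta")
vfc.lose_cache()
# No data is lost: the read is simply a miss that goes back to the array.
print(vfc.read(2))
```

After `lose_cache()` every block is still readable, but each first read is a miss, which corresponds to the cache having to warm up again.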

Specifications

• The cache page size that is used internally in VFCache is 8 KB, but it will work seamlessly with applications where the predominant I/O size is other than 8 KB. The cache page size is fixed and is not customizable.

• VFCache needs to be installed in PCIe Gen2 x8 slots in the server. It can also be installed in x16 PCIe slots, but only 8 channels will be used by VFCache. Conversely, if it is installed in an x4 PCIe slot in the server, VFCache will perform sub-optimally.

• VFCache cards are available in 300 GB capacity.
• Only one VFCache card can be used per server.
• VFCache supports the following connection protocols between the server and the storage array:
  o 4 Gb/s Fibre Channel
  o 8 Gb/s Fibre Channel

• VFCache is compliant with the Trade Agreements Act (TAA). The following main requirements are certified as not applicable to VFCache:
  o FIPS 140-2
  o Common Criteria
  o Platform Hardening
  o Research Remote Access
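The slot-width guidance in the PCIe specification above can be checked with the standard PCIe Gen2 figures: each lane signals at 5 GT/s and uses 8b/10b encoding, so 8 of every 10 bits carry data. The sketch below computes the resulting raw link bandwidth per direction; it is an upper bound that ignores protocol overhead, and it says nothing about the throughput the VFCache card itself delivers.

```python
GEN2_TRANSFERS_PER_S = 5.0e9     # PCIe Gen2 signaling rate per lane
ENCODING_EFFICIENCY = 8 / 10     # 8b/10b line encoding

def gen2_bandwidth_mb_s(lanes):
    """Raw PCIe Gen2 bandwidth per direction, in MB/s, for a given lane count."""
    data_bits_per_s = GEN2_TRANSFERS_PER_S * ENCODING_EFFICIENCY * lanes
    return data_bits_per_s / 8 / 1e6

x8 = gen2_bandwidth_mb_s(8)
x4 = gen2_bandwidth_mb_s(4)
print(f"x8 slot: {x8:,.0f} MB/s per direction")
print(f"x4 slot: {x4:,.0f} MB/s per direction")
```

An x4 slot offers half the raw link bandwidth of an x8 slot, which is consistent with the statement that the card performs sub-optimally there. An x16 slot adds nothing because the card uses only 8 lanes.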

Constraints

• VFCache does not provide connectivity between the server and the SAN storage array. You still need to use an HBA card to connect to the back-end storage array where the data is eventually stored.

• VFCache is not supported on blade servers. Blade servers require a customized version of the card. For the most current list of supported operating systems and servers, refer to E-Lab Interoperability Navigator (https://elabnavigator.emc.com/).

• VFCache is currently not supported in shared-disk environments or active/active clusters. However, shared-disk clusters in VMware environments are supported since VFCache is implemented at the virtual machine level rather than the ESX server level.

• By default, the VFCache driver intercepts I/Os up to a maximum size of 64 KB. Any I/O larger than 64 KB will not be intercepted by VFCache. Applications with larger I/O sizes are typically bandwidth sensitive and have sequential workloads, which would not benefit from a caching solution like VFCache.
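The 64 KB intercept limit and the 8 KB cache page size can be combined into a short sketch of how an I/O relates to cache pages. The size check follows the rule stated above; the page-spanning calculation assumes pages are aligned on 8 KB boundaries of the volume, which this paper does not state, and both function names are hypothetical.

```python
PAGE_BYTES = 8 * 1024             # cache page size from the Specifications
MAX_INTERCEPT_BYTES = 64 * 1024   # default maximum intercepted I/O size

def is_intercepted(io_bytes):
    """I/Os larger than 64 KB bypass the cache."""
    return io_bytes <= MAX_INTERCEPT_BYTES

def pages_spanned(offset_bytes, io_bytes):
    """Number of 8 KB pages an I/O touches, assuming 8 KB-aligned pages."""
    first_page = offset_bytes // PAGE_BYTES
    last_page = (offset_bytes + io_bytes - 1) // PAGE_BYTES
    return last_page - first_page + 1

for offset, size in [(0, 8192), (4096, 8192), (0, 65536), (0, 131072)]:
    if is_intercepted(size):
        print(f"{size} bytes at offset {offset}: {pages_spanned(offset, size)} page(s)")
    else:
        print(f"{size} bytes at offset {offset}: not intercepted")
```

An aligned 8 KB I/O maps to exactly one page, while the same I/O at a 4 KB offset straddles two, which is one way a fixed page size can still serve other I/O sizes.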

Stale data

• Stale data because of storage array snapshots

If any operations modify the application data without the knowledge of the server, it is possible to have stale data in VFCache. For example, if a LUN snapshot were taken on the array and later used to roll back changes on the source LUN, the server would have no knowledge of any changes that had been done on the array, which would result in VFCache having stale data that had not been updated with the contents from the snapshot. As a workaround in this case, you need to manually stop and restart the VFCache software driver for the source volume.




Note: The whole cache device does not need to be stopped; only the source volume on which the snapshot operations are being done needs to be stopped. When you restart the VFCache software driver on the source volume, a new source ID is automatically generated for that source volume, which invalidates the old VFCache contents for the source volume and starts caching the new data from the snapshot. The application then gets access to new data from the snapshot.

• Stale data in VMware environments

If you use VMware, you should be careful when the VMware snapshot feature is being used. VFCache metadata is kept in the virtual machine memory; therefore, it will be a part of the virtual machine snapshot image when a virtual machine snapshot is taken. This means that when this snapshot image is used to roll back the virtual machine, the old metadata is restored, which can potentially cause data corruption.

You must purge the VFCache before the virtual machine suspend and resume operations. This is handled using scripts that are automatically installed when the VFCache Agent is installed in the virtual machine. These scripts are automatically invoked when these virtual machine operations are run. In Windows environments, you should take care to ensure that other programs or installations in the virtual machine do not change the default suspend/resume scripts in such a way that the VFCache scripts are not executed on those events. VFCache can also be purged manually before suspend and resume operations in the virtual machine, if needed.
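The source ID mechanism described in the Note on storage array snapshots can be sketched as a cache whose keys include the current source ID. This is an illustration of the general invalidation idea only; the class, its methods, and the way IDs are generated are hypothetical and do not describe the internal VFCache data structures.

```python
import itertools

class SourceVolumeCache:
    """Cache entries are keyed by (source_id, block); a new ID orphans old entries."""

    _id_counter = itertools.count(1)   # hypothetical source ID generator

    def __init__(self):
        self.source_id = next(self._id_counter)
        self.pages = {}

    def put(self, block, data):
        self.pages[(self.source_id, block)] = data

    def get(self, block):
        # Only entries stored under the current source ID can match.
        return self.pages.get((self.source_id, block))

    def restart(self):
        """Stop and restart caching for this volume: a new source ID is generated."""
        self.source_id = next(self._id_counter)

volume = SourceVolumeCache()
volume.put(7, "contents before snapshot rollback")
print(volume.get(7))    # cached data is returned
volume.restart()        # array-side rollback happened; the driver is restarted
print(volume.get(7))    # None: the old entry no longer matches, so the read misses
```

Because lookups always use the current ID, stale pages are never returned after a restart even though this sketch does not reclaim them; the next read misses and fetches the rolled-back data from the array.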




Application use case and performance

VFCache helps you boost the performance of your latency-sensitive and response-time-sensitive applications, typically database applications (such as Oracle, SQL Server, and DB2), OLTP applications, web applications, and financial trading applications. VFCache is not suitable for more write-intensive or sequential applications such as data warehousing, streaming media, or Big Data applications. Use cases are shown in Figure 9.

Figure 9: VFCache Use Cases

The horizontal axis represents a typical read/write ratio of an application workload. The left side represents write-heavy applications such as backups. The right side represents read-heavy applications such as reporting tools.

The vertical axis represents the locality of reference or skew of the application's workload. The lower end represents applications that have very low locality of reference, and the upper end represents applications where a majority of the I/Os go to a very small set of data.

You will achieve the greatest results with VFCache in high-read applications and applications with a highly concentrated skew of data.




Test results

EMC conducted application-specific tests with VFCache to determine potential performance benefits when this product is used. Here is a summary of the VFCache benefits with a couple of applications:

• SQL Server: With a TPC-E-like workload in a 750 GB Microsoft SQL Server 2008 R2 environment, the number of transactions increased threefold and the latency was reduced by 87 percent when VFCache was introduced in the configuration.

• Oracle
  o With a TPC-C-like workload in a 1.2 TB Oracle 11gR2 physical environment, the number of transactions increased threefold and the latency was reduced by 50 percent when VFCache was introduced in the configuration. The test workload had 70 percent reads and 30 percent writes.

  o In a VMware setup with a 1.2 TB Oracle Database 11gR2 and a TPC-C-like workload, the number of transactions increased by 80 percent when VFCache was introduced in the configuration. The test workload had 70 percent reads and 30 percent writes.

For more information on application-specific guidelines and test results, refer to the list of white papers provided in the References section.




Conclusion

There are multiple ways in which Flash technology can be used in a customer environment today, for example, Flash in the server or the storage array, Flash used as a cache or a tier. The key, however, is the software that brings all of this together, using different technologies at the right place and time for the right price.

VFCache uses EMC FAST technology in the storage array and FAST in the server to provide this benefit most appropriately, as simply and as easily as possible.

• VFCache dramatically accelerates the performance of read-intensive applications.
• VFCache software caches the most frequently used data on the server-based PCIe card, which puts the data closer to the application. It extends FAST technology into the server by ensuring that the right data is placed in the right storage at the right time.
• The intelligent caching algorithms in VFCache promote the most frequently referenced data into the PCIe server Flash to provide the best possible performance and latency to the application.
• VFCache provides you with the flexibility to use the same PCIe device as a caching solution as well as a storage solution for temporary data.

VFCache suits many but not all customer environments, and it is important that you understand the application workload characteristics properly when choosing and using VFCache.

VFCache protects data by using a write-through algorithm, which means that writes persist to the back-end storage array. While other vendors promise the performance of PCIe Flash technology, EMC VFCache provides this performance with protection.



References

The following documents are available on Powerlink:

• EMC VFCache Data Sheet
• VFCache Installation and Administration Guide for Windows and Linux
• VFCache Release Notes for Windows and Linux
• VFCache Installation Guide for VMware
• VFCache Release Notes for VMware
• VFCache VMware Plug-in Administration Guide
• Considerations for Choosing SLC versus MLC Flash
• EMC VFCache Accelerates Oracle - EMC VFCache, EMC Symmetrix VMAX and VMAXe, Oracle Database 11g
• EMC VFCache Accelerates Virtualized Oracle - EMC VFCache, EMC Symmetrix VMAX and VMAXe, VMware vSphere, Oracle Database 11g
• EMC VFCache Accelerates Oracle - EMC VFCache, EMC VNX, EMC FAST Suite, Oracle Database 11g
• EMC VFCache Accelerates Microsoft SQL Server - EMC VFCache, EMC VNX, Microsoft SQL Server 2008

The following Demartek analyst report and video are available on EMC.com:

EMC VFCache Flash Caching Solution Evaluation
