![Page 1: SQL Server, Storage and You - Part III: Solid State Storage](https://reader035.vdocuments.us/reader035/viewer/2022081519/56649d215503460f949f5d08/html5/thumbnails/1.jpg)
SQL Server, Storage and You - Part III: Solid State Storage
Contact Information
• Wesley Brown
• [email protected]
• Twitter @WesBrownSQL
• Blog http://www.sqlserverio.com
Today’s Topic Covers…
• NAND Flash Structure
• MLC and SLC Compared
• NAND Flash Read Properties
• NAND Flash Write Properties
• Wear-Leveling
• Garbage Collection
• Write Amplification
• TRIM
• Error Detection and Correction
• Reliability
• Form Factor
• Performance Characteristics
• Determining What’s Right for You
• Not All SSDs Are Created Equal
Types Of Flash
• Two Main Flavors: NAND and NOR
• NOR
  – Operates like RAM.
  – NOR is parallel at the cell level.
  – NOR reads slightly faster than NAND.
  – Can execute directly from NOR without copy to RAM.
• NAND
  – NAND operates like a block device, a.k.a. hard disk.
  – NAND is serial at the cell level.
  – NAND writes significantly faster than NOR.
  – NAND erases much faster than NOR: 4 ms vs. 5 s.
Structure of NAND
• Serial array of transistors.
  – Each transistor holds 1 bit (or more).
• Arrays grouped into pages.
  – 4096 bytes in size.
  – Contains “spare” area for ECC and other ops.
• Pages grouped into blocks.
  – 64 to 128 pages.
  – Smallest erasable unit.
• Blocks grouped into a chip.
  – As big as 16 gigabytes.
• Chips grouped onto devices.
  – Usually in a parallel arrangement.
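The hierarchy above can be turned into a quick back-of-the-envelope calculation. This is only a sketch: it assumes the upper end of the page count (128 pages per block), a 16 GiB chip, and ignores the spare area, and all constant names are made up for illustration.

```python
# Hypothetical geometry taken from the figures on the slide.
PAGE_BYTES = 4096            # data area of one page (spare area not counted)
PAGES_PER_BLOCK = 128        # upper end of the 64 to 128 range
CHIP_BYTES = 16 * 1024**3    # one 16 GiB chip

# A block is the smallest erasable unit, so its size drives erase cost.
block_bytes = PAGE_BYTES * PAGES_PER_BLOCK
blocks_per_chip = CHIP_BYTES // block_bytes
pages_per_chip = blocks_per_chip * PAGES_PER_BLOCK

print(f"block size      : {block_bytes // 1024} KiB")
print(f"blocks per chip : {blocks_per_chip}")
print(f"pages per chip  : {pages_per_chip}")
```

With these assumed numbers, changing a single 4 KB page can eventually force the erase of a 512 KiB block, which is where most of the write-side complexity later in the deck comes from.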
NAND Flash Structure. Gates, Cells, Pages and Strings.
MLC vs. SLC, FIGHT!
• MLC (Multi-Level Cell)
  – Higher capacity (two bits per cell).
  – Low P/E cycle count, 3k~ to 10k~.
  – Cheaper per gigabyte.
  – High ECC needs.
• SLC (Single-Level Cell)
  – Fast read speed.
    • 25 µs vs. 50 µs per page.
  – Fast write speed.
    • 220 µs vs. 900 µs per page.
  – High P/E cycle count, 100k~ to 300k~.
  – Tend to be conservative numbers.
  – Minimal ECC requirements.
    • 1 bit per 512 bytes vs. 12~ bits per 512 bytes.
  – Expensive.
    • Up to 5x the cost of MLC.
Reading NAND Flash
• It isn’t RAM.
  – Slower access times.
    • Tens of nanoseconds vs. tens of microseconds.
  – No write in place.
• It isn’t a hard disk.
  – Much faster access times.
    • Microseconds vs. milliseconds.
  – No moving parts.
Writing to NAND
• Program Erase Cycle
  – In the erased state all bits are 1.
  – Programmed bits are 0.
  – Programmed a page at a time.
    • One pass programming.
  – Erased a block at a time (128 pages).
    • Must erase the entire block to program a single page again.
  – Finite life cycle, 10k~ for MLC, 100k~ for SLC.
    • A block that fails to erase may still be readable.
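The program/erase rules above can be illustrated with a toy model. This is a sketch under heavy simplification, not how any real controller works: each "page" is a single byte, and the `NandBlock` class and its method names are invented for this example.

```python
class NandBlock:
    """Toy model of one erasable block; each page is a single byte here."""

    def __init__(self, pages=128):
        self.pages = [0xFF] * pages   # erased state: every bit is 1
        self.erase_count = 0          # P/E cycles consumed so far

    def program(self, page, value):
        # Programming can only pull bits from 1 to 0, never back to 1.
        self.pages[page] &= value

    def erase(self):
        # The whole block is reset; there is no per-page erase.
        self.pages = [0xFF] * len(self.pages)
        self.erase_count += 1


block = NandBlock()
block.program(0, 0b10101010)
first = block.pages[0]

block.program(0, 0b01010101)      # "overwrite" without erasing first
second = block.pages[0]           # bits can only drop, so the data is mangled

block.erase()                     # costs one P/E cycle for all 128 pages
block.program(0, 0b01010101)
third = block.pages[0]

print(f"{first:08b} {second:08b} {third:08b}")
```

The second value shows why there is no write in place: programming over existing data only clears more bits, so the new value lands correctly only after the whole block has been erased.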
Data written in pages and erased in blocks. Blocks are becoming larger as NAND Flash die sizes shrink.
Feeding And Care of NAND
• Wear-Leveling
  – Spreads writes across blocks.
  – Ideally, write to every block before erasing any.
  – Data grouped into two patterns.
    • Static, written once and read many times.
    • Dynamic, written often, read infrequently.
  – If you only wear-level data in motion, you quickly burn out the pages that hold it.
  – If you wear-level static data, you incur extra I/O.
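The trade-off in the last two bullets can be sketched with a small simulation. It is a deliberately crude model with made-up names (`simulate`, `level_static`): every rewrite goes to the least-erased candidate block, and relocating static data is only counted, not modeled in detail.

```python
def simulate(total_blocks, cold_blocks, rewrites, level_static):
    """Return (erase counts per block, relocations of static data)."""
    erases = [0] * total_blocks
    cold = set(range(cold_blocks))      # blocks currently holding static data
    extra_copies = 0
    for _ in range(rewrites):
        if level_static:
            candidates = list(range(total_blocks))
        else:
            # Dynamic-only leveling never touches blocks with static data.
            candidates = [b for b in range(total_blocks) if b not in cold]
        target = min(candidates, key=lambda b: erases[b])
        if target in cold:
            # Park the static data on the most-worn block without static data.
            others = [b for b in range(total_blocks) if b not in cold]
            parked = max(others, key=lambda b: erases[b])
            cold.remove(target)
            cold.add(parked)
            extra_copies += 1           # the extra I/O the slide warns about
        erases[target] += 1
    return erases, extra_copies


dynamic_erases, dynamic_copies = simulate(8, 4, 400, level_static=False)
static_erases, static_copies = simulate(8, 4, 400, level_static=True)

print("dynamic only:", max(dynamic_erases), "max erases,", dynamic_copies, "copies")
print("with static :", max(static_erases), "max erases,", static_copies, "copies")
```

In this toy setup, leveling only the dynamic data concentrates all wear on half the blocks, while including static data lowers the worst-case erase count at the price of extra copies.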
Keeping Things Fast
• Background Garbage Collection
  – Defers P/E cycle.
  – Pages marked as dirty, erased later.
  – Requires spare area.
  – Incurs additional I/O.
  – Can be put under pressure by frequent small writes.
No Free Lunches
• Write Amplification
  – Ripples in a pond.
  – Device moves blocks around.
  – Incoming I/O greater than the device has room for.
  – Every write causes additional writes.
    • Small writes can be a real problem.
    • OLTP workloads are a good example.
    • TRIM can help.
Initial Write of 4 pages to a single erasable block.
Four new pages and four replacement pages written. Original pages are now marked invalid.
Garbage collection comes along and moves all valid pages to a new block and erases the other block.
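The three steps in these figures can be tallied to get a write amplification number. This is a worked example rather than a measurement: it assumes every valid page is copied exactly once by garbage collection, and the variable names are illustrative.

```python
host_pages_written = 0    # pages the host asked the drive to write
nand_pages_written = 0    # pages the flash actually had to program

# Step 1: the host writes 4 pages into an empty block.
host_pages_written += 4
nand_pages_written += 4
valid, invalid = 4, 0

# Step 2: the host writes 4 new pages and rewrites the original 4.
# The rewrites land in fresh pages; the originals become invalid.
host_pages_written += 8
nand_pages_written += 8
valid, invalid = 8, 4

# Step 3: garbage collection copies the valid pages to a new block
# and erases the old one. The host asked for none of these writes.
nand_pages_written += valid
invalid = 0

write_amplification = nand_pages_written / host_pages_written
print(f"host pages: {host_pages_written}, NAND pages: {nand_pages_written}")
print(f"write amplification: {write_amplification:.2f}")
```

Under these assumptions the drive programs 20 pages to satisfy 12 pages of host writes, a write amplification of about 1.67.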
Keeping Things Fast
• TRIM
  – Supported out of the box on Windows 7 and Windows Server 2008 R2.
    • Some manufacturers are shipping a TRIM service that works with their driver.
  – Acts like spare area for garbage collection.
  – OS and file system tell the drive a block is empty.
  – Filling the file system defeats TRIM.
  – File fragmentation can hurt TRIM.
    • Grow your files manually!
    • Don’t run disk defrag!
Detecting Errors and Correcting Them
Many things cause errors on Flash!
• Write Disturb
  – Data cells NOT being written to are corrupted.
    • Fixed with normal erase.
• Read Disturb
  – Repeated reads on the same page affect other pages in the block.
    • Fixed with normal erase.
• Charge Loss/Gain
  – Transistors may gain or lose charge over time.
    • Flash devices at rest or rarely accessed data.
    • Fixed with normal erase.
All of these issues are generally dealt with very well using standard ECC techniques.
As cells are programmed other cells may experience voltage change.
As cells are read other cells in same block can suffer voltage change.
If flash is at rest or rarely read, cells can suffer charge loss.
Pure Speed
• Not all drives are benchmarked the same.
• Short-stroking
  – Only using a small portion of the drive.
  – Allows for lots of spare capacity via TRIM.
• Huge queue depths.
  – Increases latency.
  – Can be unrealistic.
• Odd block transfer sizes.
  – Random IO testing.
    • Some use 512 byte while others use 4k.
  – Sequential IO testing.
    • Most use 128k.
    • Some use 64k to better fit into large buffers.
    • Some use 1 MB and high queue depths.
How Fast Is It Again?
• Read the numbers carefully.
  – Random IO bench usually 4k.
    • SQL Server works on 8k.
  – Sequential IO bench usually 128k.
    • SQL Server works on 64k to 128 MB.
  – Queue depths set high.
    • SQL Server usually configured for low queue depth.
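One way to read a spec sheet number more carefully is to convert it into throughput and latency. The sketch below uses a made-up spec figure (40,000 random 4k IOPS at queue depth 32), applies Little's law for the latency estimate, and assumes the drive is purely bandwidth-bound when moving to 8k transfers; a real drive may behave quite differently.

```python
def throughput_mb_s(iops, block_kb):
    """Throughput implied by an IOPS figure at a given transfer size."""
    return iops * block_kb / 1024


def mean_latency_ms(iops, queue_depth):
    """Little's law: outstanding I/Os = IOPS x latency."""
    return queue_depth / iops * 1000


# Hypothetical spec sheet numbers, not taken from any real drive.
spec_iops_4k = 40_000
spec_queue_depth = 32

mb_s = throughput_mb_s(spec_iops_4k, 4)
latency_ms = mean_latency_ms(spec_iops_4k, spec_queue_depth)

# Assumption: same MB/s at 8k, so the IOPS ceiling is cut in half.
iops_8k_ceiling = mb_s * 1024 / 8

print(f"4k spec implies {mb_s:.2f} MB/s")
print(f"mean latency at QD {spec_queue_depth}: {latency_ms:.2f} ms")
print(f"8k IOPS ceiling at the same MB/s: {iops_8k_ceiling:.0f}")
```

The point is not the specific numbers but the habit: restate the vendor's figure at the 8k size and the low queue depth your SQL Server workload actually uses before comparing drives.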
Is It Reliable Enough?
• SLC is ready “out of the box.”
  – Requires much less infrastructure on disk to support robust write environments.
• MLC needs some help.
  – Requires lots of spare area and smarter controllers to handle extra ECC.
  – eMLC has all management functions built onto the chip.
• Both configured similarly.
  – RAID of chips.
  – TRIM, GC and Wear-Leveling.
He’s Dead Jim.
• Longevity differences between devices can be huge.
• Consumer grade drives are consumable.
  – Aren’t rated for full drive writes.
    • Desktop drives usually tested on a fraction of drive capacity!
  – Aren’t rated for continuous writes.
    • It may say three-year life span.
  – Could be much shorter; look at total writes.
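"Look at total writes" can be made concrete with a rough endurance estimate. The formula below is a common rule of thumb rather than a vendor rating method, and every input (capacity, P/E cycles, write amplification, daily write volume) is a made-up example value.

```python
def lifetime_years(capacity_gb, pe_cycles, write_amplification, host_gb_per_day):
    """Rough life span: total host writes the flash can absorb / daily writes."""
    total_host_writes_gb = capacity_gb * pe_cycles / write_amplification
    return total_host_writes_gb / host_gb_per_day / 365


# Hypothetical 100 GB MLC drive rated for 3,000 P/E cycles.
desktop = lifetime_years(100, 3_000, write_amplification=3.0, host_gb_per_day=20)
oltp = lifetime_years(100, 3_000, write_amplification=3.0, host_gb_per_day=500)

print(f"light desktop use : {desktop:.1f} years")
print(f"busy OLTP server  : {oltp:.1f} years")
```

The same drive that outlasts a desktop can wear out in well under a year on a write-heavy server, which is why the rated life span in years means little without the write volume behind it.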
You Say SATA I Say SAS…
• SAS is the king of your heavy workloads.
• Command Queuing
  – SAS supports up to 2^16, usually capped at 64.
  – SATA supports up to 32.
• Error recovery and detection.
  – SMART isn’t.
  – SCSI command set is better.
• Duplex
  – SAS is full duplex and dual ported per drive.
  – SATA is half duplex and single ported.
• Multi-path IO
  – Native to SAS at the drive level.
  – Available to SATA via expanders.
The Shape Of Things.
• Flash comes in lots of form factors.
• Standard 2.5” and 3.5” drives.
• Fibre attached.
  – Texas Memory Systems RamSan-620
  – Violin Memory
• PCIe add-in cards.
  – Few “native” cards.
    • Fusion-io
    • Texas Memory Systems RamSan-20
  – Bundled solutions.
    • LSI SSS6200
    • OCZ Z-Drive
    • OCZ Revodrive
• PCIe to disk.
  – 2.5” form factor and plugs.
  – Skips SAS/SATA for direct PCIe lanes.
Understand Your Workloads!
• You MUST understand your workloads.
  – Monitor virtual file stats.
    • http://sqlserverio.com/2011/02/08/gather-virtual-file-statistics-using-t-sql-tsql2sday-15/
  – Track random vs. sequential.
  – Track size of transfers.
  – Capture IO patterns.
    • http://sqlserverio.com/2010/06/15/fundamentals-of-storage-systems-capturing-io-patterns/
  – Benchmark!
    • http://sqlserverio.com/2010/06/15/fundamentals-of-storage-testing-io-systems/
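As a rough illustration of tracking transfer size from virtual file stats, the sketch below diffs two snapshots of the cumulative read counters. The keys follow the column names of SQL Server's `sys.dm_io_virtual_file_stats` (`num_of_reads`, `num_of_bytes_read`, `io_stall_read_ms`), but the snapshot values are invented and the `summarize_reads` helper is hypothetical; collecting the snapshots is left to the scripts linked above.

```python
def summarize_reads(before, after):
    """Average read size and latency between two cumulative snapshots."""
    reads = after["num_of_reads"] - before["num_of_reads"]
    if reads == 0:
        return {"reads": 0, "avg_read_kb": 0.0, "avg_read_latency_ms": 0.0}
    bytes_read = after["num_of_bytes_read"] - before["num_of_bytes_read"]
    stall_ms = after["io_stall_read_ms"] - before["io_stall_read_ms"]
    return {
        "reads": reads,
        "avg_read_kb": bytes_read / reads / 1024,
        "avg_read_latency_ms": stall_ms / reads,
    }


# Made-up snapshots for one data file, taken some minutes apart.
snapshot_1 = {"num_of_reads": 1_000, "num_of_bytes_read": 8_192_000,
              "io_stall_read_ms": 5_000}
snapshot_2 = {"num_of_reads": 3_000, "num_of_bytes_read": 139_264_000,
              "io_stall_read_ms": 9_000}

summary = summarize_reads(snapshot_1, snapshot_2)
print(summary)
```

An average read size near 64 KB suggests scan or read-ahead activity, while a value near 8 KB points at single-page random reads; the counters are cumulative since startup, so only the difference between snapshots describes the current workload.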
I’m Not As Fast As I Used To Be
• From new
  – Best possible performance.
  – Drive will never be this fast again.
• Previous writes affect future reads.
  – Large sequential writes are nice for GC.
  – Small random writes slow GC down.
  – Wait for GC to catch up when benching a drive.
    • Give the GC time to settle in when going from small random to large sequential or vice versa.
    • Steady state is what we are after.
• Performance slows over time.
  – Cells wear out.
    • Causes multiple attempts to read or write.
    • ECC saves you but the IO is still spent.
It’s a Sony on the inside, trust me.
• Not all drives are equal.
• Understand drives are tuned for workloads.
  – Desktop drives don’t favor 100% random writes…
  – Enterprise drives are expected to get punished.
• Fix it with firmware.
  – Most drives will have edge cases.
    • OCZ and Intel suffered poor performance after drive use over time.
• Be wary of updates that erase your drive.
  – Gives you a temporary performance boost.
Takeaways
• Flash read performance is great, sequential or random.
• Flash write performance is complicated, and can be a problem if you don’t manage it.
• Flash wears out over time.
  – Not nearly the issue it used to be, but you must understand your write patterns.
  – Plan for over-provisioning and TRIM support.
    • It can have a huge impact on how much storage you actually buy.
  – Flash can be error prone.
    • Be aware that writes and reads can cause data corruption.