BEST PRACTICES GUIDE: NIMBLE STORAGE FOR HP VERTICA DB
BEST PRACTICES GUIDE
Nimble Storage for HP Vertica Database on Oracle Linux & RHEL 6
Document Revision
Table 1.
Date         Revision   Description
1/9/2012     1.0        Initial Draft
8/9/2013     1.1        Revised Draft
1/31/2014    1.2        Revised
3/12/2014    1.3        Revised iSCSI Setting
9/5/2014     1.4        Revised Nimble Version
11/17/2014   1.5        Updated iSCSI & Multipath
THIS TECHNICAL TIP IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN
TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS,
WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND.
Nimble Storage: All rights reserved. Reproduction of this material in any manner whatsoever
without the express written permission of Nimble is strictly prohibited.
Table of Contents
Introduction
Audience
Scope
Nimble Storage Features
Nimble Recommended Settings for HP Vertica DB
Creating Nimble Volumes for HP Vertica DB
Introduction
The purpose of this technical white paper is to walk step by step through tuning the Linux operating system for a
Vertica database running on Nimble Storage.
Audience
This guide is intended for Vertica database solution architects, storage engineers, system administrators, and IT
managers who analyze, design, and maintain a robust database environment on Nimble Storage. It is assumed that the
reader has a working knowledge of iSCSI SAN network design and basic Nimble Storage operations. Knowledge of
the Oracle Linux and Red Hat operating systems is also required.
Scope
During the design phase of a new Vertica database implementation, DBAs and storage administrators often work
together to determine the storage requirements. They have to consider many storage configuration options to facilitate
high performance and high availability. To protect data against failures of disk drives, host bus adapters (HBAs),
and switches, they need to consider different RAID levels and multiple paths. When different RAID levels
come into play for performance, TCO tends to increase as well. For example, to sustain a certain number of
IOPS with low latency for an OLTP workload, DBAs would require a certain number of 15K disk drives with RAID 10. The
higher the number of required IOPS, the more 15K drives are needed, because mechanical disk drives are limited by
seek time and transfer rate; more of them are needed to handle the required IOPS with acceptable
latency. This increases the TCO tremendously over time. Moreover, if the database is small in capacity but
the required IOPS is high, a lot of space in the SAN ends up wasted.
This white paper explains the Nimble technology and how it can lower the TCO of your Vertica environment while still
achieving the required performance. It also discusses best practices for configuring the Linux operating
system for Vertica databases on Nimble Storage.
Nimble Storage Features
Cache Accelerated Sequential Layout (CASL™)
Nimble Storage arrays are the industry’s first flash-optimized storage designed from the ground up to maximize
efficiency. CASL accelerates applications by using flash as a read cache coupled with a write-optimized data
layout. It offers high performance and capacity savings, integrated data protection, and easy lifecycle
management.
Flash-Based Dynamic Cache
Accelerate access to application data by caching a copy of active “hot” data and metadata in flash for reads.
Customers benefit from high read throughput and low latency.
Write-Optimized Data Layout
Data written by a host is first aggregated or coalesced, then written sequentially as a full stripe with checksum
and RAID parity information to a pool of disks; CASL’s sweeping process also consolidates freed-up disk space
for future writes. Customers benefit from fast sub-millisecond writes and very efficient disk utilization.
Inline Universal Compression
Compress all data inline before storing using an efficient variable-block compression algorithm. Store 30 to 75
percent more data with no added latency. Customers gain much more usable disk capacity with zero
performance impact.
Instantaneous Point-in-Time Snapshots
Take point-in-time copies, which do not require data to be copied on future changes (redirect-on-write). Fast
restores without copying data. Customers benefit from a single, simple storage solution for primary and
secondary data, frequent and instant backups, fast restores and significant capacity savings.
Efficient Integrated Replication
Maintain a copy of data on a secondary system by only replicating compressed changed data on a set schedule.
Reduce bandwidth costs for WAN replication and deploy a disaster recovery solution that is affordable and easy
to manage.
Zero-Copy Clones
Instantly create fully functioning copies or clones of volumes. Customers get great space efficiency and
performance on cloned volumes, making them ideal for test, development, and staging databases.
Nimble Recommended Settings for HP Vertica DB
Nimble Array
• Nimble OS should be at least 2.1.4 on either a CS500 or CS700 series
Linux Operating System
• iSCSI Timeout and Performance Settings
Understanding the meaning of these iSCSI timeouts allows administrators to set them appropriately. The following parameters in the /etc/iscsi/iscsid.conf file should be set as follows:
    node.session.timeo.replacement_timeout = 120
    node.conn[0].timeo.noop_out_interval = 5
    node.conn[0].timeo.noop_out_timeout = 10
    node.session.nr_sessions = 4
    node.session.cmds_max = 2048
    node.session.queue_depth = 1024
= = = NOP-Out Interval/Timeout = = =
node.conn[0].timeo.noop_out_timeout = [ value ]
The iSCSI layer sends a NOP-Out request to each target. If a NOP-Out request times out (default: 10 seconds), the iSCSI layer responds by failing any running commands and instructing the SCSI layer to requeue those commands when possible. If dm-multipath is being used, the SCSI layer fails those running commands and defers them to the multipath layer. The multipath layer then retries those commands on another path. If dm-multipath is not being used, those commands are retried five times before failing altogether.
node.conn[0].timeo.noop_out_interval = [ value ]
Once set, the iSCSI layer sends a NOP-Out request to each target every [ interval value ] seconds.
= = = SCSI Error Handler = = =
If the SCSI Error Handler is running, running commands on a path are not failed immediately when a NOP-Out request times out on that path. Instead, those commands are failed after replacement_timeout seconds.
node.session.timeo.replacement_timeout = [ value ]
Important: Controls how long the iSCSI layer should wait for a timed-out path/session to reestablish itself before failing any commands on it. The recommended setting of 120 seconds above allows ample time for controller failover. Default is 120 seconds.
Note: If set to 120 seconds, IO will be queued for 2 minutes before it can resume. The “1 queue_if_no_path” option in /etc/multipath.conf sets iSCSI timers to immediately defer commands to the multipath layer. This setting prevents IO errors from propagating to the application; because of this, you can set replacement_timeout to 60-120 seconds.
Note: Nimble Storage strongly recommends using dm-multipath for all volumes.
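The recommended iscsid.conf values can be checked mechanically before restarting the iSCSI service. The following is a minimal sketch, not part of open-iscsi: the helper names iscsid_value and check_iscsid_conf are hypothetical, and the file path is passed in by the caller.

```shell
#!/bin/sh
# Sketch only: iscsid_value and check_iscsid_conf are hypothetical helper names.

# Print the value assigned to KEY in FILE (the last uncommented assignment wins).
iscsid_value() {
    # $1 = key, $2 = file
    awk -F'=' -v key="$1" '
        /^[ \t]*#/ { next }                      # skip comment lines
        {
            k = $1; v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", k)       # trim whitespace around the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)       # trim whitespace around the value
            if (k == key) found = v
        }
        END { if (found != "") print found }
    ' "$2"
}

# Compare FILE against the values recommended in this guide.
# Prints one line per mismatch and returns non-zero if any were found.
check_iscsid_conf() {
    file=$1
    rc=0
    while read -r key want; do
        have=$(iscsid_value "$key" "$file")
        if [ "$have" != "$want" ]; then
            echo "MISMATCH $key: have '${have:-unset}', want '$want'"
            rc=1
        fi
    done <<EOF
node.session.timeo.replacement_timeout 120
node.conn[0].timeo.noop_out_interval 5
node.conn[0].timeo.noop_out_timeout 10
node.session.nr_sessions 4
node.session.cmds_max 2048
node.session.queue_depth 1024
EOF
    return $rc
}

# Usage: check_iscsid_conf /etc/iscsi/iscsid.conf
```

The check only reads the file; it does not inspect the settings of sessions that are already logged in.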
• Multipath configurations
The multipath parameters in the /etc/multipath.conf file should be set as follows in order to sustain a failover.
Nimble recommends the use of aliases for mapped LUNs.
defaults {
    user_friendly_names yes
    find_multipaths     yes
}
devices {
    device {
        vendor               "Nimble"
        product              "Server"
        path_grouping_policy group_by_serial
        path_selector        "round-robin 0"
        features             "1 queue_if_no_path"
        path_checker         tur
        rr_min_io_rq         10
        rr_weight            priorities
        failback             immediate
    }
}
multipaths {
    multipath {
        wwid  20694551e4841f4386c9ce900dcc2bd34
        alias vertica1
    }
}
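Each mapped LUN needs its own multipath entry inside the multipaths section. As a minimal sketch, the hypothetical helper below (multipath_alias_stanza is not part of multipath-tools) prints such an entry for a given WWID and alias so that it can be pasted into /etc/multipath.conf. The WWID is the value that multipath -ll prints in parentheses next to the map name.

```shell
#!/bin/sh
# Sketch only: multipath_alias_stanza is a hypothetical helper name.

# Print a multipath entry for one LUN, indented to sit inside "multipaths { ... }".
multipath_alias_stanza() {
    # $1 = WWID of the LUN, $2 = alias to assign
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: multipath_alias_stanza WWID ALIAS" >&2
        return 1
    fi
    printf '    multipath {\n        wwid  %s\n        alias %s\n    }\n' "$1" "$2"
}

# Usage: multipath_alias_stanza 20694551e4841f4386c9ce900dcc2bd34 vertica1
```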
• Disk IO Scheduler
The IO scheduler needs to be set to “noop”.
To set the IO scheduler for all LUNs online, run the command below. Note: multipath must be set up before
running this command. LUNs added later are not changed automatically, and the setting is lost on a server
reboot; run the same command again after adding LUNs or rebooting.
[root@mktg04 ~]# multipath -ll | grep sd | awk -F":" '{print $4}' | awk '{print $2}' | while read LUN; do echo noop > /sys/block/${LUN}/queue/scheduler ; done
To set this parameter automatically at boot, append the parameter below to the end of the kernel line in the /etc/grub.conf file.
elevator=noop
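The one-line command above works by splitting each path line of the multipath -ll output on ":" and taking the device name that follows the SCSI address. The sketch below separates that parsing step from the write step so each can be checked on its own. The function names and the SYSFS_ROOT variable are assumptions introduced here; SYSFS_ROOT defaults to /sys and exists only so the logic can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch only: function names and SYSFS_ROOT are assumptions for illustration.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Read `multipath -ll` output on stdin and print the sd path devices, one per line.
# A path line looks like "  |- 3:0:0:0 sdb 8:16 active ready running":
# field 4 when split on ":" is "0 sdb 8", whose second word is the device name.
multipath_sd_devices() {
    grep sd | awk -F":" '{print $4}' | awk '{print $2}' | sort -u
}

# Write scheduler $1 to every device named on stdin, then print the device
# followed by what the kernel reports back.
set_scheduler() {
    sched=$1
    while read -r dev; do
        f="$SYSFS_ROOT/block/$dev/queue/scheduler"
        [ -w "$f" ] || { echo "skip $dev (no writable $f)" >&2; continue; }
        echo "$sched" > "$f"
        echo "$dev $(cat "$f")"
    done
}

# Usage (as root): multipath -ll | multipath_sd_devices | set_scheduler noop
```

On a live system the read-back line shows every available scheduler with the active one in square brackets.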
• CPU Scaling Governor
The CPU scaling governor needs to be set to “performance”.
To set the CPU scaling governor, run the command below.
[root@mktg04 ~]# for a in $(ls -ld /sys/devices/system/cpu/cpu[0-9]* | awk '{print $NF}') ; do echo performance > $a/cpufreq/scaling_governor ; done
Note: The setting above is not persistent across a reboot, so the command must be executed again when the
server comes back online. To avoid running the command manually after each reboot, place it in the
/etc/rc.local file.
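Because the governor setting is lost on reboot, it is worth confirming after each boot that every CPU actually reports “performance”. A minimal sketch follows; check_governor and the SYSFS_ROOT variable are names introduced here for illustration, with SYSFS_ROOT defaulting to /sys.

```shell
#!/bin/sh
# Sketch only: check_governor and SYSFS_ROOT are assumptions for illustration.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Print every CPU whose scaling governor is not "performance".
# Returns non-zero if at least one such CPU was found.
check_governor() {
    bad=0
    for g in "$SYSFS_ROOT"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$g" ] || continue      # glob did not match, or cpufreq is unavailable
        cur=$(cat "$g")
        if [ "$cur" != "performance" ]; then
            echo "$g: $cur"
            bad=1
        fi
    done
    return $bad
}

# Usage: check_governor && echo "all CPUs use the performance governor"
```

If the cpufreq directories do not exist at all, the function reports success without checking anything, so an empty result on such a system is not proof that the setting was applied.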
• iSCSI Data Network
Nimble recommends using 10GbE iSCSI for all databases.
2 separate subnets
2 x 10GbE iSCSI NICs
Use jumbo frames (MTU 9000) for iSCSI networks
Example of MTU setting for eth1:
DEVICE=eth1
HWADDR=00:25:B5:00:00:BE
TYPE=Ethernet
UUID=31bf296f-5d6a-4caf-8858-88887e883edc
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=static
IPADDR=172.18.127.134
NETMASK=255.255.255.0
MTU=9000
To change MTU on an already running interface:
[root@bigdata1 ~]# ifconfig eth1 mtu 9000
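Setting MTU 9000 on the host helps only if every switch port and the array interface on the path also accept jumbo frames. One common way to test this, sketched below as an assumption rather than a step from this guide, is to ping with the don't-fragment flag and the largest payload that fits in one frame: the MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header. The function names are hypothetical, and the target address is whatever iSCSI data IP applies in your environment.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; the target IP is supplied by the caller.

# Largest ICMP echo payload that fits in one unfragmented IPv4 packet of MTU $1.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

# Ping TARGET with fragmentation prohibited (Linux iputils "ping -M do").
# Failure suggests that some hop on the path does not pass frames of that size.
check_jumbo_path() {
    # $1 = target IP on the iSCSI data network, $2 = MTU (default 9000)
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "${2:-9000}")" "$1"
}

# Usage: check_jumbo_path <array iSCSI data IP>
```

For MTU 9000 the payload works out to 8972 bytes. On systems where ifconfig is unavailable, `ip link set dev eth1 mtu 9000` is the iproute2 equivalent of the command above.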
• /etc/sysctl.conf
net.core.wmem_max = 16780000
net.core.rmem_max = 16780000
net.ipv4.tcp_rmem = 10240 87380 16780000
net.ipv4.tcp_wmem = 10240 87380 16780000
Run the sysctl -p command after editing the /etc/sysctl.conf file.
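After sysctl -p has run, the live values can be read back from /proc/sys, where each sysctl key maps to a file path with the dots replaced by slashes. The sketch below compares a live value with the expected one. The names sysctl_path, check_sysctl, and PROC_ROOT are introduced here for illustration; PROC_ROOT defaults to /proc.

```shell
#!/bin/sh
# Sketch only: sysctl_path, check_sysctl and PROC_ROOT are assumptions for illustration.
PROC_ROOT=${PROC_ROOT:-/proc}

# Map a sysctl key to its file under /proc/sys (dots become slashes).
# Good enough for the keys in this guide; keys that embed dotted interface
# names would need special handling.
sysctl_path() {
    echo "$PROC_ROOT/sys/$(echo "$1" | tr '.' '/')"
}

# Compare the live value of key $1 with the expected string $2.
# The kernel separates multi-value entries with tabs, so whitespace is normalized first.
check_sysctl() {
    have=$(tr -s '\t' ' ' < "$(sysctl_path "$1")" | sed 's/^ //; s/ $//')
    [ "$have" = "$2" ] || { echo "$1: have '$have', want '$2'"; return 1; }
}

# Usage:
#   check_sysctl net.core.wmem_max 16780000
#   check_sysctl net.ipv4.tcp_rmem "10240 87380 16780000"
```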
• max_sectors_kb
Change max_sectors_kb on all volumes to 1024 (default 512).
To change max_sectors_kb to 1024 for a single volume:
[root@bigdata1 ~]# echo 1024 > /sys/block/sd?/queue/max_sectors_kb
To change all volumes:
multipath -ll | grep sd | awk -F":" '{print $4}' | awk '{print $2}' | while read LUN
do
    echo 1024 > /sys/block/${LUN}/queue/max_sectors_kb
done
Note: To make this change persistent after reboot, add the commands to the /etc/rc.local file.
• VM dirty writeback and expire
Change vm dirty writeback and expire to 100 (defaults are 500 and 3000 respectively).
To change vm dirty writeback and expire:
[root@bigdata1 ~]# echo 100 > /proc/sys/vm/dirty_writeback_centisecs
[root@bigdata1 ~]# echo 100 > /proc/sys/vm/dirty_expire_centisecs
Note: To make this change persistent after reboot, add the commands to the /etc/rc.local file.
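As an alternative to /etc/rc.local, and offered here as an assumption rather than this guide's method, the two files under /proc/sys/vm correspond to the sysctl keys vm.dirty_writeback_centisecs and vm.dirty_expire_centisecs, so the values could instead be kept in /etc/sysctl.conf. The hypothetical helper below sets a key in a sysctl-style file, replacing an existing assignment or appending a new one, so that repeated runs do not create duplicate lines.

```shell
#!/bin/sh
# Sketch only: set_sysctl_conf is a hypothetical helper name.

# Set "KEY = VALUE" in FILE. An existing assignment of KEY is replaced in place;
# otherwise the line is appended. Commented-out lines are left untouched.
set_sysctl_conf() {
    # $1 = key, $2 = value, $3 = file (default /etc/sysctl.conf)
    key=$1
    val=$2
    file=${3:-/etc/sysctl.conf}
    [ -f "$file" ] || : > "$file"
    _tmp=$(mktemp) || return 1
    awk -v key="$key" -v val="$val" '
        {
            k = $0
            sub(/[ \t]*=.*/, "", k)      # drop "= value"
            sub(/^[ \t]+/, "", k)        # drop leading whitespace
            if (k == key) {
                if (!seen) print key " = " val
                seen = 1
                next
            }
            print
        }
        END { if (!seen) print key " = " val }
    ' "$file" > "$_tmp" && cat "$_tmp" > "$file"
    _rc=$?
    rm -f "$_tmp"
    return $_rc
}

# Usage (as root), followed by "sysctl -p":
#   set_sysctl_conf vm.dirty_writeback_centisecs 100
#   set_sysctl_conf vm.dirty_expire_centisecs 100
```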
Creating Nimble Volumes for HP Vertica DB
Table 1:
Nimble Volume Role | Recommended Number of Volumes per DB Server | Recommended Number of DB Server Cores per Array | Nimble Storage Caching Policy | Volume Block Size (Nimble Storage)
EXT4 Data | 4 for a DB server with 8 cores or less; 8 for a DB server with more than 8 cores | 64 to 128, depending on workload, for a CS700 | Yes - Normal | 32KB
EXT4 Journal | Must equal the number of EXT4 Data volumes | | Yes - with Aggressive Caching | 4KB
EXT4 file system
• Use whole disk partition
• Create 1 EXT4 file system per Vertica storage location: One storage location will correspond to one Nimble
volume for EXT4 data and one Nimble volume for EXT4 journaling.
For example: if 4 EXT4 file systems are needed for 4 Vertica storage locations, create a total of 8 Nimble
volumes.
Note: Having multiple Vertica storage locations allows query parallelism at the storage layer
and separation of Vertica temp and data locations for management and replication.
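The sizing rules above reduce to simple arithmetic: the number of data volumes follows the server's core count, and each EXT4 file system needs one data volume plus one journal volume. A minimal sketch, with hypothetical function names:

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Recommended EXT4 data volumes per DB server, per the table:
# 4 for a server with 8 cores or less, 8 for a server with more than 8 cores.
data_volumes_for_cores() {
    if [ "$1" -le 8 ]; then echo 4; else echo 8; fi
}

# Total Nimble volumes for $1 EXT4 file systems:
# one data volume plus one journal volume per file system.
total_nimble_volumes() {
    echo $(( $1 * 2 ))
}

# Usage: total_nimble_volumes "$(data_volumes_for_cores 16)"
```

For a 16-core server this gives 8 data volumes and 16 Nimble volumes in total.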
Creating Nimble Performance Policies
On the Nimble Management GUI, click on “Manage/Performance Policies” and click on the “New Performance
Policy” button. Enter the appropriate settings then click “OK”.
Change the “Vertica-Journal” performance policy to aggressive caching via the CLI.
Log in to the Nimble array as the “admin” user:
[root@mktg03 ~]# ssh admin@<Nimble Array>
/ $ perfpolicy --edit Vertica-Journal --cache_policy aggressive
Example Setup with 1 EXT4 File System:
Create external journal device
[root@mktg04 ~]# mkfs.ext4 -O journal_dev -L <journal label> /dev/mapper/<journal_vol>
Create EXT4 file system
[root@mktg04 ~]# mkfs.ext4 -J device=LABEL=<journal label> /dev/mapper/<data-vol> -b 4096 -E stride=8,stripe-width=8
Mount options in /etc/fstab file
/dev/mapper/<data-vol> /verticadb ext4 _netdev,noatime,nodiratime,discard,barrier=0 0 0
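The guide does not say how stride=8 was chosen. One reading, stated here as an inference, is that it equals the Nimble volume block size of the data volume (32KB) divided by the EXT4 block size (4096 bytes). The hypothetical helper below shows that arithmetic.

```shell
#!/bin/sh
# Sketch only: ext4_stride is a hypothetical helper; the derivation is an inference.

# stride (in file system blocks) = volume block size / file system block size
ext4_stride() {
    # $1 = Nimble volume block size in KB, $2 = EXT4 block size in bytes
    echo $(( $1 * 1024 / $2 ))
}

# Usage: ext4_stride 32 4096
```

With a 32KB volume block size and 4096-byte EXT4 blocks the result is 8, which matches the stride and stripe-width values in the mkfs.ext4 command above.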
Note: When using an external journal device on any flavor of Linux, the file system with the external journal
may not mount after a server reboot. When the server reboots, the disk device name (i.e.
sd?) can change, so the file system no longer finds its journal on the same device. This is a bug in the
Linux mount command. The script below can be placed in the /etc/rc.local file so that the correct journal device
is used during the mount process.
#!/bin/bash
#
# Script Name: mount_external_journal.sh
#
# Description: This script is to mount an EXT4 file system with external journal device.
#              External journal device is not persistent after reboot so this script
#              will make sure the same external journal device is used after reboot.
#
# Author: Nimble Storage
#
# Date Written: 7/18/2013
#
# Revision: 1.0
#
# History:
# Date:       Who:   What:
# 12/11/2013  T.D.   Bug - changed for i statement
# 9/12/2014   T.D.   Changed to work without LVM
#
#############
## M A I N ##
#############
echo
echo '******************************************'
echo '******************************************'
echo '*** Nimble Storage Copyright Program ***'
echo '*** Authorized Use Only ***'
echo '******************************************'
echo '******************************************'
echo
#
# Make sure we're running as root
#
OS=`uname`
case ${OS} in
SunOS)
    if [ `/usr/xpg4/bin/id -u` -ne 0 ] ; then
        echo 1>&2
        echo 1>&2
        echo "`basename $0` - ERROR - Not executing as root." 1>&2
        echo " - Processing terminated." 1>&2
        echo 1>&2
        exit 1
    fi;;
*)
    if [ `/usr/bin/id -u` -ne 0 ] ; then
        echo 1>&2
        echo 1>&2
        echo "`basename $0` - ERROR - Not executing as root." 1>&2
        echo " - Processing terminated." 1>&2
        echo 1>&2
        exit 1
    fi;;
esac
for a in $(blkid | grep LABEL | grep "mapper" | awk -F"UUID=" '{print $NF}' | awk '{print $1}' | sort | uniq | sed 's/\"//g' | grep "^[0-9a-z]")
do
    for i in $(echo $a)
    do
        # Get dm devices
        journalmapper=$(blkid | grep $i | grep "mapper" | grep "LABEL" | awk '{print $1}' | sed 's/\://')
        dev=$(blkid | grep $i | grep EXT_JOURNAL | grep "mapper" | awk '{print $1}' | sed 's/\://')
    done
    # Get mountpoint & device
    device=$(grep $dev /etc/fstab | awk '{print $1}')
    mp=$(grep $dev /etc/fstab | awk '{print $2}')
    echo "============================================================="
    echo "Running e2fsck on device $device..."
    echo "============================================================="
    echo
    e2fsck -f -p $device
    echo "============================================================="
    echo "Running tune2fs on device $device..."
    echo "============================================================="
    echo
    tune2fs -f -O ^has_journal $device
    tune2fs -J device=$journalmapper $device
    echo "============================================================="
    echo "Mounting device $device..."
    echo "============================================================="
    echo
    mount -t ext4 -o _netdev,noatime,nodiratime,discard,barrier=0 -O journal_dev=$journalmapper $device $mp
    echo
done
Nimble Storage, Inc.
211 River Oaks Parkway, San Jose, CA 95134
Tel: 877-364-6253 | www.nimblestorage.com | [email protected]
© 2014 Nimble Storage, Inc. Nimble Storage, InfoSight, SmartStack, NimbleConnect, and CASL are trademarks or registered trademarks of Nimble Storage, Inc. All other trademarks are the property of their respective owners. BPG-Vertica-1114