8/2/2019 Cheap Clustering Ocfs2
Cheap Clustering with OCFS2
Mark Fasheh
Oracle
August 14, 2006
-
What is OCFS2
General purpose cluster file system
Shared disk model
Symmetric architecture
Almost POSIX compliant
fcntl(2) locking
Shared writable mmap
Cluster stack
Small, suitable only for a file system
-
Why use OCFS2?
Versus NFS
Fewer points of failure
Data consistency
OCFS2 nodes have direct disk access
Higher performance
Widely distributed, supported
In Linux kernel
Novell SLES9, SLES10
Oracle support for RAC customers
-
OCFS2 Uses
File Serving
FTP
NFS
Web serving (Apache)
Xen image migration
Oracle Database
-
Why do we need cheap clusters?
Shared disk hardware can be expensive
Fibre Channel as a rough example
Switches: $3,000 - $20,000
Cards: $500 - $2,000
Cables, GBIC: hundreds of dollars
Disk(s): the sky's the limit
Networks are getting faster and faster
Gigabit PCI card: $6
Some want to prototype larger systems
Performance not necessarily critical
-
Hardware
Cheap commodity hardware is easy to find:
Refurbished from name brands (Dell, HP, IBM, etc.)
Large hardware stores (Fry's Electronics, etc.)
Online: eBay, Amazon, Newegg, etc.
Impressive performance
Dual core CPUs running at 2GHz and up
Gigabit network
SATA, SATA II
-
Hardware Examples - CPU
2.66GHz, Dual Core w/MB: $129
Built in video, network
-
Hardware Examples - RAM
1GB DDR2: $70
-
Hardware Examples - Disk
100GB SATA: $50
-
Hardware Examples - Network
Gigabit network card: $6
Can direct connect rather than buy a switch, buy two!
-
Hardware Examples - Case
400 Watt Case: $70
-
Hardware Examples - Total
Total hardware cost per node: $326
3 node cluster for less than $1,000!
One machine exports disk via network
Dedicated gigabit network for the storage
At $50 each, simple to buy an extra, dedicated disk
Generally, this node cannot mount the shared disk
Spend slightly more for nicer hardware
PCI-Express Gigabit: $30
Athlon X2 3800+, MB (SATA II, DDR2): $180
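
The per-node total can be checked by adding up the prices from the preceding hardware slides. This is a minimal sketch, not part of the original talk. The rounded prices as listed sum to $325, a dollar short of the quoted $326, presumably because the slide prices are rounded. The names `PARTS`, `node_cost`, and `cluster_cost` are made up for illustration.

```python
# Per-node prices in whole dollars, as listed on the hardware slides.
PARTS = {
    "CPU + motherboard": 129,
    "1GB DDR2 RAM": 70,
    "100GB SATA disk": 50,
    "Gigabit network card": 6,
    "400 Watt case": 70,
}


def node_cost(parts):
    """Sum the listed part prices for a single node."""
    return sum(parts.values())


def cluster_cost(parts, nodes):
    """Cost of a cluster built from identical nodes."""
    return node_cost(parts) * nodes


if __name__ == "__main__":
    print(f"per node: ${node_cost(PARTS)}")
    print(f"3 nodes:  ${cluster_cost(PARTS, 3)}")
```

Either way, three identical nodes stay under the $1,000 mark.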
-
Shared Disk via iSCSI
SCSI over TCP/IP
Can be routed
Support for authentication, many enterprise features
iSCSI Enterprise Target (IETD)
iSCSI server
Can run on any disks, regular files
Kernel / User space components
Open iSCSI Initiator
iSCSI client
Kernel / User space components
-
Trivial iSCSI Target Config.
Name the target
iqn.YYYY-MM.com.example:disk.name
Create Target stanza in /etc/ietd.conf
Lun definitions describe disks to export
fileio type for normal disks
Special nullio type for testing
Target iqn.2006-08.com.example:lab.exports
    Lun 0 Path=/dev/sdX,Type=fileio
    Lun 1 Sectors=10000,Type=nullio
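
The stanza above follows a regular pattern, so it can be generated when several targets are needed. This is a minimal sketch that only mirrors the layout shown on the slide. The helpers `make_iqn` and `target_stanza` are hypothetical names, not part of IETD, and the four-space indent is a readability choice.

```python
def make_iqn(year, month, reversed_domain, name):
    """Build a target name of the form iqn.YYYY-MM.com.example:disk.name."""
    return f"iqn.{year:04d}-{month:02d}.{reversed_domain}:{name}"


def target_stanza(iqn, luns):
    """Render a Target stanza; LUN numbers are assigned in list order.

    Each entry in `luns` is a dict of Key=Value options for one Lun line.
    """
    lines = [f"Target {iqn}"]
    for number, options in enumerate(luns):
        rendered = ",".join(f"{key}={value}" for key, value in options.items())
        lines.append(f"    Lun {number} {rendered}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    iqn = make_iqn(2006, 8, "com.example", "lab.exports")
    print(target_stanza(iqn, [
        {"Path": "/dev/sdX", "Type": "fileio"},   # placeholder disk from the slide
        {"Sectors": 10000, "Type": "nullio"},     # test-only LUN
    ]), end="")
```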
-
Trivial iSCSI Initiator Config.
Recent releases have a DB driven config.
Use the iscsiadm program to manipulate it
rm -f /var/db/iscsi/* to start fresh
3 steps
Add discovery address
Log into target
When done, log out of target
$ iscsiadm -m discovery --type sendtargets --portal examplehost
[cbb01c] 192.168.1.6:3260,1 iqn.2006-08.com.example:lab.exports
$ iscsiadm -m node --record cbb01c --login
$ iscsiadm -m node --record cbb01c --logout
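
When scripting these steps, the record ID has to be pulled out of the discovery output before logging in. This is a minimal sketch that parses a line in the format shown on the slide, which newer open-iscsi releases may not print. The function names are hypothetical, and the double-dash `--login` spelling is an assumption about the long option.

```python
import re

# Matches e.g. "[cbb01c] 192.168.1.6:3260,1 iqn.2006-08.com.example:lab.exports"
DISCOVERY_LINE = re.compile(
    r"^\[(?P<record>[0-9a-fA-F]+)\]\s+"
    r"(?P<address>[^:\s]+):(?P<port>\d+),(?P<tpgt>\d+)\s+"
    r"(?P<target>\S+)$"
)


def parse_discovery_line(line):
    """Split one discovery output line into its fields."""
    match = DISCOVERY_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized discovery line: {line!r}")
    return {
        "record": match.group("record"),
        "address": match.group("address"),
        "port": int(match.group("port")),
        "tpgt": int(match.group("tpgt")),
        "target": match.group("target"),
    }


def login_command(record):
    """Argument list for logging into the node with the given record ID."""
    return ["iscsiadm", "-m", "node", "--record", record, "--login"]


if __name__ == "__main__":
    info = parse_discovery_line(
        "[cbb01c] 192.168.1.6:3260,1 iqn.2006-08.com.example:lab.exports"
    )
    print(info["target"])
    print(" ".join(login_command(info["record"])))
```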
-
Shared Disk via SLES10
Easiest option
No downloading: all packages included
Very simple setup using YaST2
Simple to use, GUI configuration utility
Text mode available
Supported by Novell/Suse
OCFS2 also integrated with Linux-HA software
Demo on Wednesday
Visit Oracle booth for details
-
Shared Disk via AoE
ATA over Ethernet
Very simple standard: 6 page spec!
Lightweight client
Less CPU overhead than iSCSI
Very easy to set up: auto configuration via Ethernet broadcast
Not routable, no authentication
Targets and clients must be on the same Ethernet network
Disks addressed by shelf and slot #'s
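
On a Linux client, the aoe driver exposes each exported disk as a block device named after its shelf and slot. This is a minimal sketch of that mapping. The `/dev/etherd/eSHELF.SLOT` naming and the address ranges reflect my understanding of the driver and the AoE spec, not anything stated on the slide, and `aoe_device_path` is a hypothetical helper.

```python
def aoe_device_path(shelf, slot):
    """Return the client-side device node for an AoE shelf/slot address."""
    # Assumed ranges: the top value of each field is reserved for broadcast.
    if not 0 <= shelf <= 0xFFFE:
        raise ValueError(f"shelf out of range: {shelf}")
    if not 0 <= slot <= 0xFE:
        raise ValueError(f"slot out of range: {slot}")
    return f"/dev/etherd/e{shelf}.{slot}"


if __name__ == "__main__":
    print(aoe_device_path(0, 1))
```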
-
AoE Target Configuration
Virtual Blade (vblade) software available for Linux, FreeBSD
Very small, user space daemon
Buffered I/O against a device or file
Useful only for prototyping
O_DIRECT patches available
Stock performance is not very high
Very simple command
vbladed
-
AoE Client Configuration
Single kernel module load required
Automatically finds blades
Optional load time option, aoe_iflist
List of interfaces to listen on
aoetools package
Programs to get AoE status, bind interfaces, create devices, etc.
-
OCFS2
1.2 tree
Shipped with SLES9/SLES10
RPMS for other distributions available online
Builds against many kernels
Feature freeze, bug fix only
1.3 tree
Active development tree
Included in Linux kernel
Bug fixes and features go to -mm first.
-
OCFS2 Tools
Standard set of file system utilities
mkfs.ocfs2, mount.ocfs2, fsck.ocfs2, etc
Cluster aware
o2cb to start/stop/configure cluster
Work with both OCFS2 trees
ocfs2console: GUI configuration utility
Can create entire cluster configuration
Can distribute configuration to all nodes
RPMS for non SLES distributions available online
-
OCFS2 Configuration
Major goal for OCFS2 was simple config.
/etc/ocfs2/cluster.conf
Single file, identical on all nodes
Only step before mounting is to start cluster
Can configure to start at boot
$ /etc/init.d/o2cb online
Loading module "configfs": OK
Mounting configfs filesystem at /sys/kernel/config: OK
Loading module "ocfs2_nodemanager": OK
Loading module "ocfs2_dlm": OK
Loading module "ocfs2_dlmfs": OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Starting O2CB cluster ocfs2: OK
-
Sample cluster.conf
node:
	ip_port = 7777
	ip_address = 192.168.1.7
	number = 0
	name = keevan
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 192.168.1.2
	number = 1
	name = opaka
	cluster = ocfs2

cluster:
	node_count = 2
	name = ocfs2
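
Because the file must be identical on every node, generating it from a single node list is one way to avoid drift. This is a minimal sketch that reproduces the layout of the sample above. `render_cluster_conf` is a hypothetical helper, not part of ocfs2-tools, and the tab-indented parameter lines are an assumption about what the parser expects.

```python
def render_cluster_conf(cluster_name, nodes, port=7777):
    """Render cluster.conf text.

    `nodes` is a list of (name, ip_address) tuples; node numbers are
    assigned in list order, starting at 0.
    """
    stanzas = []
    for number, (name, ip_address) in enumerate(nodes):
        stanzas.append(
            "node:\n"
            f"\tip_port = {port}\n"
            f"\tip_address = {ip_address}\n"
            f"\tnumber = {number}\n"
            f"\tname = {name}\n"
            f"\tcluster = {cluster_name}\n"
        )
    stanzas.append(
        "cluster:\n"
        f"\tnode_count = {len(nodes)}\n"
        f"\tname = {cluster_name}\n"
    )
    # Blank line between stanzas.
    return "\n".join(stanzas)


if __name__ == "__main__":
    print(render_cluster_conf("ocfs2", [
        ("keevan", "192.168.1.7"),
        ("opaka", "192.168.1.2"),
    ]), end="")
```

The rendered text can then be copied unchanged to /etc/ocfs2/cluster.conf on each node.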
-
OCFS2 Tuning - Heartbeat
Default heartbeat timeout tuned very low for our purposes
May result in node reboots for lower performance clusters
Timeout must be same on all nodes
Increase O2CB_HEARTBEAT_THRESHOLD value in /etc/sysconfig/o2cb
OCFS2 Tools 1.2.3 release will add this to the configuration script.
SLES10 users can use Linux-HA instead
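
Choosing a threshold is easier when it is expressed in seconds. This is a minimal sketch that assumes the relation given in the OCFS2 documentation as I understand it: one heartbeat every 2 seconds, with a timeout of (threshold - 1) * 2 seconds. The slide does not state this formula, so check it against the release in use. The function names are hypothetical.

```python
import math

# Assumed interval between disk heartbeats, in seconds.
HEARTBEAT_INTERVAL_SECONDS = 2


def timeout_seconds(threshold):
    """Timeout implied by a given O2CB_HEARTBEAT_THRESHOLD value."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECONDS


def threshold_for(seconds):
    """Smallest threshold whose timeout is at least `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return math.ceil(seconds / HEARTBEAT_INTERVAL_SECONDS) + 1


if __name__ == "__main__":
    wanted = 60  # example target timeout in seconds
    value = threshold_for(wanted)
    print(f"O2CB_HEARTBEAT_THRESHOLD={value}  # {timeout_seconds(value)}s")
```

Whatever value is chosen must be set identically on every node.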
-
OCFS2 Tuning - mkfs.ocfs2
OCFS2 uses cluster and block sizes
Clusters for data, range from 4K-1M
Use -C option
Blocks for meta data, range from .5K-4K
Use -b option
More meta data updates -> larger journal
-J size= to pick different size
mkfs.ocfs2 -T filesystem-type
-T mail option for meta data heavy workloads
-T datafiles for file systems with very large files
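
The options above combine into a single mkfs.ocfs2 invocation. This is a minimal sketch that assembles the argument list and checks sizes against the ranges on the slide. It assumes that sizes are given in bytes and must be powers of two. `mkfs_ocfs2_args` is a hypothetical helper, and `/dev/sdX` is a placeholder device.

```python
def _is_power_of_two(value):
    return value > 0 and value & (value - 1) == 0


def mkfs_ocfs2_args(device, block_size=None, cluster_size=None,
                    journal_size=None, fs_type=None):
    """Build an argument list for mkfs.ocfs2 (it is not executed here)."""
    args = ["mkfs.ocfs2"]
    if block_size is not None:
        # Blocks hold meta data: .5K to 4K.
        if not (512 <= block_size <= 4096 and _is_power_of_two(block_size)):
            raise ValueError(f"invalid block size: {block_size}")
        args += ["-b", str(block_size)]
    if cluster_size is not None:
        # Clusters hold data: 4K to 1M.
        if not (4096 <= cluster_size <= 1048576
                and _is_power_of_two(cluster_size)):
            raise ValueError(f"invalid cluster size: {cluster_size}")
        args += ["-C", str(cluster_size)]
    if journal_size is not None:
        args += ["-J", f"size={journal_size}"]
    if fs_type is not None:
        if fs_type not in ("mail", "datafiles"):
            raise ValueError(f"unknown file system type: {fs_type}")
        args += ["-T", fs_type]
    args.append(device)
    return args


if __name__ == "__main__":
    print(" ".join(mkfs_ocfs2_args("/dev/sdX", block_size=4096,
                                   cluster_size=32768, fs_type="mail")))
```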
-
OCFS2 Tuning - Practices
No indexed directories yet
Keep directory sizes small to medium
Reduce resource contention
Read only access is not a problem
Try to keep writes local to a node
Each node has its own directory
Each node has its own logfile
Spread things out by using multiple file systems
Allows you to fine tune mkfs options depending on file system target usage
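
One way to keep writes local is to derive each node's directory and logfile from its host name. This is a minimal sketch of that idea. The mount point `/mnt/ocfs2` and the helper `node_local_path` are hypothetical.

```python
import os
import socket


def node_local_path(shared_root, filename, node_name=None):
    """Path to a file inside this node's own directory on the shared volume."""
    node = node_name or socket.gethostname()
    return os.path.join(shared_root, node, filename)


if __name__ == "__main__":
    # Each node writes only under its own subdirectory.
    print(node_local_path("/mnt/ocfs2", "app.log"))
```

Other nodes can still read these files, which matches the note that read only access is not a problem.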
-
References
http://oss.oracle.com/projects/ocfs2/
http://oss.oracle.com/projects/ocfs2-tools/
http://www.novell.com/linux/storage_foundation/
http://iscsitarget.sf.net/
http://www.open-iscsi.org/
http://aoetools.sf.net/
http://www.coraid.com/
http://www.frys-electronics-ads.com/
http://www.cdw.com/