
  • 8/2/2019 Cheap Clustering Ocfs2

    1/27

    Cheap Clustering with OCFS2

    Mark Fasheh

    Oracle

    August 14, 2006


    What is OCFS2

    General purpose cluster file system

    Shared disk model

    Symmetric architecture

    Almost POSIX compliant fcntl(2) locking

    Shared writable mmap

    Cluster stack: small, suitable only for a file system


    Why use OCFS2?

    Versus NFS

    Fewer points of failure

    Data consistency

    OCFS2 nodes have direct disk access: higher performance

    Widely distributed, supported

    In Linux kernel

    Novell SLES9, SLES10

    Oracle support for RAC customers


    OCFS2 Uses

    File Serving

    FTP

    NFS

    Web serving (Apache)

    Xen image migration

    Oracle Database


    Why do we need cheap clusters?

    Shared disk hardware can be expensive

    Fibre Channel as a rough example:

        Switches: $3,000 - $20,000
        Cards: $500 - $2,000
        Cables, GBIC: hundreds of dollars
        Disk(s): the sky's the limit

    Networks are getting faster and faster

    Gigabit PCI card: $6

    Some want to prototype larger systems

    Performance not necessarily critical


    Hardware

    Cheap commodity hardware is easy to find:

        Refurbished from name brands (Dell, HP, IBM, etc.)
        Large hardware stores (Fry's Electronics, etc.)
        Online: eBay, Amazon, Newegg, etc.

    Impressive performance:

        Dual core CPUs running at 2GHz and up
        Gigabit network
        SATA, SATA II


    Hardware Examples - CPU

    2.66GHz, Dual Core w/MB: $129

    Built in video, network


    Hardware Examples - RAM

    1GB DDR2: $70


    Hardware Examples - Disk

    100GB SATA: $50


    Hardware Examples - Network

    Gigabit network card: $6

    Can direct connect rather than buy a switch, buy two!


    Hardware Examples - Case

    400 Watt Case: $70


    Hardware Examples - Total

    Total hardware cost per node: $326

    3 node cluster for less than $1,000!

    One machine exports disk via network:

        Dedicated gigabit network for the storage
        At $50 each, simple to buy an extra, dedicated disk
        Generally, this node cannot mount the shared disk

    Spend slightly more for nicer hardware:

        PCI-Express Gigabit: $30
        Athlon X2 3800+, MB (SATA II, DDR2): $180


    Shared Disk via iSCSI

    SCSI over TCP/IP

    Can be routed

    Support for authentication, many enterprise features

    iSCSI Enterprise Target (IETD)

    iSCSI server

    Can run on any disks, regular files

    Kernel / User space components

    Open iSCSI Initiator

    iSCSI client

    Kernel / User space components


    Trivial iSCSI Target Config.

    Name the target

    iqn.YYYY-MM.com.example:disk.name

    Create Target stanza in /etc/ietd.conf

    Lun definitions describe disks to export

    fileio type for normal disks

    Special nullio type for testing

    Target iqn.2006-08.com.example:lab.exports
        Lun 0 Path=/dev/sdX,Type=fileio
        Lun 1 Sectors=10000,Type=nullio
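The naming pattern and the Target stanza above can be sketched as a small generator. This is an illustration only: `make_iqn` and `render_target` are hypothetical helper names, the tab indentation of the Lun lines is an assumption about layout, and the output is only as valid as the option names copied from the slide.

```python
from datetime import date


def make_iqn(registered: date, domain: str, name: str) -> str:
    """Build a name of the form iqn.YYYY-MM.<reversed domain>:<name>."""
    reversed_domain = ".".join(reversed(domain.split(".")))
    return f"iqn.{registered:%Y-%m}.{reversed_domain}:{name}"


def render_target(iqn: str, luns: list[dict[str, str]]) -> str:
    """Render one Target stanza; LUN numbers follow the order of the list."""
    lines = [f"Target {iqn}"]
    for number, params in enumerate(luns):
        options = ",".join(f"{key}={value}" for key, value in params.items())
        lines.append(f"\tLun {number} {options}")
    return "\n".join(lines) + "\n"


stanza = render_target(
    make_iqn(date(2006, 8, 1), "example.com", "lab.exports"),
    [
        {"Path": "/dev/sdX", "Type": "fileio"},  # /dev/sdX is the slide's placeholder
        {"Sectors": "10000", "Type": "nullio"},  # test-only LUN with no backing disk
    ],
)
print(stanza)
```

The printed text is meant to be pasted into /etc/ietd.conf by hand; the sketch does not write any file.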


    Trivial iSCSI Initiator Config.

    Recent releases have a DB driven config.

        Use iscsiadm program to manipulate
        rm -f /var/db/iscsi/* to start fresh

    3 steps:

        Add discovery address
        Log into target
        When done, log out of target

    $ iscsiadm -m discovery --type sendtargets --portal examplehost
    [cbb01c] 192.168.1.6:3260,1 iqn.2006-08.com.example:lab.exports
    $ iscsiadm -m node --record cbb01c --login
    $ iscsiadm -m node --record cbb01c --logout
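The record ID used for login and logout comes from the discovery output, so the three steps can be scripted by parsing that line. The sketch below assumes the output format shown on the slide (record ID in brackets, then address:port,portal-group, then the target name) and the `--record` syntax of the open-iscsi release the talk describes; newer releases address nodes differently. The function names are hypothetical, and nothing is executed.

```python
import re

# Matches lines such as:
# [cbb01c] 192.168.1.6:3260,1 iqn.2006-08.com.example:lab.exports
DISCOVERY_LINE = re.compile(
    r"^\[(?P<record>[0-9a-f]+)\]\s+"
    r"(?P<address>[^:\s]+):(?P<port>\d+),(?P<tpgt>\d+)\s+"
    r"(?P<target>\S+)$"
)


def parse_discovery(line: str) -> dict[str, str]:
    """Split one discovery output line into its named fields."""
    match = DISCOVERY_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised discovery line: {line!r}")
    return match.groupdict()


def node_command(record: str, action: str) -> list[str]:
    """Build (but do not run) the login or logout command for a record."""
    if action not in ("login", "logout"):
        raise ValueError("action must be 'login' or 'logout'")
    return ["iscsiadm", "-m", "node", "--record", record, f"--{action}"]


sample = "[cbb01c] 192.168.1.6:3260,1 iqn.2006-08.com.example:lab.exports"
info = parse_discovery(sample)
print(" ".join(node_command(info["record"], "login")))
print(" ".join(node_command(info["record"], "logout")))
```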


    Shared Disk via SLES10

    Easiest option

    No downloading: all packages included

    Very simple setup using YAST2

        Simple to use, GUI configuration utility
        Text mode available

    Supported by Novell/Suse

    OCFS2 also integrated with Linux-HA software

    Demo on Wednesday

    Visit Oracle booth for details


    Shared Disk via AoE

    ATA over Ethernet

    Very simple standard: 6 page spec!

    Lightweight client: less CPU overhead than iSCSI

    Very easy to set up: auto configuration via Ethernet broadcast

    Not routable, no authentication

    Targets and clients must be on the same Ethernet network

    Disks addressed by shelf and slot #'s
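On a Linux client, the aoe driver exposes each shelf and slot pair as a block device under /dev/etherd. A minimal sketch of that mapping follows; the function name is hypothetical, and the range checks are an assumption based on the AoE header field widths (16-bit shelf, 8-bit slot, with the all-ones values reserved for broadcast).

```python
def aoe_device_path(shelf: int, slot: int) -> str:
    """Return the client-side device node for an AoE shelf.slot address."""
    # Assumed limits: 0xffff and 0xff are broadcast addresses, not real disks.
    if not 0 <= shelf < 0xFFFF:
        raise ValueError("shelf must be in 0..65534")
    if not 0 <= slot < 0xFF:
        raise ValueError("slot must be in 0..254")
    return f"/dev/etherd/e{shelf}.{slot}"


print(aoe_device_path(0, 1))
```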


    AoE Target Configuration

    Virtual Blade (vblade) software available for Linux, FreeBSD

    Very small, user space daemon

    Buffered I/O against a device or file

        Useful only for prototyping
        O_DIRECT patches available

    Stock performance is not very high

    Very simple command

    vbladed
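The arguments to the command did not survive in this transcript. As a hedged sketch, the vblade documentation gives the order as shelf, slot, network interface, then the device or file to export; the helper below only assembles that command line and does not run it. The interface name and the /dev/sdX path are placeholders, not values from the talk.

```python
import shlex


def vbladed_command(shelf: int, slot: int, interface: str, backing: str) -> list[str]:
    """Assemble a vbladed invocation: shelf, slot, interface, exported device or file."""
    return ["vbladed", str(shelf), str(slot), interface, backing]


# Placeholder values: export /dev/sdX as shelf 0, slot 1 on eth1.
print(shlex.join(vbladed_command(0, 1, "eth1", "/dev/sdX")))
```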


    AoE Client Configuration

    Single kernel module load required

    Automatically finds blades

    Optional load time option, aoe_iflist: list of interfaces to listen on

    Aoetools package

    Programs to get AoE status, bind interfaces, create devices, etc.


    OCFS2

    1.2 tree

    Shipped with SLES9/SLES10

    RPMS for other distributions available online

    Builds against many kernels

    Feature freeze, bug fix only

    1.3 tree

    Active development tree

    Included in Linux kernel

    Bug fixes and features go to -mm first.


    OCFS2 Tools

    Standard set of file system utilities

    mkfs.ocfs2, mount.ocfs2, fsck.ocfs2, etc

    Cluster aware

    o2cb to start/stop/configure cluster

    Work with both OCFS2 trees

    Ocfs2console: GUI configuration utility

    Can create entire cluster configuration

    Can distribute configuration to all nodes

    RPMS for non SLES distributions available online


    OCFS2 Configuration

    Major goal for OCFS2 was simple config.

    /etc/ocfs2/cluster.conf

        Single file, identical on all nodes

    Only step before mounting is to start cluster

        Can configure to start at boot

    $ /etc/init.d/o2cb online
    Loading module "configfs": OK
    Mounting configfs filesystem at /sys/kernel/config: OK
    Loading module "ocfs2_nodemanager": OK
    Loading module "ocfs2_dlm": OK
    Loading module "ocfs2_dlmfs": OK
    Mounting ocfs2_dlmfs filesystem at /dlm: OK
    Starting O2CB cluster ocfs2: OK


    Sample cluster.conf

    node:
        ip_port = 7777
        ip_address = 192.168.1.7
        number = 0
        name = keevan
        cluster = ocfs2

    node:
        ip_port = 7777
        ip_address = 192.168.1.2
        number = 1
        name = opaka
        cluster = ocfs2

    cluster:
        node_count = 2
        name = ocfs2
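Because the file must be identical on every node, it can help to generate it from one list of nodes rather than edit it by hand. The sketch below reproduces the sample above; `Node` and `render_cluster_conf` are hypothetical names, node numbers are assumed to follow list order, and the tab indentation of parameters is an assumption about what the tools expect.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    ip_address: str
    ip_port: int = 7777  # port used in the sample configuration


def render_cluster_conf(cluster: str, nodes: list[Node]) -> str:
    """Render one node: stanza per node, then the cluster: stanza."""
    stanzas = []
    for number, node in enumerate(nodes):
        stanzas.append(
            "node:\n"
            f"\tip_port = {node.ip_port}\n"
            f"\tip_address = {node.ip_address}\n"
            f"\tnumber = {number}\n"
            f"\tname = {node.name}\n"
            f"\tcluster = {cluster}\n"
        )
    stanzas.append(
        "cluster:\n"
        f"\tnode_count = {len(nodes)}\n"
        f"\tname = {cluster}\n"
    )
    return "\n".join(stanzas)


conf = render_cluster_conf(
    "ocfs2",
    [Node("keevan", "192.168.1.7"), Node("opaka", "192.168.1.2")],
)
print(conf)
```

Generating node_count from the list length avoids the easy mistake of adding a node stanza and forgetting to bump the count.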


    OCFS2 Tuning - Heartbeat

    Default heartbeat timeout tuned very low for our purposes

        May result in node reboots for lower performance clusters
        Timeout must be same on all nodes

    Increase O2CB_HEARTBEAT_THRESHOLD value in /etc/sysconfig/o2cb

    OCFS2 Tools 1.2.3 release will add this to the configuration script.

    SLES10 users can use Linux-HA instead
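The slide does not say how the threshold maps to seconds. The OCFS2 FAQ for the 1.2 series gives the relationship as timeout = (threshold - 1) * 2 seconds, one heartbeat every two seconds; the sketch below relies on that formula, so treat it as an assumption to verify against the release in use. The function names are hypothetical.

```python
import math

HEARTBEAT_INTERVAL_SECONDS = 2  # assumed interval between disk heartbeats


def timeout_seconds(threshold: int) -> int:
    """Seconds of missed heartbeats before a node is declared dead."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECONDS


def threshold_for(timeout: int) -> int:
    """Smallest threshold whose timeout is at least `timeout` seconds."""
    return math.ceil(timeout / HEARTBEAT_INTERVAL_SECONDS) + 1


# Line to place in /etc/sysconfig/o2cb on every node for a 60 second timeout.
print(f"O2CB_HEARTBEAT_THRESHOLD={threshold_for(60)}")
```

Under this formula a threshold of 7 corresponds to 12 seconds, which is easy to exceed when the shared disk sits behind a busy gigabit link.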


    OCFS2 Tuning - mkfs.ocfs2

    OCFS2 uses cluster and block sizes

        Clusters for data, range from 4K-1M: use -C option
        Blocks for meta data, range from .5K-4K: use -b option

    More meta data updates -> larger journal

        -J size= to pick different size

    mkfs.ocfs2 -T filesystem-type

        -T mail option for meta data heavy workloads
        -T datafiles for file systems with very large files
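The options above can be collected into a helper that checks the stated ranges before anything is formatted. This is a sketch under assumptions: sizes are taken to be powers of two, only the options named on this slide are handled, and `mkfs_ocfs2_command` is a hypothetical name. It builds the argument list and never runs mkfs.

```python
from typing import Optional


def format_size(size: int) -> str:
    """Format a byte count with an M or K suffix where it divides evenly."""
    for unit, suffix in ((1024 * 1024, "M"), (1024, "K")):
        if size % unit == 0:
            return f"{size // unit}{suffix}"
    return str(size)


def check_size(label: str, size: int, low: int, high: int) -> None:
    """Reject sizes outside [low, high] or not a power of two (assumed rule)."""
    if not (low <= size <= high) or size & (size - 1) != 0:
        raise ValueError(f"{label} {size} is outside the supported range")


def mkfs_ocfs2_command(
    device: str,
    block_size: Optional[int] = None,
    cluster_size: Optional[int] = None,
    journal_size: Optional[int] = None,
    fs_type: Optional[str] = None,
) -> list[str]:
    args = ["mkfs.ocfs2"]
    if block_size is not None:
        check_size("block size", block_size, 512, 4096)  # .5K-4K
        args += ["-b", format_size(block_size)]
    if cluster_size is not None:
        check_size("cluster size", cluster_size, 4096, 1024 * 1024)  # 4K-1M
        args += ["-C", format_size(cluster_size)]
    if journal_size is not None:
        args += ["-J", f"size={format_size(journal_size)}"]
    if fs_type is not None:
        if fs_type not in ("mail", "datafiles"):
            raise ValueError("fs_type must be 'mail' or 'datafiles'")
        args += ["-T", fs_type]
    return args + [device]


# /dev/sdX is a placeholder device name.
print(" ".join(mkfs_ocfs2_command("/dev/sdX", block_size=4096,
                                  cluster_size=32768, fs_type="mail")))
```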


    OCFS2 Tuning - Practices

    No indexed directories yet

        Keep directory sizes small to medium

    Reduce resource contention

        Read only access is not a problem
        Try to keep writes local to a node
            Each node has its own directory
            Each node has its own logfile

    Spread things out by using multiple filesystems

        Allows you to fine tune mkfs options depending on file system target usage
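One way to keep writes local is to derive every writable path from the node's name, so that no two nodes update the same directory or logfile. The sketch below is illustrative only: the `nodes/<name>/` layout and the function name are hypothetical, and the hostname is used as the node name unless one is passed in.

```python
import socket
from pathlib import PurePosixPath
from typing import Optional


def node_local_path(mount_point: str, relative: str, node: Optional[str] = None) -> PurePosixPath:
    """Return a per-node path under the shared mount (hypothetical layout)."""
    node = node or socket.gethostname()
    return PurePosixPath(mount_point) / "nodes" / node / relative


# Each node appends to its own logfile instead of contending for a shared one.
print(node_local_path("/mnt/shared", "logs/app.log", node="keevan"))
print(node_local_path("/mnt/shared", "logs/app.log", node="opaka"))
```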


    References

    http://oss.oracle.com/projects/ocfs2/

    http://oss.oracle.com/projects/ocfs2-tools/

    http://www.novell.com/linux/storage_foundation/

    http://iscsitarget.sf.net/

    http://www.open-iscsi.org/

    http://aoetools.sf.net/

    http://www.coraid.com/

    http://www.frys-electronics-ads.com/

    http://www.cdw.com/
