Crash Course in Cloud Computing
All Things Open 2014 - Day 1, Wednesday, October 22nd, 2014
Mark Hinkle, Senior Director, Citrix Open Source Business Office
Find more of Mark's talks here: http://www.slideshare.net/socializedsoftware
Mark Hinkle
Senior Director, Open Source Solutions
Citrix Inc.
@mrhinkle
All Things Open 2014
Crash Course in
Open Source Cloud Computing
By Mark R. Hinkle@[email protected]
All Things Open 2014 - Open Source Cloud Computing
ABOUT ME
I Help Build Open Source Ecosystems
Open Source Experience
• Manage Citrix Open Source Business Office
• Apache CloudStack Committer and PMC Member
• Advisory boards: Gluster and Xen Project
• Joined Citrix via Cloud.com acquisition July 2011
• Zenoss Core open source project to 100,000 users, 1.5 million downloads
• Former LinuxWorld Magazine Editor-in-Chief
• Open Management Consortium organizer
• Author - “Windows to Linux Business Desktop Migration” - Thomson
• NetDirector Project - Open Source Configuration Management
Slides Available on Slideshare:
http://www.slideshare.net/socializedsoftware
Creative Commons Attribution-ShareAlike 4.0 International
Share: copy and redistribute the material in any medium or format.
Adapt: remix, transform, and build upon the material for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the license terms.
Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
AGENDA
• Vetting Open Source Cloud Projects
• Virtualization
• Infrastructure-as-a-Service
• Platform-as-a-Service
• SDN
• Open Source for Amazon Web Services
VETTING OPEN SOURCE PROJECTS
How can you tell if they’re legit?
• Code Velocity
• Committers
• Committer Reputation
• User-driven or Vendor-driven Innovation
• User Activity
• Corporate Support*
• Reputation of Foundation*
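A couple of the signals above (code velocity, number of committers) can be estimated from a project's commit history. The sketch below is only an illustration: the `commit_stats` helper, the "top committer share" measure, and the sample data are assumptions made for this example, not part of the talk or of any particular analysis tool.

```python
from collections import Counter
from datetime import date


def commit_stats(commits, since):
    """Summarize (author, commit_date) pairs made on or after `since`."""
    recent = [(author, day) for author, day in commits if day >= since]
    per_author = Counter(author for author, _ in recent)
    # Share of recent commits written by the single busiest committer;
    # a value near 1.0 hints that the project depends on one person.
    top_share = max(per_author.values()) / len(recent) if recent else 0.0
    return {
        "commits": len(recent),
        "committers": len(per_author),
        "top_committer_share": top_share,
    }


# Hypothetical commit log: (author, date of commit)
sample = [
    ("alice", date(2014, 9, 1)),
    ("bob", date(2014, 9, 3)),
    ("alice", date(2014, 9, 10)),
    ("carol", date(2014, 6, 1)),  # older than the cutoff, ignored
]
stats = commit_stats(sample, since=date(2014, 8, 1))
print(stats)
```

Numbers like these only cover the quantitative bullets; committer reputation and foundation reputation still need human judgment.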
OPEN SOURCE ISN’T A ZERO-SUM GAME
“…the future of technological innovation is not stealing limited
resources away from one another, but creating new resources
— and new opportunities to create new resources — together in
a rich ecosystem.”
Allison Randal
Open Source Hacker
Former OSCON Program Chair
@allisonrandal
OPEN SOURCE ANALYSIS
Visualizing Community Activity
http://www.openhub.net
http://activity.openstack.org
OPEN SOURCE CLOUD STACK
[Diagram: the layers of the stack with the project slots left blank (?). Platform-as-a-Service (PaaS); Infrastructure-as-a-Service (IaaS) Orchestration; Compute, Storage, Networking (Networking-as-a-Service). Alongside: DevOps Toolchain (Orchestration, Configuration Management, Monitoring).]
VIRTUALIZATION
Carving up compute resources
OPEN SOURCE
• Xen Project
• Citrix XenServer
• KVM
• VirtualBox
• OpenVZ
• LXC
• libcontainer
PROPRIETARY
• VMware
• Microsoft Hyper-V
• Oracle VM (based on Xen Project)
HYPERVISORS AND CONTAINERS
Differences in virtualization
Type 1 Hypervisors: VMware, Xen Project, Hyper-V
Type 2 Hypervisors: KVM, VirtualBox
Containers: LXC, libcontainer
THE PORTABILITY PROBLEM
Containers compared to Hardware Virtualization
• Different file formats for virtual machines (VMware uses vmdk file format, Xen and Hyper-V use VHD, KVM uses Raw or QCOW2)
• Guest images may be “processor architecture” bound
• VMware and Xen can manage SCSI devices, but KVM cannot
• KVM and Xen can use virtio drivers but not VMware
• VMware uses a proprietary agent inside the guest OS (VMware tools) which does not work with Xen or KVM
• Yada, Yada, Yada
LINUX CONTAINERS
“Lightweight” Linux Virtualization
• Lets you run a Linux system within another Linux system
• A container is a group of processes on a Linux box, put together to provide an isolated environment
• From the inside, it looks like a VM
• Externally, it looks like normal processes
• “chroot on steroids”
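One way to see that a container is "just processes" from the outside is to look at the kernel namespaces a process belongs to; namespaces, together with cgroups, are the kernel features container runtimes build on. The sketch below is a minimal illustration that assumes a Linux host with `/proc` mounted; the `namespace_ids` helper is a name made up for this example, and on other systems it simply returns an empty dictionary.

```python
import os


def namespace_ids(pid="self"):
    """Return {namespace_name: kernel_identifier} for a process.

    Reads the symlinks under /proc/<pid>/ns (Linux only). Returns an
    empty dict when the directory is missing or unreadable.
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return {}
    for name in names:
        try:
            # Each entry is a symlink whose target names the namespace type and ID.
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue
    return result


for name, ident in namespace_ids().items():
    print(name, ident)
```

Two processes that report the same identifiers share those namespaces; a process inside a container would typically show different identifiers from one running directly on the host.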
CONTINUOUS INTEGRATION
Rebuild Applications on any Cloud and/or Virtualized Infrastructure
• Code: Application is stored in a repository (Subversion, Git)
• Build: Code is built (Jenkins)
• Test: Unit tests are automated (Jenkins)
• Deploy: Code is deployed to the server in various ways
[Diagram: Code, Build, Test, Deploy]
Thoughtworks Go: Open Source Continuous Delivery System
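The Code, Build, Test, Deploy flow above can be sketched as an ordered list of stages where a failure stops everything downstream. This is a toy model only: `run_pipeline` and the lambda stages are hypothetical stand-ins, not the API of Jenkins or Go.

```python
def run_pipeline(stages):
    """Run (name, callable) stages in order; stop at the first failure.

    Returns (names_of_completed_stages, name_of_failed_stage_or_None).
    """
    completed = []
    for name, step in stages:
        if not step():
            return completed, name
        completed.append(name)
    return completed, None


# Hypothetical stages; a real pipeline would check out code, invoke a build
# tool, run the test suite, and push an artifact to a server.
stages = [
    ("code", lambda: True),
    ("build", lambda: True),
    ("test", lambda: False),  # simulate a failing unit test
    ("deploy", lambda: True),
]
completed, failed = run_pipeline(stages)
print("completed:", completed, "failed at:", failed)
```

Because the test stage fails here, the deploy stage never runs, which is the main guarantee a CI server is meant to provide.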
DOCKER CONTAINER PACKAGING
Open source LXC Packaging Engine
Docker is an open-source project to easily create lightweight, portable, self-sufficient containers from any application. The same container that a developer builds and tests on a laptop can run at scale, in production, on VMs, bare metal, public clouds and more.
To learn more please visit: www.docker.io
WHAT IS DOCKER
System for Managing and Deploying LXC Containers
• Complement to LXC, not a replacement
• Manages daemonized processes on Linux using LXC or libcontainer
• Creates the ability to re-use and manage similar applications
• Content agnostic
• Hardware agnostic
• Easy to automate
• Integrated with other tools: Chef, OpenShift, Puppet, VMware, etc.
DOCKER’S GROWING ECOSYSTEM
KUBERNETES
Container Cluster Management: Scheduler
Kubernetes builds on top of Docker to construct a clustered container scheduling service. Kubernetes enables users to ask a cluster to run a set of containers. The system will automatically pick worker nodes to run those containers on, which we think of more as "scheduling" than "orchestration".
Greek for Shipmaster
To learn more please visit: https://github.com/GoogleCloudPlatform/kubernetes
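The idea of "ask the cluster to run a set of containers and let the system pick worker nodes" can be sketched with a toy scheduler. This is not Kubernetes' actual algorithm; the `schedule` function, the single CPU dimension, and the "most free capacity wins" rule are assumptions chosen to keep the example short.

```python
def schedule(containers, nodes):
    """Place each (name, cpu) container on the node with the most free CPU.

    `nodes` maps node name -> free CPU units. Containers that fit nowhere
    are mapped to None. Ties go to the alphabetically first node.
    """
    free = dict(nodes)  # copy so the caller's capacities are not modified
    placement = {}
    for name, cpu in containers:
        candidates = [n for n, avail in free.items() if avail >= cpu]
        if not candidates:
            placement[name] = None
            continue
        # max() returns the first maximal item, so sorting gives a stable tie-break.
        chosen = max(sorted(candidates), key=free.get)
        free[chosen] -= cpu
        placement[name] = chosen
    return placement


# Hypothetical cluster and workload
nodes = {"node-a": 4, "node-b": 2}
containers = [("web", 2), ("db", 2), ("cache", 2), ("batch", 4)]
placement = schedule(containers, nodes)
print(placement)
```

The user only states what should run; where it runs falls out of the free capacity on each node, which is the distinction the slide draws between scheduling and orchestration.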
DOCKER RELATED PROJECTS
• Fig - Fast, isolated development environments
• Flynn - Next-generation application platform
• Panamax - Drag-and-Drop Docker Containerization
• Project Atomic - JEOS designed to run Docker containers
• SocketPlane - Docker Networking (coming soon)
• Weave - Docker Networking
• 13,000+ Docker-related repos on GitHub
PUBLIC CLOUD
[Slide shows three company logos, not recoverable from the text, with market caps of $141 billion, $363 billion, and $356 billion.]
MINIMUM VIABLE CLOUD
Infrastructure-as-a-Service | IaaS | Compute Orchestration
Project | Year Started | License | Virtualization Technologies
Apache CloudStack | 2008 | Apache | Bare metal, XenServer, KVM, LXC, VMware, Hyper-V
Eucalyptus | 2006 | GPL | Xen, KVM, VMware (commercial version)
OpenNebula | 2005 | Apache | Xen, KVM, VMware
OpenStack | 2010 (developed by NASA, previously by Anso Labs) | Apache | VMware ESX and ESXi, Xen, XenServer, KVM, LXC, QEMU, and VirtualBox
OPENSTACK
The Boy Band of the Open Source Cloud
OPENSTACK SHARED SERVICES
Span Compute, Storage and Networking
• Identity Service
• Image Service
• Telemetry Service
• Orchestration Service
EVEN MORE OPENSTACK PROJECTS
Span Compute, Storage and Networking
• Cinder: Block Storage Service
• Ceilometer: Metering/Monitoring
• Heat: Orchestration
• Trove: Database Service
• Ironic: Bare Metal
• Marconi: Queue Service
OPENSTACK SOLUTION PROVIDERS
If you can’t do it yourself
“OpenStack is not a product. If you are building a large infrastructure, it’s more like a tool kit. It gives you a lot of technologies that do take a lot of effort to integrate.”
Chris Kemp, OpenStack Board Member and Co-Founder
CEO of Piston Computing
CLOUD APIS
Everything (should) have an API in the Cloud
• Deltacloud (Ruby)
• Dasein (Java)
• jclouds (Java)
• Libcloud (Python)
• Fog (Ruby)
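Libraries like the ones listed above put one programming interface in front of many cloud providers. The sketch below shows only that general pattern; `ComputeDriver`, `InMemoryDriver`, and `provision` are hypothetical names invented for this example and are not the API of Libcloud, jclouds, or any other listed library.

```python
from abc import ABC, abstractmethod


class ComputeDriver(ABC):
    """Provider-neutral interface that calling code programs against."""

    @abstractmethod
    def create_node(self, name):
        """Start a machine and return a description of it."""

    @abstractmethod
    def list_nodes(self):
        """Return descriptions of all machines known to this driver."""


class InMemoryDriver(ComputeDriver):
    """Stand-in for a real provider driver; keeps nodes in a list."""

    def __init__(self, provider):
        self.provider = provider
        self._nodes = []

    def create_node(self, name):
        node = {"name": name, "provider": self.provider, "state": "running"}
        self._nodes.append(node)
        return node

    def list_nodes(self):
        return list(self._nodes)


def provision(driver, names):
    """Works with any driver because it only uses the shared interface."""
    return [driver.create_node(name)["name"] for name in names]


cloud_a = InMemoryDriver("cloud-a")
cloud_b = InMemoryDriver("cloud-b")
created = provision(cloud_a, ["web1", "web2"]) + provision(cloud_b, ["db1"])
print(created, len(cloud_a.list_nodes()), len(cloud_b.list_nodes()))
```

Switching providers then means swapping the driver object, while `provision` and the rest of the calling code stay unchanged.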
CLOUD STORAGE
Virtualized, Distributed, usually on Commodity Hardware
Project | Description
Ceph | Distributed file storage system developed by DreamHost -> Inktank -> Red Hat (block, object, file)
GlusterFS | Scale-out NAS system aggregating storage over Ethernet or InfiniBand (file)
OpenStack Storage | Long-term object storage system (object)
Riak CS | Riak CS is open source software designed to provide simple, available, distributed cloud storage at any scale. Riak CS is S3-API compatible and supports per-tenant reporting for billing and metering use cases. (object)
Sheepdog | Distributed storage for KVM hypervisors, distributed iSCSI
CLOUD AUTOMATION TOOLS
One to many tools for managing large numbers of devices
Project | Description
Ansible | Ansible's SSH-key based access allows contributors to the Fedora Project to assist in automating infrastructure while having access limited appropriately. (Created by the original author of Func.)
Capistrano | Utility and framework for executing commands in parallel on multiple remote machines, via SSH. It uses a simple DSL that allows you to define tasks, which may be applied to machines in certain roles.
RunDeck | Rundeck is an open-source process automation and command orchestration tool with a web console.
Func | Func provides a two-way authenticated system for generically executing tasks, with integrations for Puppet and Cobbler.
MCollective | The Marionette Collective, AKA MCollective, is a framework to build server orchestration or parallel job execution systems.
Salt | Execute arbitrary shell commands or choose from dozens of pre-built modules of common (or complex) commands.
Scalr | Provides scaling across multiple cloud computing platforms; integrates with Chef.
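What these tools have in common is the one-to-many pattern: send the same command to many machines in parallel and collect the results per host. The sketch below illustrates that pattern only; `run_on_hosts` and `fake_runner` are hypothetical, and the runner fakes the remote call instead of opening an SSH connection the way the real tools do.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_hosts(hosts, command, runner, max_workers=8):
    """Call runner(host, command) for every host in parallel.

    Returns {host: result}. pool.map yields results in input order,
    so zipping them back with `hosts` is safe.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda host: runner(host, command), hosts)
        return dict(zip(hosts, results))


def fake_runner(host, command):
    # Stand-in for a real transport such as SSH; nothing leaves this process.
    return f"{host}: ran '{command}'"


results = run_on_hosts(["web1", "web2", "db1"], "uptime", fake_runner)
for host, output in results.items():
    print(output)
```

Swapping `fake_runner` for a function that actually connects to the host is where the listed tools differ (SSH for Capistrano and Ansible, a message bus for Salt and MCollective).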
PLATFORM-AS-A-SERVICE
Abstracted Cloud-Scale Run-Time Environments
Project | Sponsors | Languages/Frameworks
CloudFoundry | VMware -> Pivotal -> CloudFoundry Foundation | Spring for Java, Ruby for Rails and Sinatra, node.js, Grails, Scala on Lift and more via partners (e.g. Python, PHP)
Cloudify | Gigaspaces | Groovy for deployment recipes
OpenShift Origin | Red Hat | Java, Ruby, PHP, Perl and Python
Apache Stratos | WSO2 -> Apache Stratos | PHP, Tomcat, MySQL “cartridges”
APACHE MESOS
One to many tools for managing large numbers of devices
Apache Mesos is a cluster manager that simplifies the complexity of running applications on a shared pool of servers. Largely supported by Twitter; used by LinkedIn and Airbnb too.
Features
• Fault-tolerant replicated master using ZooKeeper
• Scalability to 10,000s of nodes
• Isolation between tasks with Linux Containers
• Multi-resource scheduling (memory and CPU aware)
• Java, Python and C++ APIs for developing new parallel applications
• Web UI for viewing cluster state
To learn more please visit: http://mesos.apache.org/
SOFTWARE DEFINED NETWORKING (SDN)
Virtualization meets the network
Decoupling of the control and data planes of the network to improve efficiency. Communication from an SDN controller via a protocol to network devices, both physical and virtual.
• Automation: Abstractions allow for programmable networks.
• Dynamic Networks: Network can be changed quickly via a controller.
• Security: Network offerings can match virtualization offerings for finer grained security in a highly volatile compute landscape.
• Heterogeneous Management: Single control point for various devices.
SDN OVERVIEW
[Diagram: three layers. Application Layer: Business Applications. Control Layer: SDN Control Software and Network Services, reached by applications through APIs. Infrastructure Layer: Network Devices, reached through the Control Data Plane Interface (e.g. OpenFlow).]
BENEFITS OF SDN
Network Virtualization is the final frontier of Software Defined Datacenter
• Dynamically update networks
• Automate network functionality
• “Program” security into the network
• Centrally apply policies to network and services
• Optimize networks
OPENFLOW
Virtualization meets the network
OpenFlow enables networks to evolve, by giving a remote controller the power to modify the behavior of network devices, through a well-defined "forwarding instruction set". The growing OpenFlow ecosystem now includes routers, switches, virtual switches, and access points from a range of vendors.
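The split between a remote controller and a forwarding device can be illustrated with a toy flow table: the controller installs match/action rules, and the switch only looks packets up against them. This is a simplified model rather than the OpenFlow wire protocol; the `Switch` class, the field names (`dst_ip`, `tcp_port`), and the action strings are assumptions made for the example.

```python
class Switch:
    """Data plane: forwards packets by looking them up in a flow table."""

    def __init__(self):
        self.flow_table = []  # entries are (priority, match, action)

    def install_flow(self, priority, match, action):
        """Called by the controller (control plane) to program the switch."""
        self.flow_table.append((priority, match, action))
        # Highest priority first, so the most specific rule can win.
        self.flow_table.sort(key=lambda entry: entry[0], reverse=True)

    def handle(self, packet):
        """Return the action of the first entry whose fields all match."""
        for _, match, action in self.flow_table:
            if all(packet.get(field) == value for field, value in match.items()):
                return action
        # No matching rule: in this toy model, punt to the controller.
        return "send_to_controller"


# The "controller" side: push two rules to the switch.
switch = Switch()
switch.install_flow(10, {"dst_ip": "10.0.0.2"}, "output:2")
switch.install_flow(100, {"dst_ip": "10.0.0.2", "tcp_port": 22}, "drop")

ssh_action = switch.handle({"dst_ip": "10.0.0.2", "tcp_port": 22})
web_action = switch.handle({"dst_ip": "10.0.0.2", "tcp_port": 80})
unknown_action = switch.handle({"dst_ip": "10.0.0.9"})
print(ssh_action, web_action, unknown_action)
```

Changing network behavior then means changing the rules the controller installs, without touching the forwarding logic on the device.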
OPEN SOURCE SDN
Software Defined Network Controllers and more
Project | Description
Floodlight | The Floodlight Open SDN Controller is an enterprise-class, Apache-licensed, Java-based OpenFlow Controller. It is supported by a community of developers including a number of engineers from Big Switch Networks. See http://www.projectfloodlight.org/floodlight/
Indigo | Indigo is an open source project aimed at enabling support for OpenFlow on physical and hypervisor switches. Big Switch has helped numerous companies OpenFlow-enable their equipment and provides firmware for a number of popular switches. Indigo is the basis of Switch Light by Big Switch Networks. See http://www.projectfloodlight.org/indigo/
Lincx | LINCX is a pure OpenFlow software switch written in Erlang. It runs within a separate domain under the Xen hypervisor using LING (erlangonxen.org).
Nox | NOX is the original OpenFlow controller, and facilitates development of fast C++ controllers on Linux.
Open Daylight | Linux Foundation Collaborative Project based on Cisco One Controller, with plugins from numerous vendors in development, e.g. IBM DOVE.
Open vSwitch | Open vSwitch is an open source (ASL 2.0), multilayer virtual switch designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag).
OPEN VSWITCH
Open vSwitch is a production quality, multilayer virtual switch licensed under the open source Apache 2.0 license. It is designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag).
To learn more please visit our website: http://openvswitch.org/
OPEN SOURCE CLOUD STACK
• Platform-as-a-Service: CloudFoundry, OpenShift, Gigaspaces
• Docker, Mesos, Kubernetes
• Infrastructure-as-a-Service | IaaS | Orchestration: OpenStack, Apache CloudStack, Eucalyptus
• Compute: Containers, KVM, Xen
• Storage: Ceph, Gluster
• Networking: OpenDaylight, Contrail
DevOps Toolchain
• Orchestration: Ansible/SaltStack/Scalr*
• Configuration Management: CFengine/Chef/Puppet
• Monitoring: logstash, graphite
NETFLIX AWS TOOLBAG
Tools developed by a super Amazon Web Services Power User
ASGARD, ASTYANAX, EDDA, EUREKA, PRIAM, SIMIAN ARMY
http://netflix.github.com
CONTACT ME
Happy to Chat about Open Source, Cloud or Pittsburgh Sports
Professional: [email protected]
Personal: [email protected]
Phone: 919.228.8049
Professional: http://open.citrix.com
Personal: http://www.socializedsoftware.com
Twitter: @mrhinkle
APPENDIX A
Additional Links to related stuff
ADDITIONAL LINKS
• DevOps Toolchains Group
• Software Defined Networking: The New Norm for Networks (Whitepaper)
• DevOps Wikipedia Page
• NoSQL-Database.org - Ultimate Guide to the Non-Relational Universe
• Open Cloud Initiative
• NIST Cloud Computing Platform
• Open Virtualization Format Specs
• Clouderati Twitter Account
• Planet DevOps
• Nicira Whitepaper - It’s Time to Virtualize the Network
• Why Open vSwitch FAQ
• Stanford Seminar - Software-Defined Networking at the Crossroads
ADDITIONAL LINKS (CONT’D)
• SDN, NFV, and open source: The Operator’s View
• Puppet Labs: Build a Toolbox for Continuous Delivery
APPENDIX B
Stuff I’d have liked to talk about but didn’t have time for
60 SECOND CLOUD DEFINITION
Just because Software Marketing Guys Think it’s the Internet
5 CHARACTERISTICS OF CLOUD
1. On-Demand Self-Service
2. Broad Network Access
3. Resource Pooling
4. Rapid Elasticity
5. Measured Service
User Cloud a.k.a. SOFTWARE-AS-A-SERVICE
Developer Cloud a.k.a. PLATFORM-AS-A-SERVICE
Systems Cloud a.k.a. INFRASTRUCTURE-AS-A-SERVICE
SCALE-UP, SCALE-OUT
Elasticity and the cloud
Vertical Scaling (Scale-Up): Allocate additional resources to VMs; requires a reboot; no need for distributed app logic; single point of OS failure.
Horizontal Scaling (Scale-Out): Application needs logic to work in a distributed fashion (e.g. HAProxy and Apache Hadoop).
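Scale-out depends on something spreading requests across identical servers, which is the role a load balancer such as HAProxy plays. The sketch below shows the simplest such policy, round robin; the `RoundRobinBalancer` class and the backend names are hypothetical and it does not reflect how HAProxy itself is configured or implemented.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next backend in rotation."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._rotation = cycle(self.backends)

    def route(self, request):
        # The request content is ignored; only arrival order matters here.
        return next(self._rotation)

    def add_backend(self, backend):
        """Scale out: add a server and restart the rotation to include it."""
        self.backends.append(backend)
        self._rotation = cycle(self.backends)


balancer = RoundRobinBalancer(["app1", "app2"])
before = [balancer.route(i) for i in range(4)]

balancer.add_backend("app3")  # capacity added without rebooting anything
after = [balancer.route(i) for i in range(3)]
print(before, after)
```

Adding capacity is a matter of registering another backend, whereas scaling up the single VM would mean resizing and rebooting it.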
SOURCING CLOUD APPLIANCES
Packaging Engines for VMs
Tool/Project | What you can do with them
Bitnami | Bitnami provides free, ready to run environments for your favorite open source web applications and frameworks, including Drupal, Joomla!, WordPress, PHP, Rails, Django and many more.
BoxGrinder | BoxGrinder is a set of projects that help you grind out appliances for multiple virtualization and cloud providers.
Oz | Command-line tool that can create images for common Linux distributions to run on KVM.
SUSE Studio | SUSE Studio supports building and deploying directly to cloud services such as Amazon EC2.
PACKER MULTIPLATFORM VM CREATION
Open source Automation for VMs
Packer is easy to use and automates the creation of any type of machine image. It embraces modern configuration management by encouraging you to use automated scripts to install and configure the software within your Packer-made images.
To learn more please visit: www.packer.io
CONFIGURATION MANAGEMENT TOOLS
Tools with features for configuring cloud infrastructure
Project | Year Started | Language | License | Client/Server
CFengine | 1993 | C | Apache | Yes
Chef | 2009 | Ruby | Apache | Chef Solo: No; Chef Server: Yes
Puppet | 2004 | Ruby | GPL | Yes & standalone
Salt | 2011 | Python | Apache | Yes
Hitchhiker’s Guide to the Open Cloud by @mrhinkle
CLOUD MONITORING TOOLS
Tools with features for monitoring cloud infrastructure
Project | Type of Monitoring | Collection Methods
Cacti / RRDTool | Performance | SNMP, syslog
Graphite | Performance | Agent
Nagios | Availability | SNMP, TCP, ICMP, IPMI, syslog
Sensu | Availability | Agent
Zabbix | Availability/Performance and more | SNMP, TCP/ICMP, IPMI, Synthetic Transactions
Zenoss | Availability, Performance, Event Management | SNMP, ICMP, SSH, syslog, WMI
CLOUD PROVISIONING TOOLS
Packaging Engines for VMs
Project | Installation Targets
Apache Provisionr (incubating) | Can provision 10s to 1000s of machines on various clouds.
Cobbler | Distributed virtual infrastructure using koan (kickstart over a network to PXE boot VMs) for Red Hat, OpenSUSE, Fedora, Debian, Ubuntu VMs
Crowbar | Bare metal provisioning
Juju | Public clouds (Amazon Web Services, HP Cloud), private OpenStack clouds, bare metal via MAAS
Salt Cloud | Tool to provision “salted” VMs that can then be updated by a central server via ZeroMQ
NOSQL DATABASES
Horizontally scalable unstructured data retrieval
Name | Type | Description
Apache Cassandra | Wide Column Store/Families | API: many; Query Method: MapReduce; Replication: ; Written in: Java; Concurrency: eventually consistent; Misc: like "Big-Table on Amazon Dynamo alike", initiated by Facebook
CouchDB | Document Store | API: Memcached API+protocol (binary and ASCII), most languages; Protocol: Memcached REST interface for cluster conf + management; Written in: C/C++ + Erlang (clustering); Replication: Peer to Peer, fully consistent; Misc: Transparent topology changes during operation, provides memcached-compatible caching buckets
HBase | Wide Column Store/Families | API: Java / any writer; Protocol: any write call; Query Method: MapReduce Java / any exec; Replication: HDFS Replication; Written in: Java
Hypertable | Wide Column Store/Families | API: Thrift (Java, PHP, Perl, Python, Ruby, etc.); Protocol: Thrift; Query Method: HQL, native Thrift API; Replication: HDFS Replication; Concurrency: MVCC; Consistency Model: Fully consistent; Misc: High performance C++ implementation of Google's Bigtable
MongoDB | Document Store | API: BSON; Protocol: C; Query Method: dynamic object-based language & MapReduce; Replication: Master Slave & Auto-Sharding; Written in: C++; Concurrency
Redis | Key Value / Tuple Store | API: Tons of languages; Written in: C; Concurrency: in memory, saves asynchronously to disk after a defined time. Append-only mode available. Different kinds of fsync policies; Replication: Master / Slave; Misc: also lists, sets, sorted sets, hashes, queues
Riak | Key Value / Tuple Store | API: JSON; Protocol: REST; Query Method: MapReduce term matching; Scaling: Multiple Masters; Written in: Erlang; Concurrency: eventually consistent (stronger than MVCC via Vector Clocks)
MAP REDUCE
Algorithm for Parallelized Data Set Processing
[Diagram: Problem Data goes to a Master Node, which maps the work out to Worker Nodes 1, 2 and 3 (Map) and combines their results (Reduce) into Solution Data.]
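The map and reduce steps can be sketched on a single machine with the classic word-count example. The three text chunks stand in for the data handed to the three workers, and the function names (`map_phase`, `shuffle`, `reduce_phase`) are labels chosen for this illustration; a real framework such as Hadoop runs these phases across many nodes.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the values emitted by all mappers under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# Each chunk plays the role of the data given to one worker node.
chunks = ["open source cloud", "cloud computing", "open cloud"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Because each chunk is mapped independently and each key is reduced independently, both phases can be spread over as many workers as there are chunks or keys.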
APACHE HADOOP
Apache Project for Parallelized Data Set Processing
Overview
• Handles large amounts of data
• Stores data in native format
• Delivers linear scalability at low cost
• Resilient in case of infrastructure failures
• Transparent application scalability
APACHE HADOOP ECOSYSTEM
Hadoop
• Hadoop Common
• HDFS: Distributes & replicates data across machines
• MapReduce: Distributes & monitors tasks
MapReduce
• Parallel programming
• Handles large data blocks
Hive: Data warehouse that provides SQL interface. Ad hoc projection of data structure to unstructured data.
Non-Relational DB
• HBase: Column-oriented, schema-less distributed DB modeled after Google’s BigTable. Random real-time read/write.
Scripting
• Pig: Platform for manipulating and analyzing large data sets. Scripting language for analysts.
Machine Learning
• Mahout: Machine learning libraries for recommendations, clustering, classifications and item sets.
Chukwa
ZooKeeper