e-com notes (chapters 6-9).rtf

Upload: rajanityagi23

Post on 02-Jun-2018


  • 8/10/2019 e-Com Notes (Chapters 6-9).rtf


    /1/2010

    E-business management

    By Rajani Tyagi


    Contents

    E-Business Introduction

    E-Business Strategies

    Integration of Applications

    E-Commerce Infrastructure


    E-Business Introduction

    E-Business vs. E-commerce

    While some use e-commerce and e-business interchangeably, they are distinct concepts. Electronic business, commonly referred to as "eBusiness" or "e-business", may be defined as the application of information and communication technologies (ICT) in support of all the activities of business. Commerce constitutes the exchange of products and services between businesses, groups and individuals and can be seen as one of the essential activities of any business. Electronic commerce focuses on the use of ICT to enable the external activities and relationships of the business with individuals, groups and other businesses.

    E-commerce is a particular form of e-business. Electronic business methods enable companies to link their internal and external data processing systems more efficiently and flexibly, to work more closely with suppliers and partners, and to better satisfy the needs and expectations of their customers. Compared to e-commerce, e-business is a more generic term because it refers not only to information exchanges related to buying and selling but also to servicing customers and collaborating with business partners, distributors and suppliers.

    E-business encompasses sophisticated business-to-business interactions and collaboration activities at the level of enterprise applications and business processes, enabling business partners to share in-depth business intelligence, which leads, in turn, to the management and optimization of inter-enterprise processes such as supply chain management. More specifically, e-business enables companies to link their internal and external processes more efficiently and flexibly, work more closely with suppliers and better satisfy the needs and expectations of their customers.

    In practice, e-business is more than just e-commerce. While e-business refers to a more strategic focus with an emphasis on the functions that occur when using electronic capabilities, e-commerce is a subset of an overall e-business strategy. E-commerce seeks to add revenue streams using the World Wide Web or the Internet to build and enhance relationships with clients and partners and to improve efficiency using the Empty Vessel strategy. Often, e-commerce involves the application of knowledge management systems.

    E-business involves business processes spanning the entire value chain: electronic purchasing and supply chain management, processing orders electronically, handling customer service, and cooperating with business partners. Special technical standards for e-business facilitate the exchange of data between companies. E-business software solutions allow the integration of intra-firm and inter-firm business processes. E-business can be conducted using the Web, the Internet, intranets, extranets, or some combination of these.

    Basically, electronic commerce (EC) is the process of buying, transferring, or exchanging products, services, and/or information via computer networks, including the Internet. EC can also be viewed from many perspectives, including business process, service, learning, collaboration and community. EC is often confused with e-business.

    In e-commerce, information and communications technology (ICT) is used in inter-business or inter-organizational transactions (transactions between and among firms/organizations) and in business-to-consumer transactions (transactions between firms/organizations and individuals).

    In e-business, on the other hand, ICT is used to enhance one's business. It includes any process that a business organization (either a for-profit, governmental or nonprofit entity) conducts over a computer-mediated network.


    A more comprehensive definition of e-business is: "The transformation of an organization's processes to deliver additional customer value through the application of technologies, philosophies and computing paradigm of the new economy."


    Three primary processes are enhanced in e-business:

    • Production processes, which include procurement, ordering and replenishment of stocks; processing of payments; electronic links with suppliers; and production control processes, among others;

    • Customer-focused processes, which include promotional and marketing efforts, selling over the Internet, processing of customers' purchase orders and payments, and customer support, among others;

    • Internal management processes, which include employee services, training, internal information-sharing, videoconferencing, and recruiting. Electronic applications enhance information flow between production and sales forces to improve sales force productivity. Workgroup communications and electronic publishing of internal business information are likewise made more efficient.

    E-business goes far beyond e-commerce or buying and selling over the Internet, and deep into the processes and cultures of an enterprise. It is the powerful business environment that is created when you connect critical business systems directly to customers, employees, vendors, and business partners, using intranets, extranets, e-commerce technologies, collaborative applications, and the Web.

    E-business is a more strategic focus with an emphasis on the functions that occur when using electronic capabilities, while e-commerce is a subset of an overall e-business strategy. E-commerce seeks to add revenue streams using the World Wide Web or the Internet to build and enhance relationships with clients and partners and to improve efficiency. Electronic business methods, in turn, enable companies to link their internal and external data processing systems more efficiently and flexibly, to work more closely with suppliers and partners, and to better satisfy the needs and expectations of their customers.

    E-business is at the enterprise application level and encompasses sophisticated B2B interaction and collaboration activities. Enterprise application systems such as ERP, CRM and SCM form an integral part of e-business strategy and focus.

    Critical Factors with respect of e-Business

    E-business supports business processes along the entire value chain: electronic purchasing (e-procurement), SCM (supply chain management), processing orders electronically, customer service and cooperation with business partners.

    One of the objectives of e-business is to provide seamless connectivity and integration between business processes and applications external to an enterprise and the enterprise's back-office applications such as billing, order processing, accounting, inventory and receivables, and services focused on total supply chain management and partnership, including product development, fulfillment, and distribution. In this respect, e-business is much more than e-commerce.

    To succeed in e-business it is crucial to combine technological developments with a corporate strategy that redefines a company's role in the digital economy while taking into account its various stakeholders. It is imperative to understand the issues, evaluate the options, and develop technology orientation plans. An e-business strategy helps organizations identify their e-business concerns, assess their information needs, analyze to what degree existing systems serve these objectives, pinpoint specific improvements, determine the development stages of e-business solutions and attain concrete and measurable results. Thus, it is clear that e-business solutions are not only about technology.


    A classic example is SAP systems integration for any organization. This itself is taken up as a project and executed with great attention to detail. A minute logical error in interpretation of the firm's objectives could result in the entire system being reworked from scratch.


    E-business allows for redefinition of value, competitiveness and the very nature of transactions, and it affects all areas of an organization. It is crucial to combine technology and business strategy while taking into account various stakeholders.

    An e-business strategy helps to:

    • Identify e-business concerns

    • Assess information needs

    • Analyze existing systems

    • Pinpoint improvements required in existing systems

    • Determine the stages of development of solutions

    • Attain concrete and measurable results.

    Characteristics of e-Business

    To emphasize, e-business is not simply buying and selling but encompasses the exchange of many kinds of information, including online commercial transactions. E-business is about integrating external company processes with an organization's internal business processes; as such, a variety of core business processes could exploit an e-business infrastructure.

    These include, among others:

    • Collaborative Product Development

    • Collaborative Planning, Forecasting and Replenishment

    • Procurement and Order Management

    • Operations and Logistics

    Collaborative Product Development

    This is one of the fastest growing technologies in engineering, with some form of solution being implemented in a range of industries such as automotive, aerospace and agricultural machinery. It contributes towards making products in a short time span while maintaining quality and reducing cost.

    It also aids in maximizing time-to-market benefits while maintaining control over product development information. By integrating the design and testing cycles of its products with those of suppliers, a firm can shorten the complete cycle of its products. This clearly reduces the total cost of the product cycle and, even more importantly, it reduces the time that is needed to bring products to the marketplace. Collaborative product development solutions offer ERP integration and SCM.

    Collaborative Planning, Forecasting and Replenishment

    This is a process in which manufacturers, distributors and retailers work together to plan, forecast and replenish products. In e-business relationships collaboration takes the form of sharing information that impacts inventory levels and merchandise flow.

    Collaboration points: sales forecasts, inventory requirements, manufacturing and logistic lead times, seasonal set schedules, new/remodel storage plans, promotional plans etc.

    Goal: To get the partners to work together to lower supply cycle times, improve customer service, lower inventory costs, improve inventory levels and achieve better control of planning activities.
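    The way shared forecasts and lead times feed a replenishment decision can be sketched in a few lines. This is only an illustration, not a method prescribed by these notes: the SharedPlan fields, the safety-stock buffer and the simple "cover demand over the lead time" rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SharedPlan:
    """Data a retailer and supplier might share (all fields are illustrative)."""
    daily_forecast: List[float]  # forecast unit sales per day, shared by the retailer
    on_hand: float               # units currently in inventory
    lead_time_days: int          # supplier's manufacturing plus logistics lead time
    safety_stock: float          # assumed buffer against forecast error


def replenishment_quantity(plan: SharedPlan) -> float:
    """Units to order so stock covers forecast demand over the lead time."""
    demand_during_lead_time = sum(plan.daily_forecast[:plan.lead_time_days])
    shortfall = demand_during_lead_time + plan.safety_stock - plan.on_hand
    # Never order a negative quantity when stock already covers demand.
    return max(0.0, shortfall)
```

    Because both partners compute from the same shared figures, a change in the retailer's forecast or the supplier's lead time shows up immediately in the quantity to be replenished.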


    Procurement and Order Management

    Electronic procurement, or e-procurement, can achieve significant savings and other benefits that impact the customer. To support procurement and order management processes, companies use an integrated electronic ordering process and other online resources to increase efficiency in purchasing operations.


    Benefits: cost savings, better customer service by controlling the supply base, negotiating effective buying preferences, and streamlining the overall procurement process.

    Operations & Logistics

    Logistics is that part of the supply chain process that plans, implements and controls the efficient, effective flow and storage of goods, services and related information from the point of origin to the point of consumption in order to meet customer requirements. To make this happen, transportation, distribution, warehousing, purchasing and order management functions must work together. Logistics in the e-business era is all about collaboration: the sharing of critical and timely data on the movement of goods as they flow from raw material all the way to the end user.

    Operations and logistics processes are based on open communication between networks of trading partners, where integrated processes and technology are essential for high-performance logistics operations. These solutions help manage the logistics process between buyers and suppliers, while eliminating costly discrepancies between purchase order, sales order and shipping information. By eradicating these variances and inconsistencies, improvements in the supply chain may result from the elimination of mixed shipments and shipment discrepancies, and the reduction of inventory carrying costs for the customer. At the same time this increases customer satisfaction through improved delivery reliability and improved efficiencies in receiving operations.
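    The discrepancy check between purchase order, sales order and shipping information can be pictured as a simple three-way comparison. The sketch below is a hypothetical illustration: it assumes each document has already been reduced to a mapping from item code to quantity, which real systems would derive from their own record formats.

```python
def find_discrepancies(purchase_order, sales_order, shipment):
    """Compare quantities per item across the three documents.

    Each argument is a dict mapping an item code to a quantity.
    Returns a dict of item -> (po_qty, so_qty, shipped_qty) for every
    item whose three quantities do not all agree.
    """
    items = set(purchase_order) | set(sales_order) | set(shipment)
    mismatches = {}
    for item in sorted(items):
        # An item missing from a document counts as quantity 0 there.
        quantities = (
            purchase_order.get(item, 0),
            sales_order.get(item, 0),
            shipment.get(item, 0),
        )
        if len(set(quantities)) > 1:
            mismatches[item] = quantities
    return mismatches
```

    An empty result means the three documents agree; anything else is a variance that buyer and supplier would need to resolve before the goods are received.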

    Furthermore, there are critical elements to e-business models as well. They are as follows:

    • A shared digital business infrastructure, including digital production and distribution technologies (broadband/wireless networks, content creation technologies and information management systems), which will allow business participants to create and utilize network economies of scale and scope.

    • A sophisticated model for operations, including integrated value chains, both supply chains and buy chains.

    • An e-business management model, consisting of business teams and/or partnerships.

    • Policy, regulatory and social systems, i.e., business policies consistent with e-commerce laws, teleworking/virtual work, distance learning, incentive schemes, among others.

    • Ease of automated processing: A payer can now cheaply and easily automate the generation and processing of multiple payments with minimal effort. Previously, the dependency upon banks to handle most payments and the lack of a cheap, ubiquitous communications technology made automation of payment processes expensive and difficult to establish.

    • Immediacy of result: Payment immediacy occurs because of automation and the ability of the intermediate systems and providers to process payments in real time. With the more manual, paper-based systems there was always a time delay due to the requirement for human intervention in the process.

    • Openness and accessibility: The availability of cheap computing and communications technology and the appropriate software enables small enterprises and individuals to access or provide a range of payment services that were previously only available to large organizations via dedicated networks or the transactional processing units of banks.

    • Loss of collateral information: The new technology dispenses with, or alters, collateral information accompanying transactions. This information has traditionally been part of the transaction, and has been relied upon by the transacting parties to validate individual payments.

    • Collateral information can be defined as information:

        o Which is not essential to the meaning and intent of a transaction;

        o Which is typically incidental to the nature of the communications channel over which the transaction is conducted, but nevertheless provides useful contextual information for one or more of the parties to the transaction.

    • Collateral information can include many things, ranging from tone of voice in a telephone call to the business cards, letterheads and apparent authority of the person with whom you are dealing.


    • Globalization: Globalization, or the minimization of geographical factors in making payments, has been an obvious aspect of the new payment systems. It affects areas such as the size of the payments marketplace, uncertainty as to legal jurisdiction in the event of disputes, the location and availability of transaction trails, and the ability of a payment scheme to rapidly adapt to regulatory regimes imposed by one country by moving to another.

    • New business models: New business models are being developed to exploit the new payment technologies, in particular to address or take advantage of the disintermediation of customers from traditional payment providers such as banks. In this context, disintermediation is where the technology enables a third party to intervene between the customer and the banking system, effectively transferring the customer's trusted relationship with the bank to the new party.

    Elements of an e-Business solution

    The vision of e-business is that enterprises will have access to a broad range of trading partners to interact and collaborate with, and not only buy and sell more efficiently. Also, it is expected that e-business will contribute towards the agility of business organizations and, with that, to reaching higher levels of customization. In this way, an organization can maximize supply chain efficiency, improve customer service and increase profit margins. Hence the need to make mission-critical processes (inventory, accounting, manufacturing and customer support) able to interact with each other by becoming web-enabled. This is achieved by ERP, CRM and other systems by making use of distributed applications that extract data and launch business processes across many or all of the above processes.

    The key elements of an e-business solution are:

    1. Customer Relationship Management (CRM)

    2. Enterprise Resource Planning (ERP)

    3. Supply Chain Management (SCM)

    4. Knowledge Management

    5. e-Markets


    Customer Relationship Management (CRM)

    CRM systems are "front-office" systems which help the enterprise deal directly with its customers. CRM is defined as the process of creating relationships with customers through reliable service, automated processes, personal information gathering, processing and self-service throughout the enterprise in order to create value for customers.

    There are 3 categories of user applications under CRMs:

    • Customer-facing applications: Applications which enable customers to order products and services.

    • Salesforce-facing applications: Applications that automate some of the sales and salesforce management functions, and support dispatch and logistic functions.

    • Management-facing applications: Applications which gather data from the previous applications, provide management reports and compute Return on Relationships (RoR) as per the company's business model.

    Enterprise Resource Planning (ERP)

    ERPs are often called "back-office" systems. ERP systems are management information systems that integrate and automate many of the business practices associated with the operations or production aspects of a company. ERP software can aid in the control of many business activities such as sales, delivery, production, billing, inventory, shipping, invoicing and accounting.

    A typical ERP system is designed around these primary business procedures:

    Virtualization

    There are three popular approaches to server virtualization:

    1. The virtual machine model,

    2. The paravirtual machine model,

    3. Virtualization at the operating system (OS) layer.

    Virtual Machines

    Virtual machines are based on the host/guest paradigm. Each guest runs on a virtual imitation of the hardware layer. This approach allows the guest operating system to run without modifications. It also allows the administrator to create guests that use different operating systems. The guest has no knowledge of the host's operating system because it is not aware that it's not running on real hardware.

    It does, however, require real computing resources from the host, so it uses a hypervisor to coordinate instructions to the CPU. The hypervisor is called a virtual machine monitor (VMM). It validates all the guest-issued CPU instructions and manages any executed code that requires additional privileges. VMware and Microsoft Virtual Server both use the virtual machine model.
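    The role of the virtual machine monitor, inspecting each guest instruction and stepping in only for privileged ones, can be pictured with a toy model. Everything here is an assumption made for illustration: the instruction names are invented, and a real hypervisor works at the CPU level rather than on strings.

```python
PRIVILEGED = {"HLT", "OUT", "LOAD_PAGE_TABLE"}  # hypothetical instruction names


class ToyHypervisor:
    """Toy virtual machine monitor: inspects each guest instruction."""

    def __init__(self):
        self.log = []

    def execute(self, guest, instruction):
        if instruction in PRIVILEGED:
            # Privileged code is not run directly; the monitor handles it
            # on the guest's behalf.
            self.log.append((guest, instruction, "emulated"))
            return "emulated"
        # Unprivileged instructions are passed straight through.
        self.log.append((guest, instruction, "direct"))
        return "direct"
```

    The guest never sees the difference between the two paths, which is why it can run unmodified and remain unaware that it is not on real hardware.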


    high levels of virtual machine performance.

    This technique does, however, have the advantage that no changes are necessary to either host or guest operating systems and no special CPU hardware virtualization support is required.


    Shared Kernel Virtualization

    Shared kernel virtualization (also known as system level or operating system virtualization) takes advantage of the architectural design of Linux and UNIX based operating systems. In order to understand how shared kernel virtualization works, it helps to first understand the two main components of Linux or UNIX operating systems. At the core of the operating system is the kernel. The kernel, in simple terms, handles all the interactions between the operating system and the physical hardware. The second key component is the root file system, which contains all the libraries, files and utilities necessary for the operating system to function. Under shared kernel virtualization the virtual guest systems each have their own root file system but share the kernel of the host operating system. This structure is illustrated in the following architectural diagram.

    This type of virtualization is made possible by the ability of the kernel to dynamically change the current root file system (a concept known as chroot) to a different root file system without having to reboot the entire system. Essentially, shared kernel virtualization is an extension of this capability.
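    The idea of several guests with separate root file systems on one kernel can be modelled in a few lines. This is a toy sketch under stated assumptions: the class names and directory paths are invented, and the path translation only imitates what a real chroot does inside the kernel (Python exposes the real call as os.chroot, which requires root privileges and is not used here).

```python
import posixpath


class SharedKernel:
    """Stands in for the single host kernel that every guest shares."""

    def __init__(self, version):
        self.version = version


class Guest:
    """A guest with its own root file system but no kernel of its own."""

    def __init__(self, name, root_dir, kernel):
        self.name = name
        self.root_dir = root_dir  # host directory acting as the guest's "/"
        self.kernel = kernel      # shared reference, not a copy

    def host_path(self, guest_path):
        """Translate a path seen inside the guest to the host path behind it."""
        # Normalise against "/" first so ".." cannot climb above the guest root.
        inside = posixpath.normpath(posixpath.join("/", guest_path))
        return posixpath.join(self.root_dir, inside.lstrip("/"))
```

    Two guests created with the same SharedKernel object see different files at the same path, for example /etc/hosts, while both depend on the one kernel version underneath them.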

    Perhaps the biggest single drawback of this form of virtualization is the fact that the guest operating systems must be compatible with the version of the kernel which is being shared. It is not, for example, possible to run Microsoft Windows as a guest on a Linux system using the shared kernel approach. Nor is it possible for a Linux guest system designed for the 2.6 version of the kernel to share a kernel of a different 2.x version.

    Kernel Level Virtualization

    Under kernel level virtualization the host operating system runs on a specially modified kernel which contains extensions designed to manage and control multiple virtual machines, each containing a guest operating system. Unlike shared kernel virtualization, each guest runs its own kernel, although similar restrictions apply in that the guest operating systems must have been compiled for the same hardware as the kernel in which they are running. Examples of kernel level virtualization technologies include User Mode Linux (UML) and Kernel-based Virtual Machine (KVM). The following diagram provides an overview of the kernel level virtualization architecture:


    Cloud Computing

    Cloud computing is Internet-based computing, whereby shared resources, software, and information are provided to computers and other devices on demand, like the electricity grid. Cloud computing is a paradigm shift following the shift from mainframe to client-server in the early 1980s. Details are abstracted from the users, who no longer have need for expertise in, or control over, the technology infrastructure "in the cloud" that supports them.

    Cloud computing describes a new supplement, consumption, and delivery model for IT services based on the Internet, and it typically involves over-the-Internet provision of dynamically scalable and often virtualized resources. It is a byproduct and consequence of the ease of access to remote computing sites provided by the Internet. This frequently takes the form of web-based tools or applications that users can access and use through a web browser as if it were a program installed locally on their own computer.

    NIST provides a somewhat more objective and specific definition. The term "cloud" is used as a metaphor for the Internet, based on the cloud drawing used in the past to represent the telephone network and later to depict the Internet in computer network diagrams as an abstraction of the underlying infrastructure it represents.

    Typical cloud computing providers deliver common business applications online that are accessed from another web service or software like a web browser, while the software and data are stored on servers. A key element of cloud computing is customization and the creation of a user-defined experience.

    Most cloud computing infrastructures consist of services delivered through common centers and built on servers.

    Clouds often appear as single points of access for consumers' computing needs. Commercial offerings are generally expected to meet the quality of service (QoS) requirements of customers, and typically include service level agreements (SLAs). The major cloud service providers include Salesforce, Amazon and Google. Some of the larger IT firms that are actively involved in cloud computing are Microsoft, Hewlett Packard and IBM.

    Cloud computing derives characteristics from, but should not be confused with:

    1. Autonomic computing: "computer systems capable of self-management"

    2. Client-server model: client-server computing refers broadly to any distributed application that distinguishes between service providers (servers) and service requesters (clients).

    3. Grid computing: "a form of distributed computing and parallel computing, whereby a 'super and virtual computer' is composed of a cluster of networked, loosely coupled computers acting in concert to perform very large tasks"

    4. Mainframe computer: powerful computers used mainly by large organizations for critical applications, typically bulk data processing such as census, industry and consumer statistics, enterprise resource planning, and financial transaction processing.

    5. Utility computing: the "packaging of computing resources, such as computation and storage, as a metered service similar to a traditional public utility, such as electricity"

    6. Peer-to-peer: a distributed architecture without the need for central coordination, with participants being at the same time both suppliers and consumers of resources (in contrast to the traditional client-server model)

    Characteristics

    In general, cloud computing customers do not own the physical infrastructure, instead avoiding capital expenditure by renting usage from a third-party provider. They consume resources as a service and pay only for resources that they use. Many cloud computing offerings employ the utility computing model, which is analogous to how traditional utility services (such as electricity) are consumed, whereas others bill on a subscription basis.

    Sharing "perishable and intangible" computing power among multiple tenants can improve utilization rates, as servers are not unnecessarily left idle (which can reduce costs significantly while increasing the speed of application development).

    A side effect of this approach is that overall computer usage rises dramatically, as customers do not have to engineer for peak load limits. In addition, "increased high-speed bandwidth" makes it possible to receive the same response times from centralized infrastructure at other sites. The cloud is becoming increasingly associated with small and medium enterprises (SMEs), as in many cases they cannot justify or afford the large capital expenditure of traditional IT.
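    Why sharing among tenants improves utilization can be shown with a small calculation: tenants rarely peak at the same hour, so the peak of the combined load is usually lower than the sum of the individual peaks. The function below is a sketch with made-up tenant names and load figures, not measured data.

```python
def capacity_needed(hourly_loads_by_tenant):
    """Compare dedicated versus shared capacity for the same workloads.

    hourly_loads_by_tenant maps a tenant name to a list of load figures,
    one per hour, all lists the same length.
    Returns (dedicated, shared): the sum of each tenant's own peak, and
    the peak of the combined load.
    """
    dedicated = sum(max(loads) for loads in hourly_loads_by_tenant.values())
    combined = [sum(hour) for hour in zip(*hourly_loads_by_tenant.values())]
    shared = max(combined)
    return dedicated, shared
```

    With three hypothetical tenants whose loads are [2, 8, 3, 1], [7, 1, 1, 2] and [1, 2, 9, 1], each engineering for its own peak needs 8 + 7 + 9 units of capacity in total, while a shared pool only has to cover the busiest combined hour.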

    SMEs also typically have less existing infrastructure, less bureaucracy, more flexibility, and smaller capital budgets for purchasing in-house technology. Similarly, SMEs in emerging markets are typically unburdened by established legacy infrastructures, thus reducing the complexity of deploying cloud solutions.

    Architecture

    Cloud architecture, the systems architecture of the software systems involved in the delivery of cloud computing, typically involves multiple cloud components communicating with each other over application programming interfaces, usually web services.

    This resembles the Unix philosophy of having multiple programs each doing one thing well and working together over universal interfaces. Complexity is controlled and the resulting systems are more manageable than their monolithic counterparts.

    The two most significant components of cloud computing architecture are known as the front end and the back end. The front end is the part seen by the client, i.e. the computer user. This includes the client's network (or computer) and the applications used to access the cloud via a user interface such as a web browser. The back end of the cloud computing architecture is the "cloud" itself, comprising various computers, servers and data storage devices.


    Key Features

    • Agility improves with users' ability to rapidly and inexpensively re-provision technological infrastructure resources.

    • Cost is claimed to be greatly reduced, and capital expenditure is converted to operational expenditure. This ostensibly lowers barriers to entry, as infrastructure is typically provided by a third party and does not need to be purchased for one-time or infrequent intensive computing tasks. Pricing on a utility computing basis is fine-grained with usage-based options, and fewer IT skills are required for implementation (in-house).

    • Device and location independence enable users to access systems using a web browser regardless of their location or what device they are using (e.g., PC, mobile). As infrastructure is off-site (typically provided by a third party) and accessed via the Internet, users can connect from anywhere.

    • Multi-tenancy enables sharing of resources and costs across a large pool of users, thus allowing for:

        o Centralization of infrastructure in locations with lower costs (such as real estate, electricity, etc.)

        o Peak-load capacity increases (users need not engineer for highest possible load levels)

        o Utilization and efficiency improvements for systems that are often only 10-20% utilized.

    • Reliability is improved if multiple redundant sites are used, which makes well-designed cloud computing suitable for business continuity and disaster recovery. Nonetheless, many major cloud computing services have suffered outages, and IT and business managers can at times do little when they are affected.

    • Scalability via dynamic ("on-demand") provisioning of resources on a fine-grained, self-service basis near real time, without users having to engineer for peak loads. Performance is monitored, and consistent and loosely coupled architectures are constructed using web services as the system interface. One of the most important new methods for overcoming performance bottlenecks for a large class of applications is data parallel programming on a distributed data grid.

    • Security could improve due to centralization of data, increased security-focused resources, etc., but concerns can persist about loss of control over certain sensitive data, and the lack of security for stored kernels. Security is often as good as or better than under traditional systems, in part because providers are able to devote resources to solving security issues that many customers cannot afford. Providers typically log accesses, but accessing the audit logs themselves can be difficult or impossible. Furthermore, the complexity of security is greatly increased when data is distributed over a wider area and/or number of devices.

    • Maintenance of cloud computing applications is easier, since they don't have to be installed on each user's computer. They are easier to support and to improve since the changes reach the clients instantly.

    • Metering means that cloud computing resource usage should be measurable and should be metered per client and application on a daily, weekly, monthly, and yearly basis.

    Cloud computing provides the means through which everything, from computing power to computing infrastructure, applications, and business processes to personal collaboration, can be delivered to you as a service wherever and whenever you need.


    Deployment Models

    Cloud computing is offered in different forms:

    • Public clouds

    o Public cloud or external cloud describes cloud computing in the traditional mainstream sense, whereby resources are dynamically provisioned on a fine-grained, self-service basis over the Internet, via web applications/web services, from an off-site third-party provider who bills on a fine-grained utility computing basis.

    • Community clouds

    o A community cloud may be established where several organizations have similar requirements and seek to share infrastructure so as to realize some of the benefits of cloud computing. With the costs spread over fewer users than a public cloud (but more than a single tenant), this option is more expensive but may offer a higher level of privacy, security and/or policy compliance.

    • Private clouds

    Volunteer Computing

    When people first hear about Hadoop and MapReduce, they often ask, "How is it different from SETI@home?" SETI, the Search for Extra-Terrestrial Intelligence, runs a project called SETI@home in which volunteers donate CPU time from their otherwise idle computers to analyze radio telescope data for signs of intelligent life outside earth. SETI@home is the most well-known of many volunteer computing projects; others include the Great Internet Mersenne Prime Search (to search for large prime numbers) and Folding@home (to understand protein folding, and how it relates to disease).

    Volunteer computing projects work by breaking the problem they are trying to solve into chunks called work units, which are sent to computers around the world to be analyzed. For example, a SETI@home work unit is about 0.35 MB of radio telescope data, and takes hours or days to analyze on a typical home computer. When the analysis is completed, the results are sent back to the server, and the client gets another work unit. As a precaution to combat cheating, each work unit is sent to three different machines, and needs at least two results to agree to be accepted.
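
The acceptance rule described above (three copies of a work unit, at least two matching results) can be sketched in a few lines of Python. This is only an illustration of the idea, not SETI@home's actual server code; the function name `accept_result` and the `quorum` parameter are hypothetical.

```python
from collections import Counter

def accept_result(results, quorum=2):
    """Return the agreed result if at least `quorum` machines reported
    the same value for a work unit; otherwise return None.

    `results` is a list of the values sent back by the machines that
    analyzed the same work unit (three machines in the text).
    """
    if not results:
        return None
    # Find the most frequently reported value and how often it occurred.
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None
```

A work unit whose three results all disagree would be rejected and, presumably, sent out again.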

    Although SETI@home may be superficially similar to MapReduce (breaking a problem into independent pieces to be worked on in parallel), there are some significant differences. The SETI@home problem is very CPU-intensive, which makes it suitable for running on hundreds of thousands of computers across the world, since the time to transfer the work unit is dwarfed by the time to run the computation on it. Volunteers are donating CPU cycles, not bandwidth.

    MapReduce is designed to run jobs that last minutes or hours on trusted, dedicated hardware running in a single data center with very high aggregate bandwidth interconnects. By contrast, SETI@home runs a perpetual computation on untrusted machines on the Internet with highly variable connection speeds and no data locality.

    A Brief History of Hadoop


    therefore more mundane) names. This is a good principle, as it means you can generally work out what something does from its name. For example, the jobtracker keeps track of MapReduce jobs.

    Building a web search engine from scratch was an ambitious goal, for not only is the software required to crawl and index websites complex to write, but it is also a challenge to run without a dedicated operations team, since there are so many moving parts. It's expensive too: Mike Cafarella and Doug Cutting estimated a system supporting a 1-billion-page index would cost around half a million dollars in hardware, with a monthly running cost of $30,000. Nevertheless, they believed it was a worthy goal, as it would open up and ultimately democratize search engine algorithms.

    Nutch was started in 2002, and a working crawler and search system quickly emerged. However, they realized that their architecture wouldn't scale to the billions of pages on the Web. Help was at hand with the publication of a paper in 2003 that described the architecture of Google's distributed filesystem, called GFS, which was being used in production at Google. GFS, or something like it, would solve their storage needs for the very large files generated as a part of the web crawl and indexing process. In particular, GFS would free up time being spent on administrative tasks such as managing storage nodes. In 2004, they set about writing an open source implementation, the Nutch Distributed Filesystem (NDFS).

    In 2004, Google published the paper that introduced MapReduce to the world. Early in 2005, the Nutch developers had a working MapReduce implementation in Nutch, and by the middle of that year all the major Nutch algorithms had been ported to run using MapReduce and NDFS.

    NDFS and the MapReduce implementation in Nutch were applicable beyond the realm of search, and in February 2006 they moved out of Nutch to form an independent subproject of Lucene called Hadoop. At around the same time, Doug Cutting joined Yahoo!, which provided a dedicated team and the resources to turn Hadoop into a system that ran at web scale (see sidebar). This was demonstrated in February 2008 when Yahoo! announced that its production search index was being generated by a 10,000-core Hadoop cluster.

    In January 2008, Hadoop was made its own top-level project at Apache, confirming its success and its diverse, active community. By this time Hadoop was being used by many other companies besides Yahoo!, such as Last.fm and Facebook.

    In one well-publicized feat, the New York Times used Amazon's EC2 compute cloud to crunch through four terabytes of scanned archives from the paper, converting them to PDFs for the Web. The processing took less than 24 hours to run using 100 machines, and the project probably wouldn't have been embarked on without the combination of Amazon's pay-by-the-hour model (which allowed the NYT to access a large number of machines for a short period) and Hadoop's easy-to-use parallel programming model.

    In April 2008, Hadoop broke a world record to become the fastest system to sort a terabyte of data. Running on a 910-node cluster, Hadoop sorted one terabyte in 209 seconds (just under 3½ minutes), beating the previous year's winner of 297 seconds (described in detail in "TeraByte Sort on Apache Hadoop"). In November of the same year, Google reported that its MapReduce implementation sorted one terabyte in 68 seconds.


    the known Web; the Indexer, which builds a reverse index to the best pages; and the Runtime, which answers users' queries. The WebMap is a graph that consists of roughly 1 trillion (10^12) edges, each representing a web link, and 100 billion (10^11) nodes, each representing distinct URLs. Creating and analyzing such a large graph requires a large number of computers running for many days.


    In early 2005, the infrastructure for the WebMap, named Dreadnaught, needed to be redesigned to scale up to more nodes. Dreadnaught had successfully scaled from 20 to 600 nodes, but required a complete redesign to scale up further. Dreadnaught is similar to MapReduce in many ways, but provides more flexibility and less structure. In particular, each fragment in a Dreadnaught job can send output to each of the fragments in the next stage of the job, but the sort was all done in library code. In practice, most of the WebMap phases were pairs that corresponded to MapReduce. Therefore, the WebMap applications would not require extensive refactoring to fit into MapReduce.

    Eric Baldeschwieler (Eric14) created a small team and we started designing and prototyping a new framework written in C++ modeled after GFS and MapReduce to replace Dreadnaught. Although the immediate need was for a new framework for WebMap, it was clear that standardization of the batch platform across Yahoo! Search was critical, and by making the framework general enough to support other users, we could better leverage investment in the new platform.

    At the same time, we were watching Hadoop, which was part of Nutch, and its progress. In January 2006, Yahoo! hired Doug Cutting, and a month later we decided to abandon our prototype and adopt Hadoop. The advantage of Hadoop over our prototype and design was that it was already working with a real application (Nutch) on 20 nodes. That allowed us to bring up a research cluster two months later and start helping real customers use the new framework much sooner than we could have otherwise. Another advantage, of course, was that since Hadoop was already open source, it was easier (although far from easy!) to get permission from Yahoo!'s legal department to work in open source. So we set up a 200-node cluster for the researchers in early 2006 and put the WebMap conversion plans on hold while we supported and improved Hadoop for the research users.

    The Apache Hadoop Project

    HDFS

    When a dataset outgrows the storage capacity of a single physical machine, it becomes necessary to partition it across a number of separate machines. Filesystems that manage the storage across a network of machines are called distributed filesystems. Since they are network-based, all the complications of network programming kick in, thus making distributed filesystems more complex than regular disk filesystems. For example, one of the biggest challenges is making the filesystem tolerate node failure without suffering data loss.

    Hadoop comes with a distributed filesystem called HDFS, which stands for Hadoop Distributed Filesystem. (You may sometimes see references to "DFS", informally or in older documentation or configuration, which is the same thing.) HDFS is Hadoop's flagship filesystem and is the focus of this chapter, but Hadoop actually has a general-purpose filesystem abstraction, so we'll see along the way how Hadoop integrates with other storage systems (such as the local filesystem and Amazon S3).

    The Design of HDFS

    HDFS is a filesystem designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware. Let's examine this statement in more detail:

    Very large files

    "Very large" in this context means files that are hundreds of megabytes, gigabytes, or terabytes in size. There are Hadoop clusters running today that store petabytes of data.

    Streaming data access

    HDFS is built around the idea that the most efficient data processing pattern is a write-once, read-many-times pattern. A dataset is typically generated or copied from source, then various analyses are performed on that dataset over time. Each analysis will involve a large proportion, if not all, of the dataset, so the time to read the whole dataset is more important than the latency in reading the first record.

    Commodity hardware

    Hadoop doesn't require expensive, highly reliable hardware to run on. It's designed to run on clusters of commodity hardware (commonly available hardware available from multiple vendors) for which the chance of node failure across the cluster is high, at least for large clusters. HDFS is designed to carry on working without a noticeable interruption to the user in the face of such failure.

    It is also worth examining the applications for which using HDFS does not work so well. While this may change in the future, these are areas where HDFS is not a good fit today:

    Low-latency data access

    Applications that require low-latency access to data, in the tens of milliseconds range, will not work well with HDFS. Remember HDFS is optimized for delivering a high throughput of data, and this may be at the expense of latency. HBase is currently a better choice for low-latency access.

    Lots of small files


    Since the namenode holds filesystem metadata in memory, the limit to the number of files in a filesystem is governed by the amount of memory on the namenode. As a rule of thumb, each file, directory, and block takes


    Second, making the unit of abstraction a block rather than a file simplifies the storage subsystem. Simplicity is something to strive for in all systems, but it is especially important for a distributed system in which the failure modes are so varied. The storage subsystem deals with blocks, simplifying storage management (since blocks are a fixed size, it is easy to calculate how many can be stored on a given disk), and eliminating metadata concerns (blocks are just a chunk of data to be stored; file metadata such as permissions information does not need to be stored with the blocks, so another system can handle metadata orthogonally).

    Furthermore, blocks fit well with replication for providing fault tolerance and availability. To insure against corrupted blocks and disk and machine failure, each block is replicated to a small number of physically separate machines (typically three). If a block becomes unavailable, a copy can be read from another location in a way that is transparent to the client. A block that is no longer available due to corruption or machine failure can be replicated from its alternative locations to other live machines to bring the replication factor back to the normal level. (See "Data Integrity" for more on guarding against corrupt data.) Similarly, some applications may choose to set a high replication factor for the blocks in a popular file to spread the read load on the cluster.
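
The arithmetic that fixed-size blocks make easy can be sketched as follows. The 64 MB block size is an assumption for illustration (this section does not state one), the replication factor of three follows the "typically three" above, and the function names are hypothetical.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # assumed block size in bytes (not stated in this section)
REPLICATION = 3                # "typically three", per the text

def block_count(file_size, block_size=BLOCK_SIZE):
    """Number of blocks needed to hold a file of `file_size` bytes."""
    # Integer ceiling division: a partial final block still counts as a block.
    return (file_size + block_size - 1) // block_size

def replica_count(file_size, replication=REPLICATION, block_size=BLOCK_SIZE):
    """Total block replicas the cluster stores for the file."""
    return block_count(file_size, block_size) * replication

def blocks_per_disk(disk_bytes, block_size=BLOCK_SIZE):
    """How many full-size blocks fit on a disk of `disk_bytes` bytes."""
    return disk_bytes // block_size
```

Under these assumptions a 200 MB file occupies four blocks, and the cluster holds twelve replicas of them in total.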

    Like its disk filesystem cousin, HDFS's fsck command understands blocks. For example, running:

    % hadoop fsck / -files -blocks

    will list the blocks that make up each file in the filesystem.

    Namenodes and Datanodes

    An HDFS cluster has two types of node operating in a master-worker pattern: a namenode (the master) and a number of datanodes (workers). The namenode manages the filesystem namespace. It maintains the filesystem tree and the metadata for all the files and directories in the tree. This information is stored persistently on the local disk in the form of two files: the namespace image and the edit log. The namenode also knows the datanodes on which all the blocks for a given file are located; however, it does not store block locations persistently, since this information is reconstructed from datanodes when the system starts.

    A client accesses the filesystem on behalf of the user by communicating with the namenode and datanodes. The client presents a POSIX-like filesystem interface, so the user code does not need to know about the namenode and datanode to function.

    Datanodes are the workhorses of the filesystem. They store and retrieve blocks when they are told to (by clients or the namenode), and they report back to the namenode periodically with lists of blocks that they are storing.

    Without the namenode, the filesystem cannot be used. In fact, if the machine running the namenode were obliterated, all the files on the filesystem would be lost since there would be no way of knowing how to reconstruct the files from the blocks on the datanodes. For this reason, it is important to make the namenode resilient to failure, and Hadoop provides two mechanisms for this.

    The first way is to back up the files that make up the persistent state of the filesystem metadata. Hadoop can be configured so that the namenode writes its persistent state to multiple filesystems. These writes are synchronous and atomic. The usual configuration choice is to write to local disk as well as a remote NFS mount.


    It is also possible to run a secondary namenode, which despite its name does not act as a namenode. Its main role is to periodically merge the namespace image with the edit log to prevent the edit log from becoming too large. The secondary namenode usually runs on a separate physical machine, since it requires plenty of CPU and as much memory as the namenode to perform the merge. It keeps a copy of the merged namespace image, which can be used in the event of the namenode failing. However, the state of the secondary namenode lags that of the primary, so in the event of total failure of the primary, data loss is almost guaranteed. The usual course of action in this case is to copy the namenode's metadata files that are on NFS to the secondary and run it as the new primary.
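
To make the merge step concrete, here is a toy model in Python. It assumes the namespace image is a dictionary of paths and the edit log is a list of `(operation, path)` pairs; Hadoop's real on-disk formats and operations are different, and the names `apply_edit` and `checkpoint` are hypothetical.

```python
def apply_edit(image, edit):
    """Apply one logged operation to the in-memory namespace image."""
    op, path = edit
    if op == "create":
        image[path] = {}          # placeholder for the entry's metadata
    elif op == "delete":
        image.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")

def checkpoint(image, edit_log):
    """Merge the edit log into a copy of the image.

    Returns the merged image and a fresh, empty edit log, which is what
    keeps the log from growing without bound.
    """
    merged = dict(image)          # leave the original image untouched
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged, []
```

The sketch also shows why the secondary lags the primary: any edits logged after the last checkpoint are not yet in its copy of the image.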

    The Command-Line Interface

    We're going to have a look at HDFS by interacting with it from the command line. There are many other interfaces to HDFS, but the command line is one of the simplest, and to many developers the most familiar.

    We are going to run HDFS on one machine, so first follow the instructions for setting up Hadoop in pseudo-distributed mode in Appendix A. Later you'll see how to run on a cluster of machines to give us scalability and fault tolerance.

    There are two properties that we set in the pseudo-distributed configuration that deserve further explanation. The first is fs.default.name, set to hdfs://localhost/, which is used to set a default filesystem for Hadoop. Filesystems are specified by a URI, and here we have used an hdfs URI to configure Hadoop to use HDFS by default. The HDFS daemons will use this property to determine the host and port for the HDFS namenode. We'll be running it on localhost, on the default HDFS port, 8020. And HDFS clients will use this property to work out where the namenode is running so they can connect to it.

    We set the second property, dfs.replication, to one so that HDFS doesn't replicate filesystem blocks by the usual default of three. When running with a single datanode, HDFS can't replicate blocks to three datanodes, so it would perpetually warn about blocks being under-replicated. This setting solves that problem.

    Basic Filesystem Operations

    The filesystem is ready to be used, and we can do all of the usual filesystem operations such as reading files, creating directories, moving files, deleting data, and listing directories. You can type hadoop fs -help to get detailed help on every command. Start by copying a file from the local filesystem to HDFS:

    % hadoop fs -copyFromLocal input/docs/quangle.txt hdfs://localhost/user/tom/quangle.txt

    This command invokes Hadoop's filesystem shell command fs, which supports a number of subcommands; in this case, we are running -copyFromLocal. The local file quangle.txt is copied to the file /user/tom/quangle.txt on the HDFS instance running on localhost. In fact, we could have omitted the scheme and host of the URI and picked up the default, hdfs://localhost, as specified in core-site.xml.

    % hadoop fs -copyFromLocal input/docs/quangle.txt /user/tom/quangle.txt

    We could also have used a relative path, and copied the file to our home directory in HDFS, which in this case is /user/tom:


    % hadoop fs -copyFromLocal input/docs/quangle.txt quangle.txt
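
The three forms of the destination used above (a full URI, an absolute path, and a relative path) all name the same file. The following Python sketch mimics that resolution; it is not Hadoop's own code. The function name `resolve` is hypothetical, and the default filesystem and home directory are simply the values used in this walkthrough.

```python
from urllib.parse import urlparse

DEFAULT_FS = "hdfs://localhost"  # fs.default.name from the text, without the trailing slash
HOME_DIR = "/user/tom"           # home directory used in the examples

def resolve(path, default_fs=DEFAULT_FS, home=HOME_DIR):
    """Turn a destination argument into a fully qualified URI."""
    parsed = urlparse(path)
    if parsed.scheme:
        # Already a full URI: keep its scheme and host.
        return f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    if path.startswith("/"):
        # Absolute path: placed on the default filesystem.
        return default_fs + path
    # Relative path: placed under the user's home directory.
    return f"{default_fs}{home}/{path}"
```

With these assumptions, `resolve("quangle.txt")`, `resolve("/user/tom/quangle.txt")` and `resolve("hdfs://localhost/user/tom/quangle.txt")` all give the same URI.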

    Let's copy the file back to the local filesystem and check whether it's the same:

    % hadoop fs -copyToLocal quangle.txt quangle.copy.txt
    % md5 input/docs/quangle.txt quangle.copy.txt


    MD5 (input/docs/quangle.txt) = a16f231da6b05e2ba7a339320e7dacd9
    MD5 (quangle.copy.txt) = a16f231da6b05e2ba7a339320e7dacd9

    The MD5 digests are the same, showing that the file survived its trip to HDFS and is back intact. Finally, let's look at an HDFS file listing. We create a directory first just to see how it is displayed in the listing:

    % hadoop fs -mkdir books
    % hadoop fs -ls .
    Found 2 items
    drwxr-xr-x   - tom supergroup          0 2009-04-02 22:41 /user/tom/books
    -rw-r--r--   1 tom supergroup        118 2009-04-02 22:29 /user/tom/quangle.txt

    The information returned is very similar to the Unix command ls -l, with a few minor differences. The first column shows the file mode. The second column is the replication factor of the file (something a traditional Unix filesystem does not have). Remember we set the default replication factor in the site-wide configuration to be 1, which is why we see the same value here. The entry in this column is empty for directories since the concept of replication does not apply to them; directories are treated as metadata and stored by the namenode, not the datanodes. The third and fourth columns show the file owner and group. The fifth column is the size of the file in bytes, or zero for directories. The sixth and seventh columns are the last modified date and time. Finally, the eighth column is the absolute name of the file or directory.
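
The eight columns just described can be pulled apart with a short Python sketch. It assumes whitespace-separated columns and a "-" placeholder in the replication column for directories; the `Entry` type and `parse_ls_line` function are hypothetical helpers, not part of Hadoop.

```python
from typing import NamedTuple, Optional

class Entry(NamedTuple):
    mode: str                    # column 1: file mode
    replication: Optional[int]   # column 2: None for directories
    owner: str                   # column 3
    group: str                   # column 4
    size: int                    # column 5: bytes, zero for directories
    date: str                    # column 6: last modified date
    time: str                    # column 7: last modified time
    name: str                    # column 8: absolute name

def parse_ls_line(line):
    """Split one listing line into its eight columns."""
    # maxsplit=7 keeps any spaces in the final name column intact.
    mode, repl, owner, group, size, date, time, name = line.split(None, 7)
    replication = None if repl == "-" else int(repl)
    return Entry(mode, replication, owner, group, int(size), date, time, name)
```

A line for a directory yields `replication=None`, matching the point that replication does not apply to directories.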

    File Permissions in HDFS

    HDFS has a permissions model for files and directories that is much like POSIX. There are three types of permission: the read permission (r), the write permission (w) and the execute permission (x). The read permission is required to read files or list the contents of a directory.

    The write permission is required to write a file, or for a directory, to create or delete files or directories in it.

    The execute permission is ignored for a file since you can't execute a file on HDFS (unlike POSIX), and for a directory it is required to access its children.

    Each file and directory has an owner, a group, and a mode. The mode is made up of the permissions for the user who is the owner, the permissions for the users who are members of the group, and the permissions for users who are neither the owner nor members of the group.

    A client's identity is determined by the username and groups of the process it is running in. Because clients are remote, this makes it possible to become an arbitrary user, simply by creating an account of that name on the remote system.

    Thus, permissions should be used only in a cooperative community of users, as a mechanism for sharing filesystem resources and for avoiding accidental data loss, and not for securing resources in a hostile environment. However, despite these drawbacks, it is worthwhile having permissions enabled (as it is by default; see the dfs.permissions property), to avoid accidental modification or deletion of substantial parts of the filesystem, either by users or by automated tools or programs.


    When permissions checking is enabled, the owner permissions are checked if the client's username matches the owner, and the group permissions are checked if the client is a member of the group; otherwise, the other permissions are checked.


    There is a concept of a superuser, which is the identity of the namenode process. Permissions checks are not performed for the superuser.
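
The checking order described above (owner first, then group, then other, with the superuser exempt) can be written out as a small Python sketch. It is an illustration of the rule only, not HDFS's implementation; the function name `is_allowed` and its parameters are hypothetical, and the mode is assumed to be a nine-character string such as "rw-r--r--".

```python
def is_allowed(user, groups, owner, group, mode, want, superuser=None):
    """Decide whether `user` may perform `want` ('r', 'w' or 'x').

    `groups` is the collection of groups the user belongs to; `owner`,
    `group` and `mode` describe the file or directory being accessed.
    """
    if want not in ("r", "w", "x"):
        raise ValueError("want must be 'r', 'w' or 'x'")
    if superuser is not None and user == superuser:
        return True               # no checks are performed for the superuser
    if user == owner:
        bits = mode[0:3]          # owner permissions
    elif group in groups:
        bits = mode[3:6]          # group permissions
    else:
        bits = mode[6:9]          # permissions for everyone else
    return want in bits
```

Note that only one class of permissions is consulted: an owner whose own bits deny access is refused even if the group or other bits would allow it.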


    understand the concept and derive from that an understanding of unlocking old data. Fundamentally, these technologies allow for mining data that otherwise could not be mined, simply by supporting distributed computing and storage. Also, use the slides to figure out how a MapReduce program works and the various entities involved.