
CLOUD COMPUTING

Gregor von Laszewski

(c) Gregor von Laszewski, 2018, 2019


1 PREFACE
1.1 Learning Objectives ☁
1.2 ePub Readers ☁
1.3 Corrections ☁
1.4 Contributors ☁
1.5 Notation ☁
1.5.1 Figures
1.5.2 Hyperlinks in the document
1.5.3 Equations
1.5.4 Tables
1.6 Updates ☁
2 OVERVIEW ☁
3 DEFINITION OF CLOUD COMPUTING ☁
3.1 Defining the term Cloud Computing
3.2 History and Trends
3.3 Job as a Cloud/Data Engineer
3.4 You must be that TALL
4 DATA CENTER
4.1 Data Center ☁
4.1.1 Motivation: Data
4.1.1.1 How much data?
4.1.2 Cloud Data Centers
4.1.3 Data Center Infrastructure
4.1.4 Data Center Characteristics
4.1.5 Data Center Metrics
4.1.5.1 Data Center Energy Costs
4.1.5.2 Data Center Carbon Footprint
4.1.5.3 Data Center Operational Impact
4.1.5.4 Power Usage Effectiveness
4.1.5.5 Hot-Cold Aisle
4.1.5.5.1 Containment
4.1.5.5.1.1 Water Cooled Doors
4.1.5.6 Workload Monitoring
4.1.5.6.1 Workload of HPC in the Cloud
4.1.5.6.2 Scientific Impact Metric
4.1.5.6.3 Clouds and Virtual Machine Monitoring
4.1.5.6.4 Workload of Containers
4.1.6 Example Data Centers
4.1.6.1 AWS
4.1.6.2 Azure
4.1.6.3 Google
4.1.6.4 IBM
4.1.6.5 XSEDE
4.1.6.5.1 Comet
4.1.6.5.2 Jetstream
4.1.6.6 Chameleon Cloud
4.1.6.7 Indiana University
4.1.6.8 Shipping Containers
4.1.7 Server Consolidation
4.1.8 Data Center Improvements and Consolidation
4.1.9 Project Natick
4.1.10 Renewable Energy for Data Centers
4.1.11 Societal Shift Towards Renewables
4.1.12 Datacenter Risks and Issues
4.1.13 Exercises

5 ARCHITECTURE
5.1 Architectures ☁
5.1.1 Evolution of Compute Architectures
5.1.1.1 Mainframe Computing
5.1.1.2 PC Computing
5.1.1.3 Intranet and Server Computing
5.1.1.4 Grid Computing
5.1.1.5 Internet Computing
5.1.1.6 Cloud Computing
5.1.1.7 Mobile Computing
5.1.1.8 Internet of Things Computing
5.1.1.9 Edge Computing
5.1.1.10 Fog Computing
5.1.2 As a Service Architecture Model
5.1.3 Product or Functional Based Model
5.1.4 NIST Cloud Architecture
5.1.5 Cloud Security Alliance Reference Architecture
5.1.6 Multicloud Architectures
5.1.6.1 Cloudmesh Architecture
5.1.7 Resources
5.2 NIST Big Data Reference Architecture ☁
5.2.1 Pathway to the NIST-BDRA
5.2.2 Big Data Characteristics and Definitions
5.2.3 Big Data and the Cloud
5.2.4 Big Data, Edge Computing and the Cloud
5.2.5 Reference Architecture
5.2.6 Framework Providers
5.2.7 Application Providers
5.2.8 Fabric
5.2.9 Interface definitions
5.3 The Y-Scheduling Architecture View ☁
6 REST
6.1 Introduction to REST ☁
6.1.0.1 Collection of Resources
6.1.0.2 Single Resource
6.1.0.3 REST Tool Classification
6.2 OPENAPI 3.0
6.2.1 REST Specifications ☁
6.2.1.1 OPENAPI
6.2.1.1.1 OpenAPI 3.0 Specification (OAS 3.0)
6.2.1.1.1.1 Definitions
6.2.1.2 RAML
6.2.1.3 API Blueprint
6.2.1.4 JsonAPI
6.2.1.5 Tinyspec
6.2.1.6 Tools
6.2.1.6.1 Connexion
6.2.2 OpenAPI 3.0 REST Service via Introspection ☁
6.2.2.1 Verification
6.2.2.2 Swagger-UI
6.2.2.3 Mock service
6.2.2.4 Exercise
6.2.3 REST AI services Example ☁
6.2.3.1 Service Endpoints/Paths
6.2.3.1.1 Path kmeans/upload
6.2.3.1.2 Path kmeans/fit
6.2.3.1.3 Path kmeans/predict
6.2.3.2 Files
6.2.3.3 Running the example
6.2.3.4 Notes
6.3 Flask RESTful Services ☁
6.4 Django REST Framework ☁
6.5 Github REST Services ☁
6.5.1 Issues
6.5.2 Exercise
6.6 OpenAPI REST Services with Swagger ☁
6.6.1 Swagger Tools
6.6.2 Swagger Community Tools
6.6.2.1 Converting Json Examples to OpenAPI YAML Models
6.7 REST WITH EVE
6.7.1 Rest Services with Eve ☁
6.7.1.1 Ubuntu install of MongoDB
6.7.1.2 macOS install of MongoDB
6.7.1.3 Windows 10 Installation of MongoDB
6.7.1.4 Database Location
6.7.1.5 Verification
6.7.1.6 Building a simple REST Service
6.7.1.7 Interacting with the REST service
6.7.1.8 Creating REST API Endpoints
6.7.1.9 REST API Output Formats and Request Processing
6.7.1.10 REST API Using a Client Application
6.7.1.11 Towards cmd5 extensions to manage eve and mongo
6.7.2 HATEOAS ☁
6.7.2.1 Filtering
6.7.2.2 Pretty Printing
6.7.2.3 XML
6.7.3 Extensions to Eve ☁
6.7.3.1 Object Management with Eve and Evegenie
6.7.3.1.1 Installation
6.7.3.1.2 Starting the service
6.7.3.1.3 Creating your own objects
6.8 OPENAPI 2.0
6.8.1 OpenAPI 2.0 Specification ☁
6.8.1.1 The Virtual Cluster example API Definition
6.8.1.1.1 Terminology
6.8.1.1.2 Specification
6.8.1.2 References
6.8.2 OpenAPI REST Service via Introspection ☁
6.8.2.1 Verification
6.8.2.2 Mock service
6.8.2.3 Exercise
6.8.3 OpenAPI REST Service via Codegen ☁
6.8.3.1 Step 1: Define Your REST Service
6.8.3.2 Step 2: Server Side Stub Code Generation and Implementation
6.8.3.2.1 Setup the Codegen Environment
6.8.3.2.2 Generate Server Stub Code
6.8.3.2.3 Fill in the actual implementation
6.8.3.3 Step 3: Install and Run the REST Service:
6.8.3.3.1 Start a virtualenv:
6.8.3.3.2 Make sure you have the latest pip:
6.8.3.3.3 Install the requirements of the server side code:
6.8.3.3.4 Install the server side code package:
6.8.3.3.5 Run the service
6.8.3.3.6 Verify the service using a web browser:
6.8.3.4 Step 4: Generate Client Side Code and Verify
6.8.3.4.1 Client side code generation:
6.8.3.4.2 Install the client side code package:
6.8.3.4.3 Using the client API to interact with the REST service
6.8.3.5 Towards a Distributed Client Server
6.9 Exercises ☁

7 GRAPHQL ☁
7.1 Prerequisites
7.1.1 Install Graphene
7.1.2 Install Django
7.1.3 Install GraphiQL
7.2 GraphQL type system and schema
7.2.1 Type System
7.2.2 Scalar Types
7.2.3 Enumeration Types
7.2.4 Interfaces
7.2.5 Union Types
7.3 GraphQL Query
7.3.1 Fields
7.3.2 Arguments
7.3.3 Fragments
7.3.4 Variables
7.3.5 Directives
7.3.6 Mutations
7.3.7 Query Validation
7.4 GraphQL in Python
7.5 Developing your own GraphQL Server
7.5.1 GraphQL server implementation
7.5.2 GraphQL Server Querying
7.5.3 Mutation example
7.5.4 GraphQL Authentication
7.5.5 JSON Web Token Authentication
7.5.5.1 Using Authentication with Curl
7.5.5.2 Expiration of JWT tokens
7.5.6 GitHub API v4
7.6 Dynamic Queries with GraphQL
7.7 Advantages of Using GraphQL
7.8 Disadvantages of Using GraphQL
7.9 Conclusion
7.9.1 Resources
7.10 Exercises
8 HYPERVISOR
8.1 Virtualization ☁
8.1.1 Virtual Machines
8.1.2 System Virtual Machines
8.1.3 Hosted Virtualization
8.1.4 Summary
8.1.5 Virtualization Approaches
8.1.5.1 Full virtualization
8.1.5.2 Paravirtualization
8.1.6 Virtualization Technologies
8.1.6.1 Selected Hardware Virtualization Technologies
8.1.6.2 AMD-V and Intel-VT
8.1.6.3 I/O MMU virtualization (AMD-Vi and Intel VT-d)
8.1.6.4 Selected VM Virtualization Software and Tools
8.1.6.4.1 Libvirt
8.1.6.4.2 QEMU
8.1.6.4.3 KVM
8.1.6.4.3.1 KVM vs QEMU
8.1.6.4.4 Xen
8.1.6.4.5 Hyper-V
8.1.6.4.6 VMWare
8.1.6.5 Parallels
8.1.6.5.1 VirtualBox
8.1.6.5.2 Wine – Wine is not an emulator
8.1.6.5.3 Comparison of some technologies
8.1.6.6 Selected Storage Virtualization Software and Tools
8.1.6.7 Selected Network Virtualization Software and Tools
8.2 Virtual Machine Management with QEMU ☁
8.2.1 Install QEMU
8.2.2 Create a Virtual Hard Disk with QEMU
8.2.3 Install Ubuntu on the Virtual Hard Disk
8.2.4 Start Ubuntu with QEMU
8.2.5 Emulate Raspberry Pi with QEMU
8.2.6 Resources
8.3 Manage VM guests with virsh ☁
9 IAAS
9.1 Introduction ☁
9.2 Amazon Web Services ☁
9.2.1 AWS Products
9.2.1.1 Virtual Machine Infrastructure as a Service
9.2.1.2 Container Infrastructure as a Service
9.2.1.3 Serverless Compute using AWS Lambda
9.2.1.4 Serverless Compute using AWS Lambda
9.2.1.5 Storage
9.2.1.6 Databases
9.2.2 Locations
9.2.3 Creating an account
9.2.4 AWS Command Line Interface
9.2.4.1 Introduction
9.2.4.2 Prerequisites
9.2.4.2.1 Install CLI
9.2.4.2.2 Configure CLI
9.2.5 AWS Admin Access
9.2.5.1 Introduction
9.2.5.2 Prerequisites
9.2.5.3 Setting up admin access using AWS CLI
9.2.5.3.1 Create an admin security group
9.2.5.3.2 Assign a security policy to the created group granting full admin access
9.2.6 Understanding the free tier
9.2.7 Important Notes
9.2.8 Introduction to the AWS console
9.2.8.1 Starting a VM
9.2.8.1.1 Setting up key pair
9.2.8.2 Stopping a VM
9.2.9 Access from the Command Line
9.2.10 Access from Python
9.2.11 Boto
9.2.12 libcloud
9.3 Microsoft Azure ☁
9.3.1 Products
9.3.1.1 Virtual Machine Infrastructure as a Service
9.3.1.2 Container Infrastructure as a Service
9.3.1.3 Databases
9.3.1.4 Networking
9.3.2 Registration
9.3.3 Introduction to the Azure Portal
9.3.4 Creating a VM
9.3.5 Create a Ubuntu Server 18.04 LTS Virtual Machine in Azure
9.3.6 Remote access the Virtual Machine
9.3.7 Starting a VM
9.3.8 Stopping the VM
9.3.9 Exercises
9.4 What is IBM Watson and why is it important? ☁
9.4.1 How can we use Watson?
9.4.2 Creating an account
9.4.3 Understanding the free tier
9.5 Google IaaS Cloud Services ☁
9.5.1 Cloud Computing Services and Products
9.5.1.1 Overview
9.5.1.2 AI and Machine Learning
9.5.1.3 API management
9.5.1.4 Compute
9.5.1.5 Data Analytics
9.5.1.6 Databases
9.5.1.7 Developer Tools
9.5.1.8 Internet of Things
9.5.1.9 Management Tools
9.5.1.10 Media and Migration
9.5.2 Migration
9.5.2.1 Networking
9.5.2.2 Security
9.5.2.3 Storage
9.5.2.4 Google IaaS Example
9.5.2.5 Google Cloud Console Overview
9.5.2.6 Use GCP Resources
9.5.2.7 Project navigation
9.5.2.8 Navigate Google Cloud Services
9.5.2.9 Section pinning
9.5.2.10 View activity across your GCP resources
9.5.2.11 Search across Cloud Console
9.5.2.12 Get support anytime
9.5.2.13 Manage users and permissions
9.5.2.14 Access the command line from your browser
9.5.3 Create a VM Example
9.5.3.1 Create a virtual machine instance
9.5.3.2 VM instances page
9.5.3.3 Connect to your instance
9.5.3.4 Run a simple web server
9.5.3.5 Visit your application
9.5.3.6 Cleanup
9.6 OpenStack ☁
9.6.1 Introduction
9.6.2 OpenStack Architecture
9.6.3 Components
9.6.4 Core Services
9.6.4.1 Nova - Compute
9.6.4.2 Glance - Image Services
9.6.4.3 Swift - Object Storage
9.6.4.4 Cinder - Block Storage
9.6.4.5 Neutron - Networking
9.6.4.6 Horizon - Dashboard
9.6.4.7 Keystone - Identity Service
9.6.4.8 Ceilometer - Telemetry
9.6.4.9 Heat - Orchestration
9.6.5 Access from Python and Scripts
9.6.5.1 Libcloud
9.6.5.2 DevStack
9.7 Python Libcloud ☁
9.7.1 Service categories
9.7.1.0.1 Compute
9.7.1.0.2 Key Pair Management
9.7.1.0.3 Block Storage
9.7.2 Installation
9.7.3 Quick Example
9.7.4 Managing your cloud credentials
9.7.5 Working with cloud services
9.7.5.1 Authenticating with cloud providers
9.7.5.1.1 Amazon AWS
9.7.5.1.2 Azure
9.7.5.1.2.1 Azure Classic Driver
9.7.5.1.2.2 Azure New Driver
9.7.5.1.3 OpenStack
9.7.5.1.4 Google
9.7.5.2 Invoking services
9.7.5.2.1 Creating Nodes
9.7.5.2.2 Listing Nodes
9.7.5.2.3 Starting Nodes
9.7.5.2.4 Stopping Nodes
9.7.6 Cloudmesh Community Program to Manage Clouds
9.7.7 Amazon Simple Storage Service S3 via libcloud
9.7.7.1 Access key
9.7.7.2 Create a new bucket on AWS S3
9.7.7.3 List Containers
9.7.7.4 List container objects
9.7.7.5 Upload a file
9.7.7.6 References
9.8 AWS Boto ☁
9.8.1 Boto versions
9.8.2 Boto Installation
9.8.3 Access key
9.8.4 Boto configuration
9.8.5 Boto configuration with cloudmesh
9.8.6 EC2 interface of Boto
9.8.6.0.1 Create connection
9.8.7 List EC2 instances
9.8.7.0.1 Launch a new instance
9.8.7.0.2 Check running instances
9.8.7.0.3 Stop instance
9.8.7.0.4 Terminate instance
9.8.7.1 Reboot instances
9.8.8 Amazon S3 interface of Boto
9.8.8.0.1 Create connection
9.8.8.0.2 Create new bucket in S3
9.8.8.0.3 Upload data
9.8.8.0.4 List all buckets
9.8.8.0.5 List all objects in a bucket
9.8.8.0.6 Delete object
9.8.8.0.7 Delete bucket
9.8.9 References
9.8.10 Exercises

10 MAPREDUCE
10.1 Introduction to Mapreduce ☁
10.1.1 MapReduce Algorithm
10.1.1.1 MapReduce Example: Word Count
10.1.2 Hadoop MapReduce and Hadoop Spark
10.1.2.1 Apache Spark
10.1.2.2 Hadoop MapReduce
10.1.2.3 Key Differences
10.1.3 References
10.2 HADOOP
10.2.1 Hadoop ☁
10.2.1.1 Hadoop and MapReduce
10.2.1.2 Hadoop EcoSystem
10.2.1.3 Hadoop Components
10.2.1.4 Hadoop and the Yarn Resource Manager
10.2.1.5 PageRank
10.2.2 Installation of Hadoop ☁
10.2.2.1 Releases
10.2.2.2 Prerequisites
10.2.2.3 User and User Group Creation
10.2.2.4 Configuring SSH
10.2.2.5 Installation of Java
10.2.2.6 Installation of Hadoop
10.2.2.7 Hadoop Environment Variables
10.2.3 Hadoop Distributed File System (Hadoop HDFS) ☁
10.2.3.1 Introduction
10.2.3.2 Features
10.2.3.3 HDFS Components
10.2.3.3.1 NameNode and DataNodes
10.2.3.4 Usage
10.2.3.4.1 Java Client API
10.2.3.4.2 FS Shell
10.2.3.5 References
10.2.3.6 Exercises
10.2.4 Apache HBase ☁
10.2.4.1 Introduction
10.2.4.2 Features
10.2.4.3 Configuration
10.2.4.4 Usage
10.2.4.4.1 Connect to HBase.
10.2.4.4.2 Create a table
10.2.4.4.3 Describe a table
10.2.4.4.4 HBase MapReduce job
10.2.4.5 References
10.2.5 Hadoop Virtual Cluster Installation Using Cloudmesh ☁
10.2.5.1 Cloudmesh Cluster Installation
10.2.5.1.1 Create Cluster
10.2.5.1.2 Check Created Cluster
10.2.5.1.3 Delete Cluster
10.2.5.2 Hadoop Cluster Installation
10.2.5.2.1 Create Hadoop Cluster
10.2.5.2.2 Delete Hadoop Cluster
10.2.5.3 Advanced Topics with Hadoop
10.2.5.3.1 Hadoop Virtual Cluster with Spark and/or Pig
10.2.5.3.2 Word Count Example on Spark
10.3 SPARK
10.3.1 Spark Lectures ☁
10.3.1.1 Motivation for Spark
10.3.1.2 Spark RDD Operations
10.3.1.3 Spark DAG
10.3.1.4 Spark vs. other Frameworks
10.3.2 Installation of Spark ☁
10.3.2.1 Prerequisites
10.3.2.2 Installation of Java
10.3.2.3 Install Spark with Hadoop
10.3.2.4 Spark Environment Variables
10.3.2.5 Test Spark Installation
10.3.2.6 Install Spark With Custom Hadoop
10.3.2.7 Configuring Hadoop
10.3.2.8 Test Spark Installation
10.3.3 Spark Streaming ☁
10.3.3.1 Streaming Concepts
10.3.3.2 Simple Streaming Example
10.3.3.3 Spark Streaming For Twitter Data
10.3.3.3.1 Step 1
10.3.3.3.2 Step 2
10.3.3.3.3 Step 3
10.3.3.3.4 Step 4
10.3.3.3.5 Step 5
10.3.3.3.6 Step 6
10.3.4 User Defined Functions in Spark ☁
10.3.4.1 Resources
10.3.4.2 Instructions for Spark installation
10.3.4.2.1 Linux
10.3.4.3 Windows
10.3.4.4 MacOS
10.3.4.5 Instructions for creating Spark User Defined Functions
10.3.4.5.1 Example: Temperature conversion
10.3.4.5.1.1 Description about dataset
10.3.4.5.1.2 How to write a python program with UDF
10.3.4.5.1.3 How to execute a python spark script
10.3.4.5.1.4 Filtering and sorting
10.3.4.6 Instructions to install and run the example using docker
10.4 HADOOP ECOSYSTEM
10.4.1 ELASTIC MAPREDUCE
10.4.1.1 AWS Elastic MapReduce (AWS EMR) ☁
10.4.1.1.1 Introduction
10.4.1.1.2 Why EMR?
10.4.1.1.3 Understanding Clusters and Nodes
10.4.1.1.4 Prerequisites
10.4.1.1.5 Creating EMR Cluster Using CLI
10.4.1.1.5.1 Create Security Roles
10.4.1.1.5.2 Setting up authentication
10.4.1.1.5.3 Determine the applicable subnet
10.4.1.1.5.4 Create the EMR cluster
10.4.1.1.5.5 Check the status of your cluster
10.4.1.1.5.6 Terminate your cluster
10.4.1.1.6 Creating EMR Cluster Using AWS Web Console
10.4.1.1.6.1 Set up authentication
10.4.1.1.6.2 Create the EMR cluster
10.4.1.1.6.3 View status and terminate EMR cluster
10.4.1.1.6.4 Submit Work to a Cluster
10.4.1.1.6.5 Processing Data
10.4.1.1.7 AWS Storage
10.4.1.1.8 Create EMR in AWS
10.4.1.1.8.1 Create the buckets
10.4.1.1.8.2 Create Key Pairs
10.4.1.1.9 Create Step Execution – Hadoop Job
10.4.1.1.10 Create a Hive Cluster
10.4.1.1.10.1 Create a Hive Cluster - Screenshots
10.4.1.1.11 Create a Spark Cluster
10.4.1.1.11.1 Create a Spark Cluster - Screenshots
10.4.1.1.12 Run an example Spark job on an EMR cluster
10.4.1.1.12.1 Spark Job Description
10.4.1.1.12.2 Creating the S3 bucket
10.4.1.1.12.3 Copy files to S3
10.4.1.1.12.4 Execute the Spark job on a running cluster
10.4.1.1.12.5 Execute the Spark job while creating clusters
10.4.1.1.12.6 View the results of the Spark job
10.4.1.1.13 Conclusion
10.4.2 TWISTER
10.4.2.1 Twister2 ☁
10.4.2.1.1 Introduction
10.4.2.1.2 Twister2 APIs
10.4.2.1.2.1 TSet API
10.4.2.1.2.2 Task API
10.4.2.1.3 Operator API
10.4.2.1.3.1 Resources
10.4.2.2 Twister2 Installation ☁
10.4.2.2.1 Prerequisites
10.4.2.2.1.1 Maven Installation
10.4.2.2.1.2 OpenMPI Installation
10.4.2.2.1.3 Install Extras
10.4.2.2.1.4 Compiling Twister2
10.4.2.2.1.5 Twister2 Distribution
10.4.2.3 Twister2 Examples ☁
10.4.2.3.1 Submitting a Job
10.4.2.3.2 Batch WordCount Example
10.4.3 HADOOP RDMA ☁
10.4.3.1 Launching a Virtual Hadoop Cluster on Bare-metal InfiniBand Nodes with SR-IOV on Chameleon
10.4.3.2 Launching Virtual Machines Manually
10.4.3.3 Extra Initialization when Launching Virtual Machines
10.4.3.4 Important Note for Tearing Down Virtual Machines and Deleting Network Ports

11 CONTAINER
11.1 Introduction to Containers ☁
11.1.1 Motivation - Microservices
11.1.2 Motivation - Serverless Computing
11.1.3 Docker
11.1.4 Docker and Kubernetes
11.2 DOCKER
11.2.1 Introduction to Docker ☁
11.2.1.1 Docker Engine
11.2.1.2 Docker Architecture
11.2.1.3 Docker Survey
11.2.2 Running Docker Locally ☁
11.2.2.1 Installation for OSX
11.2.2.2 Installation for Ubuntu
11.2.2.3 Installation for Windows 10
11.2.2.4 Testing the Install
11.2.3 Dockerfile ☁
11.2.3.1 Specification
11.2.3.2 References
11.2.4 Docker Hub ☁
11.2.4.1 Create Docker ID and Log In
11.2.4.2 Searching for Docker Images
11.2.4.3 Pulling Images
11.2.4.4 Create Repositories
11.2.4.5 Pushing Images
11.2.4.6 Resources
11.2.5 Docker Compose ☁
11.2.5.1 Introduction
11.2.5.2 Installation
11.2.5.2.1 Install on MacOS
11.2.5.2.2 Install on Linux
11.2.5.2.3 Install on Windows 10
11.2.5.2.3.1 System Requirements
11.2.5.2.4 Test the installation
11.2.5.3 Docker Compose File Directives
11.2.5.3.1 Configuration
11.2.5.3.1.1 build
11.2.5.3.1.2 context
11.2.5.3.1.3 ARGS
11.2.5.3.1.4 command
11.2.5.3.1.5 depends_on
11.2.5.3.1.6 image
11.2.5.3.1.7 ports
11.2.5.3.1.8 volumes
11.2.5.4 Usages
11.2.5.4.1 Build A Service depending on MongoDB
11.3 DOCKER PAAS
11.3.1 Docker Clusters ☁
11.3.2 Docker Swarm ☁
11.3.2.1 Terminology
11.3.2.2 Creating a Docker Swarm Cluster
11.3.2.3 Create a Swarm Cluster with VirtualBox
11.3.2.4 Initialize the Swarm Manager Node and Add Worker Nodes
11.3.2.5 Deploy the application on the swarm manager
11.3.3 Docker and Docker Swarm on FutureSystems ☁
11.3.3.1 Getting Access
11.3.3.2 Creating a service and deploy to the swarm cluster
11.3.3.3 Create your own service
11.3.3.4 Publish an image privately within the swarm cluster
11.3.3.5 Exercises
11.3.4 Hadoop with Docker ☁
11.3.4.1 Building Hadoop using Docker
11.3.4.2 Hadoop Configuration Files
11.3.4.3 Virtual Memory Limit
11.3.4.4 hdfs Safemode leave command
11.3.4.5 Examples
11.3.4.5.1 Statistical Example with Hadoop
11.3.4.5.1.1 Base Location
11.3.4.5.1.2 Input Files
11.3.4.5.1.3 Compilation
11.3.4.5.1.4 Archiving Class Files
11.3.4.5.1.5 HDFS for Input/Output
11.3.4.5.1.6 Run Program with a Single Input File
11.3.4.5.1.7 Result for Single Input File
11.3.4.5.1.8 Run Program with Multiple Input Files
11.3.4.5.1.9 Result for Multiple Files
11.3.4.5.2 Conclusion
11.3.4.6 References
11.3.5 Docker Pagerank ☁
11.3.5.1 Use the automated script
11.3.5.2 Compile and run by hand
11.3.6 Apache Spark with Docker ☁
11.3.6.1 Pull Image from Docker Repository
11.3.6.2 Running the Image
11.3.6.2.1 Running interactively
11.3.6.2.2 Running in the background
11.3.6.3 Run Spark
11.3.6.3.1 Run Spark in Yarn-Client Mode
11.3.6.3.2 Run Spark in Yarn-Cluster Mode
11.3.6.4 Observe Task Execution from Running Logs of SparkPi
11.3.6.5 Write a Word-Count Application with Spark RDD
11.3.6.5.1 Launch Spark Interactive Shell
11.3.6.5.2 Program in Scala
11.3.6.5.3 Launch PySpark Interactive Shell
11.3.6.5.4 Program in Python
11.3.6.6 Docker Spark Examples
11.3.6.6.1 K-Means Example
11.3.6.6.2 Join Example
11.3.6.6.3 Word Count
11.3.6.7 Interactive Examples
11.3.6.7.1 Stop Docker Container
11.3.6.7.2 Start Docker Container Again
11.3.6.7.3 Remove Docker Container
11.4 KUBERNETES
11.4.1 Introduction to Kubernetes ☁
11.4.1.1 What are containers?
11.4.1.2 Terminology
11.4.1.3 Kubernetes Architecture
11.4.1.4 Minikube
11.4.1.4.1 Install minikube
11.4.1.4.2 Start a cluster using Minikube
11.4.1.4.3 Create a deployment
11.4.1.4.4 Expose the service
11.4.1.4.5 Check running status
11.4.1.4.6 Call service api
11.4.1.4.7 Take a look from Dashboard
11.4.1.4.8 Delete the service and deployment
11.4.1.4.9 Stop the cluster
11.4.1.5 Interactive Tutorial Online
11.4.2 Using Kubernetes on FutureSystems ☁
11.4.2.1 Getting Access
11.4.2.2 Example Use
11.4.2.3 Exercises
11.5 Running Singularity Containers on Comet ☁
11.5.1 Background
11.5.2 Tutorial Contents
11.5.3 Why Singularity?
11.5.4 Hands-On Tutorials
11.5.5 Downloading & Installing Singularity
11.5.5.1 Download & Unpack Singularity
11.5.5.2 Configure & Build Singularity
11.5.5.3 Install & Test Singularity
11.5.6 Building Singularity Containers
11.5.6.1 Upgrading Singularity
11.5.7 Create an Empty Container
11.5.8 Import Into a Singularity Container
11.5.9 Shell Into a Singularity Container
11.5.10 Write Into a Singularity Container
11.5.11 Bootstrapping a Singularity Container
11.5.12 Running Singularity Containers on Comet
11.5.12.1 Transfer the Container to Comet
11.5.12.2 Run the Container on Comet
11.5.12.3 Allocate Resources to Run the Container
11.5.12.4 Integrate the Container with Slurm
11.5.12.5 Use Existing Comet Containers
11.5.13 Using Tensorflow With Singularity
11.5.14 Run the job
11.5.15 Resources ☁
11.5.15.1 Tutorialspoint

11.6 Exercises ☁
12 SERVERLESS
12.1 FaaS ☁
12.1.1 Introduction
12.1.2 Serverless Computing
12.1.3 FaaS provider
12.1.4 Resources
12.1.5 Usage Examples
12.2 AWS Lambda ☁
12.2.1 AWS Lambda Features
12.2.2 Understanding Function limitations
12.2.2.1 Execution Time
12.2.2.2 Function size
12.2.3 Understanding the free Tier
12.2.4 Writing your first Lambda function
12.2.5 AWS Lambda Use cases
12.2.6 AWS Lambda Example
12.3 Apache OpenWhisk ☁
12.3.1 OpenWhisk Workflow
12.3.1.1 The Action and Nginx
12.3.1.2 Controller: The System’s Interface
12.3.1.3 CouchDB
12.3.1.4 Load Balancer
12.3.1.5 Kafka
12.3.1.6 Invoker
12.3.1.7 CouchDB again
12.3.2 Setting Up OpenWhisk Locally
12.3.2.1 Debugging quick-start
12.3.3 Hello World in OpenWhisk
12.3.4 Creating a custom action
12.4 Kubeless ☁
12.4.1 Introduction
12.4.2 Programming model
12.4.3 System Architecture
12.5 Microsoft Azure Function ☁
12.6 Google Cloud Functions ☁
12.6.1 Google Cloud Function Example
12.7 OpenFaaS ☁
12.7.1 OpenFaaS Components and Architecture
12.7.1.1 API Gateway
12.7.1.2 Function Watchdog
12.7.1.3 OpenFaaS CLI
12.7.1.4 Monitoring
12.7.2 OpenFaaS in Action
12.7.2.1 Prerequisites
12.7.2.2 Single Node Cluster
12.7.2.3 Deploy OpenFaaS
12.7.2.4 To Run OpenFaaS
12.7.3 OpenFaaS Function with Python
12.8 OpenLambda ☁
12.8.1 Suggested Materials
12.8.2 Development
12.8.3 OpenLambda
12.8.4 Getting Started
12.8.4.1 Install Dependencies
12.8.4.2 Start a Test Cluster
12.8.5 Administration
12.8.5.1 Writing Handlers
12.8.5.2 Cluster Directory
12.8.6 Configuration
12.8.7 Architecture
13 MESSAGING
13.1 MQTT ☁
13.1.1 Introduction
13.1.2 Publish Subscribe Model
13.1.2.1 Topics
13.1.2.2 Callbacks
13.1.2.3 Quality of Service
13.1.3 Secure MQTT Services
13.1.3.1 Using TLS/SSL
13.1.3.2 Using OAuth
13.1.4 Integration with Other Services
13.1.5 MQTT in Production
13.1.6 Installation
13.1.6.1 MacOS install
13.1.6.2 MacOS Advanced Service install
13.1.6.3 Ubuntu install
13.1.6.4 Raspberry Pi Setup
13.1.6.4.1 Broker
13.1.6.4.2 Client
13.1.7 Server Use case
13.1.8 IoT Use Case with a Raspberry PI
13.1.8.1 Requirements and Setup
13.1.8.2 Results
13.1.9 Conclusion
13.1.10 Exercises
13.2 Python Apache Avro ☁
13.2.1 Download, Unzip and Install
13.2.2 Defining a schema
13.2.3 Serializing
13.2.4 Deserializing
13.2.5 Resources
14 GO
14.1 Introduction to Go for Cloud Computing ☁
14.1.1 Organization of the chapter
14.1.2 References
14.2 Installation ☁
14.3 Editors Supporting Go ☁
14.4 Go Language ☁
14.4.1 Concurrency in Go
14.4.1.1 Go Routines (execution)
14.4.1.2 Channels (communication)
14.4.1.3 Select (coordination)
14.5 Libraries ☁
14.6 Go CMD ☁
14.6.1 CMD
14.6.2 DocOpts
14.7 Go REST ☁
14.7.1 Gorilla
14.7.2 REST, RESTful
14.7.3 Router
14.7.4 Full code
14.8 OpenAPI ☁
14.8.1 Install from Homebrew
14.8.2 serve specification UI
14.8.3 validate a specification
14.8.4 Generate a Go OpenAPI server
14.8.5 generate a Go OpenAPI client
14.8.6 generate a spec from the source
14.8.7 generate a data model
14.8.8 other editors
14.9 Create an Echo service using Swagger and Go
14.9.1 Dependencies
14.9.2 Initialize a Golang project
14.9.3 Define APIs and generate code in Go
14.9.4 Implement the functionality
14.9.5 Run and test the server
14.9.6 References
14.10 Go Cloud ☁
14.10.1 Golang Openstack Client
14.10.2 OpenStack from Go
14.10.2.1 Gophercloud
14.10.2.1.1 Authentication
14.10.2.1.2 Virtual machines
14.10.2.1.3 Resources
14.11 Go Links ☁
14.11.1 Introductory Material
14.11.2 The GO Language
14.11.3 How popular is Go?
14.11.4 OpenAPI and Go
14.12 Exercises ☁
15 REFERENCES

1 PREFACE

Sat Nov 23 05:21:29 EST 2019 ☁

1.1 LEARNING OBJECTIVES ☁

Learning Objectives

- Learn how we distribute material as ePubs.
- Learn how to create an ePub with our material from source.
- Introduce elementary notations we use in the ePubs.
- See who contributed to the ePubs.

1.2 EPUB READERS ☁

This document is distributed in ePub format. Every OS has a suitable ePub reader to view the document. Such readers can also be integrated into a Web browser, so that when you click on an ePub it is automatically opened in your browser. As we use ePubs, the document can be scaled based on the user’s preference. If you ever see content that does not fit on a page, we recommend that you zoom out to make sure you can see the entire content.

We have had good experiences with the following readers:

- macOS: Books (https://www.apple.com/apple-books), which is a built-in ebook reader
- Windows 10: Microsoft Edge (https://www.microsoft.com/en-us/windows/microsoft-edge), but it must be the newest version, as older versions have bugs. Alternatively, use calibre (https://calibre-ebook.com/)
- Linux: calibre (https://calibre-ebook.com/)

If you have an iPad or tablet with enough memory, you may also be able to use it.

Sometimes you may want to adjust the zoom of your reader to increase or decrease it. Please adjust your zoom to a level that is comfortable for you. On macOS with a larger monitor, we found that zooming out multiple times results in very good rendering, allowing you to see the source code without horizontal scrolling.

1.3 CORRECTIONS ☁

The material collected in this document is managed in

https://github.com/cloudmesh-community/book/chapters

In case you see an error or would like to contribute your own section or chapter, you can do so in GitHub via pull requests.

The easiest way to fix an error is to read the ePub and click on the cloud symbol in the heading where you see the error. This will bring you to an editable document in GitHub. You can directly fix the error in the web browser and create a pull request there. Naturally, you need to be signed in to GitHub before you can edit and create a pull request.

As a result, contributors and authors will be integrated automatically the next time we compile the material. Thus, even if you corrected a single spelling error, you will be acknowledged.

1.4 CONTRIBUTORS ☁

Contributors are sorted by the first letter of their combined first and last name and, if that is not available, by their GitHub ID. Please note that the authors are identified through git logs in addition to some contributors added by hand. The git repository from which this document is derived contains more than the documents included in this document. Thus, not everyone in this list may have directly contributed to this document. However, if you find someone missing who has contributed (they may not have used this particular git), please let us know. We will add you. The contributors that we are aware of include:

Anand Sriramulu, Ankita Rajendra Alshi, Anthony Duer, Arnav, Averill Cate, Jr, Bertolt Sobolik, Bo Feng, Brad Pope, Brijesh, Dave DeMeulenaere, De’Angelo Rutledge, Eliyah Ben Zayin, Eric Bower, Fugang Wang, Geoffrey C. Fox, Gerald Manipon, Gregor von Laszewski, Hyungro Lee, Ian Sims, IzoldaIU, Javier Diaz, Jeevan Reddy Rachepalli, Jonathan Branam, Juliette Zerick, Keith Hickman, Keli Fine, Kenneth Jones, Mallik Challa, Mani Kagita, Miao Jiang, Mihir Shanishchara, Min Chen, Murali Cheruvu, Orly Esteban, Pulasthi Supun, Pulasthi Supun Wickramasinghe, Pulkit Maloo, Qianqian Tang, Ravinder Lambadi, Richa Rastogi, Ritesh Tandon, Saber Sheybani, Sachith Withana, Sandeep Kumar Khandelwal, Sheri Sanders, Shivani Katukota, Silvia Karim, Swarnima H. Sowani, Tharak Vangalapat, Tim Whitson, Tyler Balson, Vafa Andalibi, Vibhatha Abeykoon, Vineet Barshikar, Yu Luo, ahilgenkamp, aralshi, azebrowski, bfeng, brandonfischer99, btpope, garbeandy, harshadpitkar, himanshu3jul, hrbahramian, isims1, janumudvari, joshish-iu, juaco77, karankotz, keithhickman08, kkp, mallik3006, manjunathsivan, niranda perera, qianqian tang, rajni-cs, rirasto, sahancha, shilpasingh21, swsachith, toshreyanjain, trawat87, tvangalapat, varunjoshi01, vineetb-gh, xianghangmi, zhengyili4321

1.5 NOTATION ☁

The material here uses the following notation. This is especially helpful if you contribute content, so that we keep the content consistent.

If you would like to see the details on how to create them in the markdown documents, you will have to look at the file source by clicking on the cloud in the heading of the Notation section (Section 1.5). This will bring you to the markdown text, but you will still have to look at the raw content to see the details.

☁ or ![Github](images/github.png)

If you click on the ☁ or the GitHub icon in a heading, you can go directly to the document in GitHub that contains the content. This is convenient to fix errors or make additions to the content. The cloud will be added automatically upon inclusion of a new markdown file that includes a section header in its first line.

$ This is a bash text

Content in bash is marked with verbatim text and a dollar sign.

[1]

References are indicated with a number and are included in the reference chapter [1]. Use it in markdown with [@las14cloudmeshmultiple]. References must be added to the refernces.bib file in BibTeX format.

:o2: or ![No](images/no.png)

Chapters marked with this emoji are not yet complete or have some issue that we know about. These chapters need to be fixed. If you would like to help us fix this section, please let us know. Use it in markdown with :o2: or, if you would like to use the image, with ![No](images/no.png).

REST 36:02

Example for a video with the ![Video](images/video.png) emoji. Use it in markdown with [![Video](images/video.png) REST 36:02](https://youtu.be/xjFuA6q5N_U)

Slides 10

Example for slides with the ![Presentation](images/presentation.png) emoji. These slides may or may not include audio.

Slides 10

Slides without any audio. They may be faster to download. Use it in markdown with [![Presentation](images/presentation.png) Slides 10](TBD).

A set of learning objectives is marked with the ![Learning](images/learning.png) emoji.

A section is released when it is marked with this emoji in the syllabus. Use it in markdown with ![Ok](images/ok.png).

Indicates opportunities for contributions. Use it in markdown with ![Question](images/question.png).

Indicates sections that are worked on by contributors. Use it in markdown with ![Construction](images/construction.png).

Sections are marked by the contributor with this emoji ![Smiley](images/smile.png) when they are ready to be reviewed.

Sections that need modifications are indicated with this emoji ![Comment](images/comment.png).

A warning that we need to look at in more detail ![Warning](images/warning.png)

Notes are indicated with a bulb ![Idea](images/idea.png)

Other emojis

Other emojis can be found at https://gist.github.com/rxaviers/7360908. However, note that emojis may not be viewable in other formats or on all platforms. We know that some emojis do not show in calibre, but they do show in macOS iBooks and MS Edge.

This is the list of emojis that can be converted to PDF. So if you would like a PDF, please limit your emojis to

:cloud: ☁ :o2: :relaxed: ☺ :sunny: ☀ :baseball: ⚾ :spades: ♠ :hearts: ♥ :clubs: ♣ :diamonds: ♦ :hotsprings: ♨ :warning: ⚠ :parking: :a: :b: :recycle: ♻ :copyright: © :registered: ® :tm: ™ :bangbang: ‼ :interrobang: ⁉ :scissors: ✂ :phone: ☎

1.5.1 Figures

Figures have a caption and can be referred to in the ePub simply with a number. We show such a reference pointer while referring to Figure 1.

Figure 1: Figure example

Figures must be written in the md as

![Figure example](images/code.png){#fig:code-example width=1in}

Note that the text must be on one line and must not be broken up, even if it is longer than 80 characters. You can refer to them with @fig:code-example. Please note that, in order for numbering to work, figure references must include #fig: followed by a unique identifier. Identifiers must be truly unique; identifiers such as #fig:cloud or similar simple identifiers are a poor choice and will likely not work. To check, please list all lines with an identifier such as

$ grep -R "#fig:" chapters

and see if your identifier is truly unique.

1.5.2 Hyperlinks in the document

To create hyperlinks in the document other than images, we need to use proper markdown syntax in the source. This is achieved with a reference, for example in section headers. Let us discuss the reference header for the Notation section. We have augmented the section header as follows: # Notation {#sec:notation}

Now we can use the reference in the text as follows: In @sec:notation we explain ...

It will be rendered as: In Section 1.5 we explain …
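The `grep -R "#fig:" chapters` check from Section 1.5.1 lists candidate lines, but judging uniqueness by eye becomes tedious as chapters accumulate. The following is a minimal sketch, not part of the book's own tooling: it assumes pandoc-crossref style label definitions such as `{#fig:...}`, `{#sec:...}`, `{#eq:...}`, and `{#tbl:...}` in `.md` files below a `chapters` directory, and reports every label defined more than once. The names `find_labels` and `duplicate_labels` are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical helper: matches label definitions such as
# {#fig:code-example width=1in}, {#sec:notation}, {#eq:pythagoras},
# or {#tbl:sample-table}. References like @fig:code-example are not matched.
LABEL = re.compile(r"\{#((?:fig|sec|eq|tbl):[^\s}]+)")


def find_labels(text):
    """Return every cross-reference label defined in a markdown string."""
    return LABEL.findall(text)


def duplicate_labels(directory="chapters"):
    """Return {label: count} for labels defined more than once in *.md files."""
    root = Path(directory)
    if not root.is_dir():
        return {}
    counts = Counter()
    for path in root.rglob("*.md"):
        counts.update(find_labels(path.read_text(encoding="utf-8")))
    return {label: n for label, n in counts.items() if n > 1}


if __name__ == "__main__":
    for label, n in sorted(duplicate_labels().items()):
        print(f"{label} is defined {n} times")
```

Running the script from the repository root prints nothing when all labels are unique, so an empty output plays the same role as a clean `grep` inspection.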

1.5.3 Equations

Equations can be written as $$a^2+b^2=c^2$$ {#eq:pythagoras}

and used in text: As we see in @eq:pythagoras.

It will render as:

a² + b² = c²   (1)

As we see in Equation 1.

The equation number is optional. Inline equations just use one dollar sign and do not need an equation number: This is the Pythagoras theorem: $a^2+b^2=c^2$

Which renders as:

This is the Pythagoras theorem: a² + b² = c².

1.5.4 Tables

Tables can be placed in text as follows:

| x | y | z |
|---|---|---|
| 1 | 2 | 3 |
| 4 | 5 | 42 |

: Sample Data Table {#tbl:sample-table}

As usual, make sure the label is unique. When compiling, it will result in an error if labels are not unique. Additionally, there are several md table generators available on the internet that make creating tables more efficient.
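As an illustration of what such a table generator does, here is a tiny Python sketch that emits a pipe table followed by a `: caption {#tbl:label}` line, the caption form this section uses for the sample table. The function name `pipe_table` is hypothetical, and the plain `---` separator row (no alignment colons) is just one of several valid pipe-table forms in pandoc markdown.

```python
def pipe_table(header, rows, caption=None, label=None):
    """Build a pandoc pipe table; optionally add a ': caption {#tbl:label}' line."""
    lines = [
        "| " + " | ".join(str(cell) for cell in header) + " |",
        "|" + "|".join("---" for _ in header) + "|",  # separator row, no alignment
    ]
    for row in rows:
        if len(row) != len(header):
            raise ValueError("every row needs as many cells as the header")
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    if caption:
        # pandoc-crossref reads the {#tbl:...} label from the caption line
        suffix = f" {{#tbl:{label}}}" if label else ""
        lines.append("")
        lines.append(f": {caption}{suffix}")
    return "\n".join(lines)


print(pipe_table(["x", "y", "z"], [[1, 2, 3], [4, 5, 42]],
                 caption="Sample Data Table", label="sample-table"))
```

Generating tables this way keeps the one-label-per-table rule in a single place, and the uniqueness of the label still has to be checked across all chapters.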

1.6 UPDATES ☁

As all documents are managed in GitHub, the list of updates is documented in the commit history at

https://github.com/cloudmesh-community/book/commits/master

In case you take a class with us, we recommend that you download a new version of the ePub every week. This way you typically stay up to date. You can check the commit history and identify whether the version of the ePub is older than the committed version; if so, we recommend that you download a new version.

We typically will not make announcements to the class, as the GitHub commit history is sufficient and you are responsible for monitoring it as part of your class activities.

2 OVERVIEW ☁

Learning Objectives

- Gain an overview of what is currently in this book
- Review the high-level goals
- Be aware that this book is not complete and is being worked on as we speak
- Be aware that you should check the book on a weekly basis to stay up to date
- Be aware that additional material is distributed in separate books such as Linux, Python, and Writing in Markdown
- Be aware that books you may purchase may already be outdated by the time you order them

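In this book we provide a number of chapters that will allow you to easily gain knowledge in cloud computing on theoretical and practical levels.

Although the following was originally covered in this book, we decided to split out its contents to make the core cloud engineering book smaller. In case you take one of our classes using the book, we expect that you also pick up the material covered by these additional books. Please be aware that some of the class material is based on Python and Linux. You do not need prior knowledge of them, as you can pick it up while reading this book.

- Cloud Computing (https://laszewski.github.io/book/cloud/)
- Linux for Cloud Computing (https://laszewski.github.io/book/linux/)
- Python for Cloud Computing (https://laszewski.github.io/book/python/)
- Scientific Writing with Markdown (https://laszewski.github.io/book/writing/)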

https://github.com/cloudmesh-community/book/blob/master/chapters/class/516/overview.md
https://laszewski.github.io/book/cloud/
https://laszewski.github.io/book/linux/
https://laszewski.github.io/book/python/
https://laszewski.github.io/book/writing/

The book is organized as follows:

Definition of Cloud Computing

We will start with the definition of what cloud computing is and motivate why it is important to not only know technologies such as AI, ML, or databases. We present you with evidence that clouds are absolutely relevant to today's technologies. Furthermore, we see a trend to utilize AI and ML services in the cloud. Technologies such as virtual machines, containers, and Function as a Service are essential to the repertoire of a modern cloud or data engineer. There is more than ML… ☺

Data Center

This chapter will explain why we need cloud data centers, what a cloud data center looks like, and what environmental impact such data centers have.

Architecture

This chapter will introduce you to the basic architectural features and designs of cloud computing. We will discuss architectures for IaaS and contrast them with other architectures. We will discuss the NIST definition of the cloud and the Cloud Security Alliance Reference Architecture. We will also discuss the multi-cloud architecture introduced by Cloudmesh as well as the Big Data Reference Architecture.

REST

This chapter will introduce you to a way of defining services in the cloud that you can easily access via language-independent client APIs. It will introduce you to the fundamental concepts of REST. More importantly, we will introduce you to OpenAPI, which allows you to specify REST services via a specification document so you can create APIs and clients from the document automatically. We will show you how to do that with Flask.

Using a very popular service such as GitHub, we will show you how to easily interface with REST services in Python.

GraphQL

In this chapter we will introduce you to GraphQL, which allows you to access data through a query language. It allows clients to easily formulate queries that retrieve the desired data. Restrictions on the queries can be formulated so that only what is needed is downloaded. Other features include a type system. In addition to its REST service, GitHub has also added a GraphQL interface. You will have the opportunity to explore GraphQL while interfacing with GitHub.

Hypervisors

Virtualization is one of the important technologies that started the cloud revolution. It provides the basic underlying principles for the development and adoption of clouds. The concept, although old and already used in the early days of computing, has recently been exploited to achieve better utilization of servers in data centers, but also of local desktops.

In this chapter we introduce you to the basic concepts and distinguish the various forms of virtualization.

We list virtualization frameworks such as Libvirt, QEMU, KVM, Xen, and Hyper-V. Depending on your hardware, you will be encouraged to experiment with one or more of them.

IaaS

In the IaaS chapter we will review many of the services offered by providers such as AWS, Azure, Google, and OpenStack, which is used by some academic clouds such as Chameleon Cloud.

In addition, we will introduce you to elementary command line tools and programs to access this infrastructure.

In this section we will also provide you with information about multicloud management with Cloudmesh, which makes it extremely easy to switch between and use services from multiple clouds.

It is important to note that the appendix contains very useful information that augments this section. This includes a more detailed list of services for some IaaS providers, as well as information on how to use Chameleon Cloud, which has been adapted by us for this chapter.

Map/Reduce

In this chapter we discuss the background of MapReduce along with Hadoop and its core components. We will also introduce you to Spark in this section.

We will show you how to use these systems on a single resource so you can explore them more easily, but we will also let you know how to install them on a cluster in principle.

We conclude this section with some important Map/Reduce frameworks used as part of the larger Map/Reduce ecosystem, such as AWS Elastic MapReduce (AWS EMR). This also includes a discussion of Twister2, which is a version of Map/Reduce that could perform even faster than Spark.

In fact, we have two sections here that need to be delineated a bit better, which we hope we can do with your help.

Container

In the container chapter we will introduce you to the basic concepts of a container and delineate it from virtual machines as we have introduced them earlier. We will start the chapter with an introduction to Docker and then show you how to manage clusters capable of running many containers with the help of Docker Swarm and Kubernetes. To showcase their use for other PaaS offerings and applications, we even show you how to run Hadoop with Docker as well as how to conduct a PageRank analysis. Kubernetes will be discussed in its own section.

As many academic data centers run queuing systems, we will also showcase Singularity, which allows you to use containers within a batch queuing system.

You will help us improve this section if you elect to conduct a project on Comet.

We conclude the section by letting you know how to run TensorFlow via Singularity.

Serverless Computing

Recently, a new paradigm in cloud computing has been introduced. Instead of using virtual machines or containers, functions with limited resource requirements are specified that can then be executed on function-capable execution services hosted by cloud providers.

We will introduce you to this concept and showcase some examples of FaaS services and frameworks.

Messaging Services

Many devices in the cloud need to communicate with each other. In this chapter we look into how we can provide alternatives to REST services that provide messaging capabilities. We will focus on MQTT, which is often used to connect cloud edge devices with each other and with the cloud.

GO

Go is a programming language used by Google and has been most famously used to implement Kubernetes. In this chapter we introduce you to the elementary features of Go and also take a closer look at how we can define REST services, use OpenAPI, and interface with clouds.

Cloud AI Services

As part of the class we will be exploring AI services that are hosted in the cloud and offered as a service. If interested, you will be able to use them in your projects. As part of this class you will also be developing AI services, and those can be hosted in the cloud and reused by others. By using cross-platform specifications, clients for Java, Python, Scala, Go, and other programming languages will be automatically created for you. This will allow others to reuse your services.

3 DEFINITION OF CLOUD COMPUTING ☁

Learning Objectives

- Compare different definitions of cloud computing.
- Review the history of cloud computing.
- Identify trends.
- The current hot job is data engineer, which is sought after more than data scientist (a new trend). You have chosen the right course ☺
- Be TALLL to be successful in cloud computing.

Videos:

Definition of Cloud Computing 2019

3.1 DEFINING THE TERM CLOUD COMPUTING

In this presentation we review three definitions of cloud computing, namely the definitions by NIST, Wikipedia, and Gartner.

3.2 HISTORY AND TRENDS

We review some of the historical aspects that led to cloud computing and especially look into more recent trends. These trends motivate us to look at enhancements to the traditional service model, which includes Infrastructure-, Platform-, and Software-as-a-Service. These enhancements especially target Function- and Container-as-a-Service.

https://github.com/cloudmesh-community/book/blob/master/chapters/cloud/definition.md
https://youtu.be/KaQte-2elVo

3.3 JOB AS A CLOUD/DATA ENGINEER

We look at some job-related trends that especially focus on the newest hot job description, called Data Engineer. This is motivated by the observation that current job offerings for data engineers stand at 13% versus 1% for data scientists. As this class is targeted towards bringing the engineering component to data scientists, computer scientists, and application developers, it is ideally suited for increasing your marketability.

3.4 YOU MUST BE THAT TALLL

We close this class with Gregor's TALLL principle to succeed in Cloud Computing:

You must be that TALLL to survive in Cloud Computing and Big Data

This principle includes the following characteristics:

Trend Awareness (TA)

We need to be aware not only of what is currently a trend, but also of what will be future trends.

Longevity Planning (L)

We need to be able to reproduce our services and results (e.g., can we still reproduce them in six months?).

Leap Detection (L)

We need to be able to deal with technology leaps.

Learning Willingness (L)

We need to constantly learn to keep up, as technology changes every six months.

Naturally, this principle can be applied to other disciplines.

4 DATA CENTER

4.1 DATA CENTER ☁

Learning Objectives

- What is a data center?
- What are important metrics?
- What is the difference between a cloud data center and a traditional data center?
- What are examples of cloud data centers?

4.1.1 Motivation: Data

Before we go into more detail about data centers, we would like to motivate why we need them. Here we start by looking at the amount of data that has recently been created, which provides one of many motivational aspects. Not all data will or should be stored in data centers. However, a significant amount of data will be in such centers.

4.1.1.1 How much data?

One of the issues we have is to comprehend how much data is created. It is hard to imagine and put into perspective how much total data is created over a year, a month, a week, a day, or even just an hour. To visualize the amount of data produced more easily, we often find graphics easier to comprehend that show how much data is generated in a minute. Such depictions usually include examples of data generated as part of popular cloud services or the internet in general.

One such popular depiction is Data Never Sleeps (see Figure 2 and Figure 3). It has been produced a number of times over the years and is now at version 7.0, released in 2019. If you identify a newer version, please let us know.

https://github.com/cloudmesh-community/book/blob/master/chapters/cloud/datacenter.md

Observations for 2019: It is worthwhile to study this image in detail and identify some of the data you can relate to from services you use. It may also motivate you to study other services that are mentioned. For 2019 we observe that a staggering ~4.5 million Google searches are executed every minute, which is slightly lower than the number of videos watched on YouTube. Some 18 million text messages are sent every minute. Naturally, the numbers are averages over time.

Figure 2: Data Never Sleeps [2]

In contrast, in 2017 we observed that 3.8 million Google searches were executed every minute. Surprisingly, the Weather Channel received over 18 million forecast requests, which is even higher than the 12 million text messages sent every minute. YouTube was certainly serving a significant number of users, with 4.3 million videos watched every minute.

Figure 3: Data Never Sleeps [3]

A different source publishes what is happening on the internet in a minute, and we have been able to locate a version from 2018 (see Figure 4). While some data seems the same, other data is slightly different. For example, this graph has a lower count for Google searches, while the number of text messages sent is significantly higher in contrast to Figure 3.

Figure 4: Internet Minute 2018 [4]

While reviewing the image from the previous year by the same author, we find not only increases, but also declines. Looking at Facebook shows a loss of 73,000 logins per minute. This loss is substantial. This suggests that Facebook services are being replaced by other services that are more popular with the younger generation, who tend to pick up new services quickly (see Figure 5).

Figure 5: Internet Minute 2017-2018 [4]

It is also interesting to compare such trends over a longer period of time (see Figure 6 and Figure 7). An example is provided by the Google search statistics at

http://www.internetlivestats.com/google-search-statistics/

and visualized in Figure 6.

Figure 6: Google searches over time

Figure 7: Big data trend, 2012 [5]

When looking at the trends, many predict an exponential growth in data. This trend is continuing.

4.1.2 Cloud Data Centers

A data center is a facility that hosts the information technology related to servers and data, serving a large number of customers. Data centers evolved from the original need for large rooms, as the computers of the early days of the compute revolution filled entire rooms. Once multiple computers were added to such facilities, supercomputer centers were created for research purposes. With the introduction of the internet and services such as Web hosting, large business-oriented server rooms were created. The need for such facilities was further accelerated by the development of virtualization and by servers being rented to customers in shared facilities. As Web hosting is still important but has been taken over by cloud data centers, the terms internet data center and cloud data center are no longer used to distinguish them. Instead, today we just use the term data center. There may still be an important difference for research data centers in academia and industry that focus on providing computationally potent clusters for numerical computation. Such data centers are typically centered around the governance of a smaller number of users who are part of either an organization or a virtual organization. However, we see that even in the research community, data centers host not only supercomputers, but also Web server infrastructure and, these days, even private clouds that support the organizational users. In the case of the latter, we speak about supporting the long tail of science.

The latter is driven by the 80%-20% rule, i.e., 20% of the users use 80% of the compute power. This means that the top 20% of scientists are served by the leadership-class supercomputers in the nation, while the rest are served by other servers or by cloud offerings through research and public clouds.

4.1.3 Data Center Infrastructure

Due to the differing data and server needs in the cloud and in research, such data centers may look very different. Some focus on large-scale computational resources, some on commodity hardware offered to the community. Their size also differs greatly. While a supercomputing center at a university was one of the largest such data centers two decades ago, such centers are now dwarfed by the centers deployed by industry to serve the long tail of customers.

In general, a data center will have the following components:

Facility: The entire data center will be hosted in a building. The building may have specific requirements related to security, environmental concerns, or even integration into the local community, for example by providing heat to surrounding residences.

Support infrastructure: This building will include a significant amount of support infrastructure that addresses, for example, continuous power supply, air conditioning, and security. For this reason you find in such centers:

- Uninterruptible Power Sources (UPS)
- Environmental Control Units
- Physical Security Systems

Information Technology Equipment: Naturally, the facility will host the IT equipment, including the following:

- Servers
- Network Services
- Disks
- Data Backup Services

Operations staff: The facility will need to be staffed with the various groups that support such data centers. They include:

- IT Staff
- Security and Facility Staff
- Support Infrastructure Staff

Thanks to automation, the number of people needed to staff such a facility is quite low. Nevertheless, according to [6], proper data center staffing is key to reliable operation (see Figure 8).

According to Figure 8, operational sustainability consists of three elements, namely management and operations, building characteristics, and site location [6].

Figure 8: Data center Staff Impact [6]

Another interesting observation concerns the root causes of incidents in a data center. Everyone has probably experienced some outage, so it is important to identify where outages come from in order to prevent them. As we see in Figure 9, not every error is caused by an operational issue. External, installation, design, and manufacturer issues together are the largest source of data center incidents. According to the Uptime Institute Abnormal Incident Reports (AIRs) database, the root cause of 39% of data center incidents falls into the operational area [6].

Figure 9: Data center outage [6]

4.1.4 Data Center Characteristics

Next we identify a number of characteristics that distinguish different data centers.

Variation in Size: Data centers range in size from small edge facilities to megascale or hyperscale facilities filling large warehouses.

Variation in cost per server: Although many data centers standardize their components, specialized services may be offered not on a $1K server, but on a $50K server.

Variation in Infrastructure: Servers in centers serve a variety of needs and motivate different infrastructure. Use cases include Web servers, e-mail, machine learning, pleasantly parallel problems, and traditional supercomputing jobs.

Energy Cost: Data centers use a lot of energy. The energy cost varies per region. The motivation to reduce energy use and cost is also driven by environmental awareness, not only of the operators, but also of the communities in which such centers operate.

Reliability: Although operational efforts can make a data center more reliable, failures can still happen. Examples are

https://www.zdnet.com/article/microsoft-south-central-u-s-datacenter-outage-takes-down-a-number-of-cloud-services/
https://www.datacenterknowledge.com/archives/2011/08/07/lightning-in-dublin-knocks-amazon-microsoft-data-centers-offline

https://techcrunch.com/2012/10/29/hurricane-sandy-attacks-the-web-gawker-buzzfeed-and-huffington-post-are-down/

In summary, the advantages of data center IaaS include:

- Reduced operational cost
- Increased reliability
- Increased scalability
- Increased flexibility
- Increased support
- Rapid deployment
- Decreased management: outsourcing expertise that is not related to the core business

Data center disadvantages include:

- Loss of control of the hardware
- Loss of control of the data
- The model favors serving many users
- Software to control the infrastructure is not accessible
- Variations in performance due to sharing
- Integration requires effort beyond login
- Failures can have a humongous impact

4.1.5 Data Center Metrics

One of the most important factors in ensuring smooth operation and offering of services is to employ metrics that provide significant insight into the operations. Having metrics allows the staff to monitor and adapt to dynamic situations, but also to plan operations.


4.1.5.1 Data Center Energy Costs

One of the easiest metrics to monitor for a data center is the cost of energy used to operate all of the equipment. Energy is one of the largest costs a data center incurs during its operation, as all of the servers, networking, and cooling equipment require power 24/7. For electricity, billing is usually measured in terms of kilowatt-hours (kWh) and kilowatts (kW). Depending on circumstances, there may also be costs for public purpose programs, cost recovery, and stranded costs, but they are beyond the scope of this book.

To provide a quick understanding, it is best to understand the relation between kilowatt-hours and kilowatts. kWh is typically referred to as consumption, while kW is referred to as demand, and it is important to understand how these two concepts relate to each other. The easiest analogy is to think of kilowatts (demand) as the rate at which water flows through a pipe, while kilowatt-hours (consumption) are how much water has passed through the pipe. If a server requires 1.2 kW to operate, then after an hour has passed it will have consumed 1.2 kWh. However, if the server operates at 1.2 kW for 30 minutes and then goes idle and drops to 0.3 kW for another 30 minutes, then the total energy consumed will be:

$$\text{kWh} = 0.3 \cdot \frac{30}{60} + 1.2 \cdot \frac{30}{60} = 0.75 \qquad (2)$$
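Equation 2 sums power multiplied by duration over each interval of a load profile. A minimal Python sketch of that arithmetic, where representing the profile as a list of (kW, minutes) pairs and the name energy_kwh are illustrative assumptions:

```python
def energy_kwh(profile):
    """Sum energy over a load profile given as (power_kw, minutes) segments."""
    return sum(power_kw * minutes / 60 for power_kw, minutes in profile)


# The server from the text: 1.2 kW for 30 minutes, then idle at 0.3 kW for 30 minutes.
server_profile = [(1.2, 30), (0.3, 30)]
print(round(energy_kwh(server_profile), 2))  # 0.75
```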

Energy costs for a data center, then, are composed of two things: charges for energy and charges for demand. Energy is the total amount of energy consumed by the data center, and its charge is the total kWh multiplied by the cost per kWh. Demand is somewhat more complicated: it is the highest average power draw (in kW) measured over a 15-minute period during the billing cycle. Taking the previous example, if a data center has 1,000 servers, the total energy consumption would be 750 kWh in the hour, but the demand charge would be based on 1,200 kW (or 1.2 MW).
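How demand is derived from meter data can be sketched as follows. This minimal Python example assumes one average power reading per minute and fixed, non-overlapping 15-minute intervals; utilities differ in their exact rules (some use rolling windows), so both choices are assumptions, and the function names are illustrative.

```python
def peak_demand_kw(readings_kw, interval_minutes=15):
    """Highest average power over consecutive fixed intervals.

    readings_kw holds one average power reading (kW) per minute.
    """
    averages = []
    for start in range(0, len(readings_kw), interval_minutes):
        window = readings_kw[start:start + interval_minutes]
        averages.append(sum(window) / len(window))
    return max(averages)


def energy_from_readings_kwh(readings_kw):
    """Energy in kWh from one-minute average power readings."""
    return sum(readings_kw) / 60


# 1,000 servers: 30 minutes at 1.2 kW each, then 30 minutes at 0.3 kW each.
site_readings = [1200] * 30 + [300] * 30
print(energy_from_readings_kwh(site_readings))  # 750.0
print(peak_demand_kw(site_readings))            # 1200.0
```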

The costs, then, are how the utility company recoups its expenses: the charge per kWh recoups the generation cost, while the kW charge recoups the cost of transmission and distribution (T&D). Typically, the demand charge is much higher and will depend on utility constraints; if a utility is challenged on the T&D front, expect these charges to be $6-$10/kW or more. If the assumed cost per kWh is $0.12 and the cost per kW is $8, the cost to run our servers for a month would be:

$$\text{energy charge} = 0.75 \cdot 24 \cdot 30 \cdot 0.12 \cdot 1000 = \$64{,}800 \qquad (3)$$

$$\text{demand charge} = 1.2 \cdot 8 \cdot 1000 = \$9{,}600 \qquad (4)$$

This would total $74,400. It is important to note that reducing demand charges can have a tremendous payback: had the servers simply consumed a steady 750 kW over the course of the hour, our demand charge would have dropped to $6,000 (750 kW at $8/kW), while the energy costs remained the same. This is also why server virtualization can have a positive impact on energy costs: by having fewer servers running at a higher utilization, the demand will tend to level itself out, as, on average, each server will be more fully utilized. For example, it is better to pay for 500 servers at 100% utilization than 1,000 servers at 50% utilization, even though the amount of work done is the same: if the 1,000 servers momentarily all operate at 100% utilization for even a brief amount of time in a month, the demand charge for the data center will be much higher.
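Equations 3 and 4 can be reproduced with a few lines of Python. This is a sketch using the example prices from the text ($0.12/kWh and $8/kW) and a 30-day month; monthly_charges is a hypothetical helper name. It also shows that a flat 750 kW load changes only the demand charge, which becomes 750 × $8 = $6,000.

```python
def monthly_charges(energy_kwh_per_hour, peak_kw, price_per_kwh=0.12,
                    price_per_kw=8.0, hours=24 * 30):
    """Return (energy charge, demand charge) in dollars for one month."""
    energy_charge = energy_kwh_per_hour * hours * price_per_kwh
    demand_charge = peak_kw * price_per_kw
    return energy_charge, demand_charge


# Example from the text: 750 kWh per hour with a 1,200 kW peak.
energy, demand = monthly_charges(750, 1200)
print(round(energy), round(demand), round(energy + demand))  # 64800 9600 74400

# Same energy, but a flat 750 kW load: only the demand charge changes.
energy, demand = monthly_charges(750, 750)
print(round(energy), round(demand))  # 64800 6000
```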

4.1.5.2 Data Center Carbon Footprint

Scientists worldwide have identified a link between carbon emissions and global warming. As the energy consumption of a data center is substantial, it is prudent to estimate its overall carbon emissions. Schneider Electric (formerly APC) has provided a report on how to estimate the carbon footprint of a data center [7]. Although this report is already a bit older, it still provides valuable information. It defines key terms such as

Carbon dioxide emissions coefficient (carbon footprint):

With the increasing demand for data, bandwidth, and high-performance systems, there is a substantial amount of power consumption. This leads to a high amount of greenhouse gas emissions into the atmosphere, released due to any kind of basic activity like driving a vehicle or running a power plant.

“The measurement includes power generation plus transmission and distribution losses incurred during delivery of the electricity to its point of use.”

Data centers in total used 91 billion kilowatt-hours (kWh) of electrical energy in 2013, and they are projected to use 139 billion kWh by 2020. Currently, data centers consume up to 3 percent of all global electricity production while producing 200 million metric tons of carbon dioxide. As the world is moving towards the cloud, more and more data center capacity is needed, leading to more power consumption.

Peaker plant:

Peaking power plants, also known as peaker plants and occasionally just peakers, are power plants that generally run only when there is a high demand, known as peak demand, for electricity. Because they supply power only occasionally, the power supplied commands a much higher price per kilowatt-hour than base load power. Peak load power plants are dispatched in combination with base load power plants, which supply a dependable and consistent amount of electricity to meet the minimum demand. Such base load plants are frequently coal-fired, which causes a huge amount of CO2 emissions. A peaker plant may operate many hours a day, or it may operate only a few hours per year, depending on the condition of the region's electrical grid. Because of the cost of building an efficient power plant, if a peaker plant is only going to be run for a short or highly variable time, it does not make economic sense to make it as efficient as a base load power plant. In addition, the equipment and fuels used in base load plants are often unsuitable for use in peaker plants, because the fluctuating conditions would severely strain the equipment. For these reasons, nuclear, geothermal, waste-to-energy, coal, and biomass plants are rarely, if ever, operated as peaker plants.

Avoided emissions:

Emissions avoidance is the most effective carbon management strategy over a multi-decadal timescale to achieve atmospheric CO2 stabilization and a subsequent decline. It prevents stable underground carbon deposits from entering either the atmosphere or less stable carbon pools on land and in the oceans in the first place.

Carbon offsets based on energy efficiency rely on technical efficiencies to reduce energy consumption and therefore reduce CO2 emissions. Such improvements are often achieved by introducing more energy-efficient lighting, cooking, heating, and cooling systems. These are real emission reduction strategies and have created valid offset projects.

This type of carbon offset provides perhaps the simplest options that will ease the adoption of low-carbon practices. When these practices become generally accepted (or compulsory), they will no longer qualify as offsets, and further efficiencies will need to be promoted.

CO2 (carbon dioxide, or carbon):

Carbon dioxide is the main cause of the greenhouse effect; it is emitted in huge amounts into our atmosphere and has a life cycle of almost 100 years. Data centers cause emissions during the manufacturing of all the components that populate a data center (servers, UPS, building shell, cooling, etc.), during operation (in terms of electricity consumed), during maintenance (i.e., replacement of consumables like batteries, capacitors, etc.), and during the disposal of the components at the end of the life cycle. Until now, power plants have been allowed to dump unlimited amounts of carbon pollution into the atmosphere; no rules were in effect that limited their emissions of carbon dioxide, the primary driver of global warming. Now, for the first time, the EPA has finalized new rules, or standards, that will reduce carbon emissions from power plants. Known as the Clean Power Plan, these historic standards represent the most significant opportunity in years to help curb the growing consequences of climate change.

A data center has a total carbon profile that includes the many different aspects of a data center that contribute to carbon emissions. These include manufacturing, packaging, transportation, storage, operation of the data center, and decommissioning. Thus it is important to notice that we need to consider not only the operation, but also the construction and decommissioning phases.
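The phases listed above can be combined into a simple lifecycle estimate. The Python sketch below is only an illustration: the operational share is energy multiplied by a carbon dioxide emissions coefficient, and the coefficient of 0.5 kg CO2/kWh as well as the embodied and decommissioning tonnages in the usage example are made-up placeholders, not values from the Schneider Electric report. Replace them with figures for your own region and equipment.

```python
def lifecycle_co2_tonnes(operational_kwh, kg_co2_per_kwh,
                         embodied_tonnes=0.0, decommissioning_tonnes=0.0):
    """Total CO2 (metric tons) over construction, operation, and decommissioning.

    kg_co2_per_kwh is the carbon dioxide emissions coefficient of the local
    electricity supply; it must come from your provider or region.
    """
    operational_tonnes = operational_kwh * kg_co2_per_kwh / 1000
    return embodied_tonnes + operational_tonnes + decommissioning_tonnes


# Hypothetical inputs: a facility averaging 750 kW for one year, an assumed
# coefficient of 0.5 kg CO2/kWh, and made-up embodied/decommissioning figures.
annual_kwh = 750 * 24 * 365
print(round(lifecycle_co2_tonnes(annual_kwh, 0.5, embodied_tonnes=1000,
                                 decommissioning_tonnes=50)))  # 4335
```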

4.1.5.3 Data Center Operational Impact

One of the main operational impacts is the cost and emissions a data center causes by running and cooling its servers. Naturally, this depends on the type of fuel that is used to produce the energy, since the actual carbon impact of using electricity depends on the type of power plant that provides it. These energy costs and the distribution of where the energy comes from can often be looked up by geographical region on the internet or from the local energy provider. Municipal government organizations may also have such information. Tools such as the Indiana State Profile and Energy Use [8] may provide valuable information to derive such estimates. Locating a data center where energy is cheap is a key factor. To estimate both the costs in terms of price and the carbon emissions, Schneider provides a convenient carbon estimate calculator based on energy consumption.

https://www.schneider-electric.com/en/work/solutions/system/s1/data-center-and-network-systems/trade-off-tools/data-center-carbon-footprint-comparison-calculator/tool.html
http://it-resource.schneider-electric.com/digital-tools/calculator-data-center-carbon

If we calculate the total cost, we naturally need to add all costs arising from the build and teardown phases as well as operational upgrades.

4.1.5.4 Power Usage Effectiveness

One of the frequent measurements used in data centers is the Power Usage Effectiveness, or PUE for short. It measures how much energy is used for the computing equipment versus other energy costs such as air conditioning.

Formally we define it as follows:

PUE is the ratio of the total amount of energy used by a computer data center facility to the energy delivered to computing equipment.

PUE was published in 2016 as a global standard under ISO/IEC 30134-2:2016.

The inverse of PUE is the data center infrastructure efficiency (DCIE).

The best possible value of PUE is 1.0. Any real data center will have a higher value, as offices, cooling, and other non-IT loads will surely consume energy, as we see when we look at the formula:

$$\text{PUE} = \frac{\text{Total Facility Energy}}{\text{IT Equipment Energy}}$$

$$\text{PUE} = 1 + \frac{\text{Non IT Facility Energy}}{\text{IT Equipment Energy}}$$

According to the PUE calculator at

https://www.42u.com/measurement/pue-dcie.htm

the following ratings are given:

| PUE | DCIE | Level of Efficiency |
|-----|------|---------------------|
| 3.0 | 33%  | Very Inefficient    |
| 2.5 | 40%  | Inefficient         |
| 2.0 | 50%  | Average             |
| 1.5 | 67%  | Efficient           |
| 1.2 | 83%  | Very Efficient      |

PUE is a very popular metric, as it is relatively easy to calculate and makes it easy to compare data centers with each other.

This metric also comes with some drawbacks:

- It does not integrate climate-based differences; for example, the energy used to cool a data center in colder climates is less than in warmer climates. However, this may actually be a good side effect, as it will likely result in lower cooling needs and therefore lower energy costs.
- It also pushes toward large data centers with many shared servers, in contrast to small data centers where operational costs may become relevant.
- It does not take into consideration recycled energy that is, for example, used to heat other buildings outside of the data center.

Hence it is prudent not to look just at the PUE, but also at other metrics that lead to the overall cost and energy usage of the total ecosystem in which the data center is located.

https://www.eia.gov/state/?sid=IN
https://www.iso.org/standard/63451.html
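The two PUE formulas and the 42u.com ratings listed above translate directly into code. In this minimal Python sketch, treating each rating row as the upper bound of an efficiency band is our assumption (the table itself only lists sample points), and the facility energies in the example are hypothetical.

```python
# Upper bounds taken from the 42u.com ratings; treating each row as the upper
# bound of a band is an assumption made for this sketch.
RATINGS = [
    (1.2, "Very Efficient"),
    (1.5, "Efficient"),
    (2.0, "Average"),
    (2.5, "Inefficient"),
]


def pue(total_facility_energy, it_equipment_energy):
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    return total_facility_energy / it_equipment_energy


def dcie(pue_value):
    """Data center infrastructure efficiency, the inverse of PUE."""
    return 1 / pue_value


def rating(pue_value):
    """Return the label of the first band whose upper bound is not exceeded."""
    for upper_bound, label in RATINGS:
        if pue_value <= upper_bound:
            return label
    return "Very Inefficient"


# Hypothetical facility: 1,500 MWh total, of which 1,000 MWh reach IT equipment.
value = pue(1500, 1000)
print(value, f"{dcie(value):.0%}", rating(value))  # 1.5 67% Efficient
```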

Already in 2006, Google reported the efficiency of its six data centers as 1.21 and Microsoft as 1.22, which at that time were considered very efficient. However, over time these targets have shifted, and today's data centers achieve much lower values. The Green IT Cube in Darmstadt, Germany, even reported 1.082. According to Wikipedia, an unnamed Fortune 500 company achieved a PUE of 1.06 in 2017 with 30,000 SuperMicro blades.

Exercises

E.PUE.1: Lowest PUE you can find

What is the lowest PUE you can find? Provide details about the system as well as the date when the PUE was reported.

4.1.5.5 Hot-Cold Aisle

To understand hot-cold aisles, one must take a brief foray into the realm of physics and energy, specifically into how a temperature gradient tries to equalize. The most important formula to know is the heat transfer Equation 5.

$$q = h_c A (t_a - t_s) \qquad (5)$$

Here, q is the rate of heat transfer in watts; multiplied by the running time, it gives energy in watt-hours, which is, conveniently, how energy is billed. h_c is the convective heat transfer coefficient, A is the surface area, and (t_a - t_s) is the temperature difference between the air and the surface. Air moving at a moderate speed will transfer approximately 8.47 watts per square foot per degree of temperature difference. A 1U server is 19 inches wide and about 34 inches deep. Multiplying the two values gives us a cross section of 646 square inches, or about 4.48 square feet. Plugging these values into Equation 5 gives us:

$$q = 8.47 \cdot 4.48 \cdot (t_a - t_s) \qquad (6)$$

This begins to point us towards why hot-cold aisles are important. If we introduce cold air from the AC system into the same aisle that the servers are exhausting into, the air will mix and begin to average out. For example, if our servers are producing exhaust at 100 °F and our AC unit provides 65 °F air at the same rate, then the average air temperature will become 82.5 °F (assuming balanced air pressure). This has a deleterious effect on our server cooling, as warmer air takes heat away from warm surfaces more slowly than cooler air:

$$1{,}328.1 = 8.47 \cdot 4.48 \cdot (100 - 65)$$

$$664.0 = 8.47 \cdot 4.48 \cdot (100 - 82.5)$$

From the previous listing, we can see that a 35-degree delta allows the center to dissipate about 1,328 watts of waste heat from a 1U server, while a 17.5-degree delta allows us to dissipate only 664 watts. If a server is consuming more than 664 watts, it will continue to get warmer and warmer until it eventually reaches a temperature differential high enough to create an equilibrium (or reaches a thermal throttle and begins to reduce performance).
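The arithmetic in Equations 5 and 6 and in the mixing example can be checked with a short Python sketch. It assumes, as the text does, that two equal air streams mix to their simple average, and it reuses the text's values h_c = 8.47 and A = 4.48 square feet; the function names are illustrative.

```python
def heat_transfer_watts(h_c, area_sq_ft, delta_t):
    """Equation 5: q = h_c * A * (temperature difference)."""
    return h_c * area_sq_ft * delta_t


def mixed_temperature(t1, t2):
    """Average of two equal air streams (equal flow rate and pressure assumed)."""
    return (t1 + t2) / 2


H_C = 8.47   # value used in the text for moderately moving air
AREA = 4.48  # 1U server cross section in square feet, as in Equation 6

exhaust, supply = 100.0, 65.0
mixed = mixed_temperature(exhaust, supply)
print(mixed)  # 82.5
print(round(heat_transfer_watts(H_C, AREA, exhaust - supply), 1))  # 1328.1
print(round(heat_transfer_watts(H_C, AREA, exhaust - mixed), 1))   # 664.0
```

Separating the aisles keeps the delta at 35 degrees instead of 17.5, which is exactly the doubling of dissipated heat shown above.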

To combat this, engineers developed the idea of designating alternating aisles as either hot or cold. All servers in a given aisle are then oriented such that the AC system provides cool air into the cold aisle, where it is drawn in by the servers, which then exhaust it into the hot aisle, where the ventilation system removes it from the room. This has the benefit of maximizing the temperature delta between the provided air and the servers' processor(s), reducing the quantity of air that must be provided in order to cool the servers and improving overall system efficiency.

See Figure 10 to understand how the hot-cold aisle configuration is set up in a data center.

Figure 10: Hot Cold Aisle [9]

4.1.5.5.1Containment

While modern data centers employ highly sophisticated mechanisms to be asenergy efficient as possible. One such mechanism which can be seen as aimprovement on top of the Hot-Cold isle arrange is to use either hot islecontainmentorcold islecontainment.Usingacontainmentsystemcanremovetheissuewithfreeflowingair.

As the name implies, in cold aisle containment the data center is designed so that only cold air goes into the cold aisle, which makes sure that the systems only draw in cold air for cooling purposes. Conversely, in a hot aisle containment design, the hot aisle is contained so that the hot air collected in it is drawn out by the cooling system and the cold air does not flow into the hot aisles [10].

4.1.5.5.1.1 Water Cooled Doors

Another good way of reducing energy consumption is to install water-cooled doors directly at the rack, as shown in Figure 11. Cooling can even be actively controlled so that less energy is spent on cooling when servers are idle. There are many vendors that provide such cooling solutions.

Figure 11: Active Rear Door (link)

4.1.5.6 Workload Monitoring

4.1.5.6.1 Workload of HPC in the Cloud

Clouds, and especially university data centers, do not just provide virtual machines but also traditional supercomputer services. This includes the NSF-sponsored XSEDE project. As part of this project, the "XDMoD auditing tool provides, for the first time, a comprehensive tool to measure both utilization and performance of high-end cyberinfrastructure (CI), with initial focus on XSEDE. Several case studies have shown its utility for providing important metrics regarding resource utilization and performance of TeraGrid/XSEDE that can be used for detailed analysis and planning as well as improving operational efficiency and performance. Measuring the utilization of high-end cyberinfrastructure such as XSEDE helps provide a detailed understanding of how a given CI resource is being utilized and can lead to improved performance of the resource in terms of job throughput or any number of desired job characteristics.

Detailed historical analysis of XSEDE usage data using XDMoD clearly demonstrates the tremendous growth in the number of users, overall usage, and scale" [11].

Having access to a detailed metrics analysis allows users, center administrators, and project managers to better evaluate the use and utilization of such large facilities, justifying their existence (see Figure 12).
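To make the notion of utilization concrete, the following minimal Python sketch computes the fraction of available core-hours consumed by a set of jobs within a reporting window. It is not XDMoD code: the `Job` record, the 1,000-core machine, and the job data are hypothetical and serve only to illustrate the metric.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical accounting record for one batch job."""
    cores: int
    start_h: float  # start time, in hours since the beginning of the window
    end_h: float    # end time, in hours since the beginning of the window


def utilization(jobs: list[Job], total_cores: int, window_h: float) -> float:
    """Fraction of available core-hours used by jobs inside [0, window_h]."""
    used = 0.0
    for job in jobs:
        # Clip each job to the reporting window before accumulating core-hours.
        start = max(job.start_h, 0.0)
        end = min(job.end_h, window_h)
        if end > start:
            used += job.cores * (end - start)
    return used / (total_cores * window_h)


jobs = [Job(512, 0, 10), Job(256, 5, 30), Job(128, 20, 24)]
print(f"{utilization(jobs, total_cores=1000, window_h=24):.1%}")
```

The second job runs past the end of the 24-hour window, so only the part inside the window is counted; tools such as XDMoD aggregate far richer accounting data, but the basic ratio is the same.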

https://www.mainlinecomputer.com/t/product-lines/cabinets-and-racks/rear-door-heat-exchangers/chilled-doorr-high-density-rack-cooling-system/

Figure 12: XDMoD: XSEDE Metrics on Demand

Additional information is available at

https://open.xdmod.org/7.5/index.html

4.1.5.6.2 Scientific Impact Metric

Gregor von Laszewski and Fugang Wang are providing a scientific impact metric to XDMoD and XSEDE. It is a framework that (a) integrates publication and citation data retrieval, (b) allows scientific impact metrics generation at different aggregation levels, and (c) provides correlation analysis of impact metrics based on publication and citation data with resource allocation for a computing facility. This framework is used to conduct a scientific impact metrics evaluation of XSEDE, and to carry out extensive statistical analysis correlating XSEDE allocation size to the impact metrics aggregated by project and Field of Science. This analysis not only helps to provide an indication of XSEDE's scientific impact, but also provides insight regarding maximizing the return on investment in terms of allocation by taking into account Field of Science or project-based impact metrics. The findings from this analysis can be utilized by the XSEDE resource allocation committee to help assess and identify projects with higher scientific impact. Through the general applicability of the novel metrics we invented, it can also help provide metrics regarding the return on investment for XSEDE resources or campus-based HPC centers [12].
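The correlation analysis mentioned above can be illustrated with Pearson's correlation coefficient between allocation size and a citation-based impact metric. This is a sketch with made-up numbers, not the framework described in [12]; the projects, the values, and the choice of Pearson's coefficient are assumptions for illustration.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical projects: allocation (millions of service units) vs. citations.
allocation = [0.5, 1.0, 2.0, 4.0, 8.0]
citations = [12, 20, 41, 75, 160]
print(f"r = {pearson(allocation, citations):.3f}")
```

A value close to 1 would suggest that larger allocations go along with higher impact; in practice such an analysis is done per project or Field of Science, as the text describes, and correlation alone does not establish causation.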

4.1.5.6.3 Clouds and Virtual Machine Monitoring

Although no longer in operation in its original form, FutureGrid [13] pioneered the extensive monitoring and publication of its virtual machine and project usage. We are not aware of a current system that provides this level of detail as of yet. However, efforts as part of XSEDE within the XDMoD project are underway at this time, but are not yet integrated.

FutureGrid provided access to all virtual machine information, as well as usage across projects. An archived portal view is available at FutureGrid Cloud Metrics [13].

http://archive.futuregrid.org/metrics/html/results/2014-Q3/reports/rst/india-All.html

Figure 13: FutureGrid Cloud Metric

FutureGrid offered multiple clouds, including clouds based on OpenStack, Eucalyptus, and Nimbus. Nimbus and Eucalyptus are systems that are no longer used in the community. Of these, OpenStack is the only viable solution, in addition to the cloud offerings by Comet, which do not use OpenStack (see Figure 13).

FutureGrid could monitor all of them and published its results in its Metrics portal. Monitoring the VMs is an important activity, as it can identify VMs that may no longer be used (the user has forgotten to terminate them), and it can detect excessive usage by a user or project in early stages.

We would like to emphasize several examples where such monitoring is helpful:

- Assume a student participates in a class. Metrics and logs allow instructors to identify students who do not use the system as asked. For example, it is easy to identify whether they logged on and used VMs. Furthermore, the length of time a VM was running, based on the tasks to be completed, can be compared against that of other students.

- Let us assume a user with willful ignorance does not shut down VMs although they are not used, because research clouds are offered to us for free. In fact, this situation happened to us recently while using another cloud (Jetstream), where such monitoring capabilities were not available to us. The user single-handedly used up the entire allocation that was supposed to be shared with 30 other users in the same project. The accounts of all users were effectively deactivated, as the entire project they belonged to was deactivated. Due to allocation review processes, it took about 3 weeks to reactivate full access.

- In commercial clouds you will be charged money. Therefore, it is less likely that you forget to shut down your machine.

- In case you use GitHub carelessly and post your cloud passwords or any other passwords in it, you will find that within five minutes your cloud account will be compromised. There are individuals on the network who cleverly mine GitHub for such security lapses and will use your password if you have indeed stored it there. In fact, deleting a file on GitHub does not delete its history, so as a nonexpert, deleting the password from GitHub is not sufficient. You will have to delete and rewrite the history, and in this case you will definitely also need to reset the password. Monitoring public cloud usage in the data center is important not only in your region but also in other regions, as the password is valid there too and intruders could hijack your account and start services in regions that you have never used.
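A monitoring system can flag the forgotten VMs from the scenarios above with a simple rule. The sketch below assumes hypothetical usage records (VM id, owner, hours running, average CPU utilization) and an arbitrary threshold of less than 5% average CPU over more than 48 hours; real clouds expose such data through their own monitoring services, which are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class VmUsage:
    """Hypothetical per-VM usage record collected by a monitoring system."""
    vm_id: str
    owner: str
    hours_running: float
    avg_cpu_percent: float


def find_idle_vms(records, min_hours=48.0, max_cpu=5.0):
    """Return VMs running longer than min_hours with average CPU below max_cpu."""
    return [r for r in records
            if r.hours_running > min_hours and r.avg_cpu_percent < max_cpu]


records = [
    VmUsage("vm-001", "alice", 3.0, 60.0),
    VmUsage("vm-002", "bob", 400.0, 0.7),   # running for weeks, doing nothing
    VmUsage("vm-003", "carol", 72.0, 35.0),
]
for vm in find_idle_vms(records):
    print(f"{vm.vm_id} ({vm.owner}): {vm.hours_running:.0f} h at {vm.avg_cpu_percent}% CPU")
```

Only `vm-002` is reported. Such a rule would have caught the exhausted Jetstream allocation described above long before it ran out; the thresholds are tuning parameters, not recommendations.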

In addition to FutureGrid, we would like to point out Comet (see other sections). It is an exception with respect to VM monitoring, as it uses a regular batch queuing system to manage the jobs. Monitoring of the jobs is conducted through existing HPC tools.

4.1.5.6.4 Workload of Containers

Monitoring tools for containers, such as for Kubernetes, are listed at:

https://kubernetes.io/docs/tasks/debug-application-cluster/resource-usage-monitoring/

Such tools can be deployed alongside Kubernetes in the data center, but access to them will likely be restricted. They are intended for those who operate such services, for example in Kubernetes. We will discuss this in more detail in future sections.

4.1.6 Example Data Centers

In this section we give some data center examples while looking at some of the major cloud providers.

4.1.6.1 AWS

AWS focuses on security aspects of their data centers that include four layers [14]:

- Perimeter Layer
- Infrastructure Layer
- Data Layer
- Environmental Layer

The global infrastructure [15] as of January 2019 includes 60 Availability Zones within 20 geographic Regions. Plans exist to add 12 Availability Zones and four additional Regions in Bahrain, Hong Kong SAR, Sweden, and a second AWS GovCloud Region in the US (see Figure 14).

https://aws.amazon.com/compliance/data-center/perimeter-layer/
https://aws.amazon.com/compliance/data-center/infrastructure-layer/
https://aws.amazon.com/compliance/data-center/data-layer/
https://aws.amazon.com/compliance/data-center/environmental-layer/
https://aws.amazon.com/about-aws/global-infrastructure/

Figure 14: AWS regions [15]

Amazon strives to achieve high availability through multiple availability zones, improved continuity with replication between regions, meeting compliance and data residency requirements, as well as providing geographic expansion. See Figure 14.

The regions and number of availability zones are as follows:

- US East: N. Virginia (6), Ohio (3)
- US West: N. California (3), Oregon (3)
- Asia Pacific: Mumbai (2), Seoul (2), Singapore (3), Sydney (3), Tokyo (4), Osaka-Local (1)
- Canada: Central (2)
- China: Beijing (2), Ningxia (3)
- Europe: Frankfurt (3), Ireland (3), London (3), Paris (3)
- South America: São Paulo (3)
- GovCloud: AWS GovCloud (US-West) (3)
- New Regions (coming soon): Bahrain, Hong Kong SAR, China, Sweden, AWS GovCloud (US-East)

4.1.6.2 Azure

Azure claims to have more global regions [16] than any other cloud provider.

https://azure.microsoft.com/en-us/global-infrastructure/regions/

They motivate this with their goal to bring applications to users around the world. The goal is similar to that of other commercial hyperscale providers: preserving data residency and offering comprehensive compliance and resilience. As of Aug 29, 2018, Azure supports 54 regions worldwide. These regions can currently be accessed by users in 140 countries (see Figure 15). Not every service is offered in every region, as the service-to-region matrix shows:

https://azure.microsoft.com/en-us/global-infrastructure/services/

Figure 15: Azure regions [16]

4.1.6.3 Google

From Google [17] we find that, as of Aug. 29th, Google has the following data center locations (see Figure 16):

- North America: Berkeley County, South Carolina; Council Bluffs, Iowa; Douglas County, Georgia; Jackson County, Alabama; Lenoir, North Carolina; Mayes County, Oklahoma; Montgomery County, Tennessee; The Dalles, Oregon
- South America: Quilicura, Chile
- Asia: Changhua County, Taiwan; Singapore
- Europe: Dublin, Ireland; Eemshaven, Netherlands; Hamina, Finland; St Ghislain, Belgium

https://www.google.com/about/datacenters/inside/locations/index.html

Figure 16: Google data centers [17]

Each data center is advertised with a special environmental feature, such as a unique cooling system or wildlife on the premises. Google's data centers support its service infrastructure and allow hosting as well as other cloud services to be offered to its customers.

Google highlights its efficiency strategy and methods here:

https://www.google.com/about/datacenters/efficiency/

They summarize that their efforts are based on:

- Measuring the PUE
- Managing airflow
- Adjusting the temperature
- Using free cooling
- Optimizing the power distribution


Figure 17: PUE data for all large-scale Google data centers

The PUE [18] data for all large-scale Google data centers is shown in Figure 17.

An important lesson from Google is the PUE boundary, that is, the difference in measured efficiency depending on how close to the IT infrastructure the measurement boundary is drawn within the data center building. This indicates that it is important to look at each provider's definition of PUE, so as not to report numbers that are not comparable between vendors or that are not all-encompassing.

Figure 18: Google data center PUE measurement boundaries [18]

Figure 18 shows the Google data center PUE measurement boundaries. The average PUE [18] for all Google data centers is 1.12, although Google states that it could boast a PUE as low as 1.06 when using narrower boundaries.
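The effect of the measurement boundary can be illustrated with the basic definition of PUE as total facility energy divided by IT equipment energy. In the following sketch all energy figures are invented for illustration and are not Google's data: the same facility reports a lower PUE when overheads such as power distribution losses or office space are left outside the boundary.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy divided by IT energy."""
    return total_facility_kwh / it_equipment_kwh


# Hypothetical monthly energy figures (kWh) for one facility.
it_load = 1_000_000
cooling = 100_000
power_distribution_losses = 40_000  # transformers, cables, UPS
site_overhead = 20_000              # lighting, office space

# Wide boundary: every overhead is charged against the facility.
wide = pue(it_load + cooling + power_distribution_losses + site_overhead, it_load)
# Narrow boundary: only cooling is attributed to the data hall.
narrow = pue(it_load + cooling, it_load)

print(f"wide boundary PUE:   {wide:.2f}")
print(f"narrow boundary PUE: {narrow:.2f}")
```

Both numbers describe the same building, which is why PUE values from different providers are only comparable when their boundaries are known.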

https://www.google.com/about/datacenters/efficiency/internal/

As a consequence, Google defines its PUE in detail in Equation 7.

$$\mathrm{PUE} = \frac{E_{SIS} + E_{ITS} + E_{TX} + E_{HV} + E_{LV} + E_{F}}{E_{ITS} - E_{CRAC} - E_{UPS} - E_{LV} + E_{Net1}} \qquad (7)$$

where the abbreviations stand for:

- E_SIS = Energy consumption for supporting infrastructure power substations feeding the cooling plant, lighting, office space, and some network equipment
- E_ITS = Energy consumption for IT power substations feeding servers, network, storage, and computer room air conditioners (CRACs)
- E_TX = Medium and high voltage transformer losses
- E_HV = High voltage cable losses
- E_LV = Low voltage cable losses
- E_F = Energy consumption from on-site fuels, including natural gas and fuel oils
- E_CRAC = CRAC energy consumption
- E_UPS = Energy loss at uninterruptible power supplies (UPSes) which feed servers, network, and storage equipment
- E_Net1 = Network room energy fed from type 1 unit substitution

For more details see [18].

4.1.6.4 IBM

IBM maintains almost 60 data centers, which are placed globally in 6 regions and 18 availability zones. IBM targets businesses while offering local access to its centers to allow for low latency. IBM states that through this localization users can decide where and how data and workloads are hosted, and can address availability, fault tolerance, and scalability. As IBM is business oriented, it also stresses its certified security.

More information can be obtained from:

https://www.ibm.com/cloud/data-centers/

A special service offering is provided by Watson.

https://www.ibm.com/watson/

Watson focuses on AI-based services. It includes PaaS services for deep learning, but also services that are offered to the healthcare and other communities as SaaS.

4.1.6.5 XSEDE

XSEDE is an NSF-sponsored large distributed set of clusters, supercomputers, data services, and clouds, building a “single virtual system that scientists can use to interactively share computing resources, data and expertise”. The Web page of XSEDE is located at

https://www.xsede.org/

Primary compute resources are listed in the resource monitor at

https://portal.xsede.org/resource-monitor

For cloud computing, the following systems are of special importance, although selected others may also host container-based systems using Singularity (see Figure 19):

- Comet virtual clusters
- Jetstream OpenStack


Figure 19: XSEDE distributed resource infrastructure

4.1.6.5.1 Comet

The Comet machine is a large cluster and offers bare metal provisioning based on KVM and SLURM. Thus it is a unique system that can run, at the same time, traditional supercomputing jobs such as MPI-based programs as well as jobs that utilize virtual machines. With more than 46,000 cores available, it provides one of the largest NSF-sponsored cloud environments. Through its ability to provide bare metal provisioning and access to InfiniBand between all virtual machines, it is an ideal machine for exploring performance-oriented virtualization techniques.

Comet has about 3 times more cores than Jetstream.

4.1.6.5.2 Jetstream

Jetstream is a machine that specializes in offering a user-friendly cloud environment. It utilizes an environment called Atmosphere that targets inexperienced scientific cloud users. It also offers an OpenStack environment that is used by Atmosphere and that is, for classes such as ours, the preferred access method. More information about the system can be found at

https://dcops.iu.edu/

4.1.6.6 Chameleon Cloud

Chameleon Cloud is a configurable experimental environment for large-scale cloud research. It offers OpenStack as a service, including some more advanced services that allow experimentation with the infrastructure.

https://www.chameleoncloud.org/

An overview of the hardware can be obtained from

https://www.chameleoncloud.org/hardware/

4.1.6.7 Indiana University

Indiana University has a data center in which many different systems are housed. This includes not only Jetstream, but also many other systems. The systems include production, business, and research clusters and servers. See Figure 20.

Figure 20: IU Data Center

On the research cluster side, it offers Karst and Carbonate:

https://kb.iu.edu/d/bezu (Karst)
https://kb.iu.edu/d/aolp (Carbonate)

One of the special systems located in the data center and managed by the Digital Science Center is called Futuresystems, which provides a great resource for advanced students of Indiana University focusing on data engineering. While systems such as Jetstream and Chameleon Cloud specialize in production-ready cloud environments, Futuresystems allows researchers to experiment with state-of-the-art distributed systems environments supporting research. It is available with Comet and thus could also serve as an on-ramp to using larger-scale resources on Comet while experimenting with the setup on Futuresystems.

Such an offering is logical, as researchers in the data engineering track want to further develop systems such as Hadoop, Spark, or container-based distributed environments, rather than only use the tools that are released for production, as those do not allow improvements to the infrastructure. Futuresystems is managed and offered by the Digital Science Center.

Hence IU offers the following important services:

- Karst for traditional supercomputing
- Jetstream for production use with a focus on virtual machines
- Futuresystems for research experiment environments with access to bare metal

4.1.6.8 Shipping Containers

A few years ago, data centers built from shipping containers were very popular. This included several major cloud providers, among them Microsoft and Google. Such providers have since found that containers are not the best way to develop centers at scale. The current trend, however, is to build mega or hyperscale data centers.

https://www.datacenterknowledge.com/archives/2016/04/20/microsoft-moves-away-from-data-center-containers
https://blogs.technet.microsoft.com/msdatacenters/2013/04/22/microsofts-itpac-a-perfect-fit-for-off-the-grid-computing-capacity/

4.1.7 Server Consolidation

One of the driving factors in cloud computing and the rise of large-scale data centers is the ability to use server virtualization to place more than one server on the same hardware. Formerly, services were hosted on their own servers. Today they are managed on the same hardware, although they look to the customer like separate servers.

As a result, we find the following advantages:

- reduction of administrative and operations cost: As we reduce the number of physical servers and use each of them to host multiple virtual servers, management, space, power, and maintenance costs are reduced.

- better resource utilization: Through load balancing strategies, servers can be better utilized, for example by increasing their load so that resources do not sit idle.

- increased reliability: As virtualized servers can be snapshotted and mirrored, these features can be utilized in strategies to increase reliability in case of failure.

- standardization: As the servers are deployed at large scale, the infrastructure is implicitly standardized based on server, network, and disk, making maintenance and replacements easier. This also includes the software that is running on such servers (OS, platform, and possibly even applications).
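Placing several virtual servers on fewer physical hosts is a bin-packing problem. One common heuristic is first-fit decreasing: sort the VMs by demand and place each on the first host that still has enough spare capacity, powering on a new host only when none fits. The sketch below is an illustration of that heuristic, not what any particular provider uses; the VM sizes and the 16-core host capacity are hypothetical, and only CPU cores are considered (real placement also weighs memory, I/O, and affinity).

```python
def consolidate(vm_cores: dict[str, int], host_capacity: int) -> list[list[str]]:
    """Assign VMs to hosts with the first-fit decreasing heuristic."""
    hosts: list[list[str]] = []  # names of the VMs placed on each host
    free: list[int] = []         # remaining cores on each host
    # Largest VMs first; ties keep their original order (sorted() is stable).
    for name, cores in sorted(vm_cores.items(), key=lambda kv: kv[1], reverse=True):
        if cores > host_capacity:
            raise ValueError(f"{name} needs more cores than a single host offers")
        for i, spare in enumerate(free):
            if cores <= spare:
                hosts[i].append(name)
                free[i] -= cores
                break
        else:
            # No running host has room: power on a new one.
            hosts.append([name])
            free.append(host_capacity - cores)
    return hosts


vms = {"web": 4, "db": 8, "cache": 2, "batch": 12, "mail": 2, "ci": 4}
placement = consolidate(vms, host_capacity=16)
print(f"{len(vms)} servers consolidated onto {len(placement)} hosts: {placement}")
```

Here six formerly separate servers fit on two 16-core hosts, which is the source of the cost, space, and power savings listed above.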

4.1.8 Data Center Improvements and Consolidation

Due to the immense number of servers in data centers, as well as the increased workload on these servers, the energy consumption of data centers is large, not only to run the servers but also to provide the necessary cooling. Thus it is important to revisit the impact such data centers have on energy consumption. One of the studies that looked into this is from 2016 and was published by LBNL [19]. In this study, data center electricity consumption back to 2000 is analyzed using previous studies and historical shipment data. Forecasts under different assumptions are contrasted through 2020.

Figure 21 depicts “an estimate of total U.S. data center electricity use (servers, storage, network equipment, and infrastructure) from 2000-2020”.

In “2014 the data centers in the U.S. consumed an estimated 70 billion kWh”, or “about 1.8% of total U.S. electricity consumption”. More recent studies find an increase of only about 4% from 2010-2014, a large deviation from the 24% increase that was originally predicted several years ago. The study finds that the energy use would be approximately 73 billion kWh in 2020.

https://cloudfront.escholarship.org/dist/prd/content/qt84p772fc/qt84p772fc.pdf
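The figures quoted from the study can be related with a little arithmetic. The sketch below derives the total U.S. electricity use implied by the 1.8% share and the compound annual growth implied by going from 70 to 73 billion kWh over the six years from 2014 to 2020. Both results are back-of-the-envelope derivations from the quoted numbers, not values reported in [19].

```python
consumption_2014_kwh = 70e9  # estimated U.S. data center use in 2014 [19]
share_of_us_total = 0.018    # "about 1.8% of total U.S. electricity consumption"
forecast_2020_kwh = 73e9     # projected data center use in 2020 [19]

# Total U.S. consumption implied by the 1.8% share.
implied_us_total_kwh = consumption_2014_kwh / share_of_us_total
# Compound annual growth rate over the six years 2014-2020.
annual_growth = (forecast_2020_kwh / consumption_2014_kwh) ** (1 / 6) - 1

print(f"implied total U.S. consumption: {implied_us_total_kwh / 1e9:,.0f} billion kWh")
print(f"implied annual growth 2014-2020: {annual_growth:.2%}")
```

An implied growth of well under 1% per year shows how flat the forecast is, despite the rapid growth in demand for data center services.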

Figure 21: Energy Forecast [19]

It is clear that the original prediction of large energy consumption motivated a trend in industry to provide more energy efficient data centers. However, if such energy efficiency efforts were not conducted or encouraged, we would see a completely different scenario.

The study identifies the following scenarios that will significantly impact the prediction:

- improved management: increases energy efficiency through operational or technological changes with minimal investment. Strategies include improving the least efficient components.

- best practices: increases the energy-efficiency gains that can be obtained through the widespread adoption of the most efficient technologies and best management practices applicable to each data center type. This scenario focuses on maximizing the efficiency of each type of data center facility.

- hyperscale data centers: the infrastructure will be moved from smaller data centers to larger hyperscale data centers.

4.1.9 Project Natick

To reduce energy consumption in data centers and reduce the cost of cooling, Microsoft has developed Project Natick. To tackle this problem, Microsoft has built an underwater data center. Another benefit of this project is that such data centers can be deployed in large bodies of water to serve customers residing in that area, which helps to reduce latency by reducing the distance to users and therefore increases data transfer speed.

The project was executed in two phases.

Phase 1 was executed between August and November 2015. In this phase, Microsoft was able to successfully deploy and operate a vessel underwater. The vessel was able to tackle cooling issues as well as the effect of biofouling. Biofouling refers to the fouling of pipes and underwater surfaces by organisms.
