CLOUD COMPUTING
Gregor von Laszewski
(c) Gregor von Laszewski, 2018, 2019
1 PREFACE
1.1 Learning Objectives ☁
1.2 ePub Readers ☁
1.3 Corrections ☁
1.4 Contributors ☁
1.5 Notation ☁
1.5.1 Figures
1.5.2 Hyperlinks in the document
1.5.3 Equations
1.5.4 Tables
1.6 Updates ☁
2 OVERVIEW ☁
3 DEFINITION OF CLOUD COMPUTING ☁
3.1 Defining the term Cloud Computing
3.2 History and Trends
3.3 Job as a Cloud/Data Engineer
3.4 You must be that TALLL
4 DATA CENTER
4.1 Data Center ☁
4.1.1 Motivation: Data
4.1.1.1 How much data?
4.1.2 Cloud Data Centers
4.1.3 Data Center Infrastructure
4.1.4 Data Center Characteristics
4.1.5 Data Center Metrics
4.1.5.1 Data Center Energy Costs
4.1.5.2 Data Center Carbon Footprint
4.1.5.3 Data Center Operational Impact
4.1.5.4 Power Usage Effectiveness
4.1.5.5 Hot-Cold Aisle
4.1.5.5.1 Containment
4.1.5.5.1.1 Water Cooled Doors
4.1.5.6 Workload Monitoring
4.1.5.6.1 Workload of HPC in the Cloud
4.1.5.6.2 Scientific Impact Metric
4.1.5.6.3 Clouds and Virtual Machine Monitoring
4.1.5.6.4 Workload of Containers
4.1.6 Example Data Centers
4.1.6.1 AWS
4.1.6.2 Azure
4.1.6.3 Google
4.1.6.4 IBM
4.1.6.5 XSEDE
4.1.6.5.1 Comet
4.1.6.5.2 Jetstream
4.1.6.6 Chameleon Cloud
4.1.6.7 Indiana University
4.1.6.8 Shipping Containers
4.1.7 Server Consolidation
4.1.8 Data Center Improvements and Consolidation
4.1.9 Project Natick
4.1.10 Renewable Energy for Data Centers
4.1.11 Societal Shift Towards Renewables
4.1.12 Datacenter Risks and Issues
4.1.13 Exercises
5 ARCHITECTURE
5.1 Architectures ☁
5.1.1 Evolution of Compute Architectures
5.1.1.1 Mainframe Computing
5.1.1.2 PC Computing
5.1.1.3 Intranet and Server Computing
5.1.1.4 Grid Computing
5.1.1.5 Internet Computing
5.1.1.6 Cloud Computing
5.1.1.7 Mobile Computing
5.1.1.8 Internet of Things Computing
5.1.1.9 Edge Computing
5.1.1.10 Fog Computing
5.1.2 As a Service Architecture Model
5.1.3 Product or Functional Based Model
5.1.4 NIST Cloud Architecture
5.1.5 Cloud Security Alliance Reference Architecture
5.1.6 Multicloud Architectures
5.1.6.1 Cloudmesh Architecture
5.1.7 Resources
5.2 NIST Big Data Reference Architecture ☁
5.2.1 Pathway to the NIST-BDRA
5.2.2 Big Data Characteristics and Definitions
5.2.3 Big Data and the Cloud
5.2.4 Big Data, Edge Computing and the Cloud
5.2.5 Reference Architecture
5.2.6 Framework Providers
5.2.7 Application Providers
5.2.8 Fabric
5.2.9 Interface definitions
5.3 The Y-Scheduling Architecture View ☁
6 REST
6.1 Introduction to REST ☁
6.1.0.1 Collection of Resources
6.1.0.2 Single Resource
6.1.0.3 REST Tool Classification
6.2 OPENAPI 3.0
6.2.1 REST Specifications ☁
6.2.1.1 OPENAPI
6.2.1.1.1 OpenAPI 3.0 Specification (OAS 3.0)
6.2.1.1.1.1 Definitions
6.2.1.2 RAML
6.2.1.3 API Blueprint
6.2.1.4 JsonAPI
6.2.1.5 Tinyspec
6.2.1.6 Tools
6.2.1.6.1 Connexion
6.2.2 OpenAPI 3.0 REST Service via Introspection ☁
6.2.2.1 Verification
6.2.2.2 Swagger-UI
6.2.2.3 Mock service
6.2.2.4 Exercise
6.2.3 REST AI services Example ☁
6.2.3.1 Service Endpoints/Paths
6.2.3.1.1 Path kmeans/upload
6.2.3.1.2 Path kmeans/fit
6.2.3.1.3 Path kmeans/predict
6.2.3.2 Files
6.2.3.3 Running the example
6.2.3.4 Notes
6.3 Flask RESTful Services ☁
6.4 Django REST Framework ☁
6.5 Github REST Services ☁
6.5.1 Issues
6.5.2 Exercise
6.6 OpenAPI REST Services with Swagger ☁
6.6.1 Swagger Tools
6.6.2 Swagger Community Tools
6.6.2.1 Converting Json Examples to OpenAPI YAML Models
6.7 REST WITH EVE
6.7.1 Rest Services with Eve ☁
6.7.1.1 Ubuntu install of MongoDB
6.7.1.2 macOS install of MongoDB
6.7.1.3 Windows 10 Installation of MongoDB
6.7.1.4 Database Location
6.7.1.5 Verification
6.7.1.6 Building a simple REST Service
6.7.1.7 Interacting with the REST service
6.7.1.8 Creating REST API Endpoints
6.7.1.9 REST API Output Formats and Request Processing
6.7.1.10 REST API Using a Client Application
6.7.1.11 Towards cmd5 extensions to manage eve and mongo
6.7.2 HATEOAS ☁
6.7.2.1 Filtering
6.7.2.2 Pretty Printing
6.7.2.3 XML
6.7.3 Extensions to Eve ☁
6.7.3.1 Object Management with Eve and Evegenie
6.7.3.1.1 Installation
6.7.3.1.2 Starting the service
6.7.3.1.3 Creating your own objects
6.8 OPENAPI 2.0
6.8.1 OpenAPI 2.0 Specification ☁
6.8.1.1 The Virtual Cluster example API Definition
6.8.1.1.1 Terminology
6.8.1.1.2 Specification
6.8.1.2 References
6.8.2 OpenAPI REST Service via Introspection ☁
6.8.2.1 Verification
6.8.2.2 Mock service
6.8.2.3 Exercise
6.8.3 OpenAPI REST Service via Codegen ☁
6.8.3.1 Step 1: Define Your REST Service
6.8.3.2 Step 2: Server Side Stub Code Generation and Implementation
6.8.3.2.1 Setup the Codegen Environment
6.8.3.2.2 Generate Server Stub Code
6.8.3.2.3 Fill in the actual implementation
6.8.3.3 Step 3: Install and Run the REST Service:
6.8.3.3.1 Start a virtualenv:
6.8.3.3.2 Make sure you have the latest pip:
6.8.3.3.3 Install the requirements of the server side code:
6.8.3.3.4 Install the server side code package:
6.8.3.3.5 Run the service
6.8.3.3.6 Verify the service using a web browser:
6.8.3.4 Step 4: Generate Client Side Code and Verify
6.8.3.4.1 Client side code generation:
6.8.3.4.2 Install the client side code package:
6.8.3.4.3 Using the client API to interact with the REST service
6.8.3.5 Towards a Distributed Client Server
6.9 Exercises ☁
7 GRAPHQL ☁
7.1 Prerequisites
7.1.1 Install Graphene
7.1.2 Install Django
7.1.3 Install GraphiQL
7.2 GraphQL type system and schema
7.2.1 Type System
7.2.2 Scalar Types
7.2.3 Enumeration Types
7.2.4 Interfaces
7.2.5 Union Types
7.3 GraphQL Query
7.3.1 Fields
7.3.2 Arguments
7.3.3 Fragments
7.3.4 Variables
7.3.5 Directives
7.3.6 Mutations
7.3.7 Query Validation
7.4 GraphQL in Python
7.5 Developing your own GraphQL Server
7.5.1 GraphQL server implementation
7.5.2 GraphQL Server Querying
7.5.3 Mutation example
7.5.4 GraphQL Authentication
7.5.5 JSON Web Token Authentication
7.5.5.1 Using Authentication with Curl
7.5.5.2 Expiration of JWT tokens
7.5.6 GitHub API v4
7.6 Dynamic Queries with GraphQL
7.7 Advantages of Using GraphQL
7.8 Disadvantages of Using GraphQL
7.9 Conclusion
7.9.1 Resources
7.10 Exercises
8 HYPERVISOR
8.1 Virtualization ☁
8.1.1 Virtual Machines
8.1.2 System Virtual Machines
8.1.3 Hosted Virtualization
8.1.4 Summary
8.1.5 Virtualization Approaches
8.1.5.1 Full virtualization
8.1.5.2 Paravirtualization
8.1.6 Virtualization Technologies
8.1.6.1 Selected Hardware Virtualization Technologies
8.1.6.2 AMD-V and Intel-VT
8.1.6.3 I/O MMU virtualization (AMD-Vi and Intel VT-d)
8.1.6.4 Selected VM Virtualization Software and Tools
8.1.6.4.1 Libvirt
8.1.6.4.2 QEMU
8.1.6.4.3 KVM
8.1.6.4.3.1 KVM vs QEMU
8.1.6.4.4 Xen
8.1.6.4.5 Hyper-V
8.1.6.4.6 VMWare
8.1.6.5 Parallels
8.1.6.5.1 VirtualBox
8.1.6.5.2 Wine – Wine is not an emulator
8.1.6.5.3 Comparison of some technologies
8.1.6.6 Selected Storage Virtualization Software and Tools
8.1.6.7 Selected Network Virtualization Software and Tools
8.2 Virtual Machine Management with QEMU ☁
8.2.1 Install QEMU
8.2.2 Create a Virtual Hard Disk with QEMU
8.2.3 Install Ubuntu on the Virtual Hard Disk
8.2.4 Start Ubuntu with QEMU
8.2.5 Emulate Raspberry Pi with QEMU
8.2.6 Resources
8.3 Manage VM guests with virsh ☁
9 IAAS
9.1 Introduction ☁
9.2 Amazon Web Services ☁
9.2.1 AWS Products
9.2.1.1 Virtual Machine Infrastructure as a Service
9.2.1.2 Container Infrastructure as a Service
9.2.1.3 Serverless Compute using AWS Lambda
9.2.1.4 Serverless Compute using AWS Lambda
9.2.1.5 Storage
9.2.1.6 Databases
9.2.2 Locations
9.2.3 Creating an account
9.2.4 AWS Command Line Interface
9.2.4.1 Introduction
9.2.4.2 Prerequisites
9.2.4.2.1 Install CLI
9.2.4.2.2 Configure CLI
9.2.5 AWS Admin Access
9.2.5.1 Introduction
9.2.5.2 Prerequisites
9.2.5.3 Setting up admin access using AWS CLI
9.2.5.3.1 Create an admin security group
9.2.5.3.2 Assign a security policy to the created group granting full admin access
9.2.6 Understanding the free tier
9.2.7 Important Notes
9.2.8 Introduction to the AWS console
9.2.8.1 Starting a VM
9.2.8.1.1 Setting up key pair
9.2.8.2 Stopping a VM
9.2.9 Access from the Command Line
9.2.10 Access from Python
9.2.11 Boto
9.2.12 libcloud
9.3 Microsoft Azure ☁
9.3.1 Products
9.3.1.1 Virtual Machine Infrastructure as a Service
9.3.1.2 Container Infrastructure as a Service
9.3.1.3 Databases
9.3.1.4 Networking
9.3.2 Registration
9.3.3 Introduction to the Azure Portal
9.3.4 Creating a VM
9.3.5 Create a Ubuntu Server 18.04 LTS Virtual Machine in Azure
9.3.6 Remote access the Virtual Machine
9.3.7 Starting a VM
9.3.8 Stopping the VM
9.3.9 Exercises
9.4 What is IBM Watson and why is it important? ☁
9.4.1 How can we use Watson?
9.4.2 Creating an account
9.4.3 Understanding the free tier
9.5 Google IaaS Cloud Services ☁
9.5.1 Cloud Computing Services and Products
9.5.1.1 Overview
9.5.1.2 AI and Machine Learning
9.5.1.3 API management
9.5.1.4 Compute
9.5.1.5 Data Analytics
9.5.1.6 Databases
9.5.1.7 Developer Tools
9.5.1.8 Internet of Things
9.5.1.9 Management Tools
9.5.1.10 Media and Migration
9.5.2 Migration
9.5.2.1 Networking
9.5.2.2 Security
9.5.2.3 Storage
9.5.2.4 Google IaaS Example
9.5.2.5 Google Cloud Console Overview
9.5.2.6 Use GCP Resources
9.5.2.7 Project navigation
9.5.2.8 Navigate Google Cloud Services
9.5.2.9 Section pinning
9.5.2.10 View activity across your GCP resources
9.5.2.11 Search across Cloud Console
9.5.2.12 Get support anytime
9.5.2.13 Manage users and permissions
9.5.2.14 Access the command line from your browser
9.5.3 Create a VM Example
9.5.3.1 Create a virtual machine instance
9.5.3.2 VM instances page
9.5.3.3 Connect to your instance
9.5.3.4 Run a simple web server
9.5.3.5 Visit your application
9.5.3.6 Clean up
9.6 OpenStack ☁
9.6.1 Introduction
9.6.2 OpenStack Architecture
9.6.3 Components
9.6.4 Core Services
9.6.4.1 Nova - Compute
9.6.4.2 Glance - Image Services
9.6.4.3 Swift - Object Storage
9.6.4.4 Cinder - Block Storage
9.6.4.5 Neutron - Networking
9.6.4.6 Horizon - Dashboard
9.6.4.7 Keystone - Identity Service
9.6.4.8 Ceilometer - Telemetry
9.6.4.9 Heat - Orchestration
9.6.5 Access from Python and Scripts
9.6.5.1 Libcloud
9.6.5.2 DevStack
9.7 Python Libcloud ☁
9.7.1 Service categories
9.7.1.0.1 Compute
9.7.1.0.2 Key Pair Management
9.7.1.0.3 Block Storage
9.7.2 Installation
9.7.3 Quick Example
9.7.4 Managing your cloud credentials
9.7.5 Working with cloud services
9.7.5.1 Authenticating with cloud providers
9.7.5.1.1 Amazon AWS
9.7.5.1.2 Azure
9.7.5.1.2.1 Azure Classic Driver
9.7.5.1.2.2 Azure New Driver
9.7.5.1.3 OpenStack
9.7.5.1.4 Google
9.7.5.2 Invoking services
9.7.5.2.1 Creating Nodes
9.7.5.2.2 Listing Nodes
9.7.5.2.3 Starting Nodes
9.7.5.2.4 Stopping Nodes
9.7.6 Cloudmesh Community Program to Manage Clouds
9.7.7 Amazon Simple Storage Service S3 via libcloud
9.7.7.1 Access key
9.7.7.2 Create a new bucket on AWS S3
9.7.7.3 List Containers
9.7.7.4 List container objects
9.7.7.5 Upload a file
9.7.7.6 References
9.8 AWS Boto ☁
9.8.1 Boto versions
9.8.2 Boto Installation
9.8.3 Access key
9.8.4 Boto configuration
9.8.5 Boto configuration with cloudmesh
9.8.6 EC2 interface of Boto
9.8.6.0.1 Create connection
9.8.7 List EC2 instances
9.8.7.0.1 Launch a new instance
9.8.7.0.2 Check running instances
9.8.7.0.3 Stop instance
9.8.7.0.4 Terminate instance
9.8.7.1 Reboot instances
9.8.8 Amazon S3 interface of Boto
9.8.8.0.1 Create connection
9.8.8.0.2 Create new bucket in S3
9.8.8.0.3 Upload data
9.8.8.0.4 List all buckets
9.8.8.0.5 List all objects in a bucket
9.8.8.0.6 Delete object
9.8.8.0.7 Delete bucket
9.8.9 References
9.8.10 Exercises
10 MAPREDUCE
10.1 Introduction to Mapreduce ☁
10.1.1 MapReduce Algorithm
10.1.1.1 MapReduce Example: Word Count
10.1.2 Hadoop MapReduce and Hadoop Spark
10.1.2.1 Apache Spark
10.1.2.2 Hadoop MapReduce
10.1.2.3 Key Differences
10.1.3 References
10.2 HADOOP
10.2.1 Hadoop ☁
10.2.1.1 Hadoop and MapReduce
10.2.1.2 Hadoop EcoSystem
10.2.1.3 Hadoop Components
10.2.1.4 Hadoop and the Yarn Resource Manager
10.2.1.5 PageRank
10.2.2 Installation of Hadoop ☁
10.2.2.1 Releases
10.2.2.2 Prerequisites
10.2.2.3 User and User Group Creation
10.2.2.4 Configuring SSH
10.2.2.5 Installation of Java
10.2.2.6 Installation of Hadoop
10.2.2.7 Hadoop Environment Variables
10.2.3 Hadoop Distributed File System (Hadoop HDFS) ☁
10.2.3.1 Introduction
10.2.3.2 Features
10.2.3.3 HDFS Components
10.2.3.3.1 NameNode and DataNodes
10.2.3.4 Usage
10.2.3.4.1 Java Client API
10.2.3.4.2 FS Shell
10.2.3.5 References
10.2.3.6 Exercises
10.2.4 Apache HBase ☁
10.2.4.1 Introduction
10.2.4.2 Features
10.2.4.3 Configuration
10.2.4.4 Usage
10.2.4.4.1 Connect to HBase.
10.2.4.4.2 Create a table
10.2.4.4.3 Describe a table
10.2.4.4.4 HBase MapReduce job
10.2.4.5 References
10.2.5 Hadoop Virtual Cluster Installation Using Cloudmesh ☁
10.2.5.1 Cloudmesh Cluster Installation
10.2.5.1.1 Create Cluster
10.2.5.1.2 Check Created Cluster
10.2.5.1.3 Delete Cluster
10.2.5.2 Hadoop Cluster Installation
10.2.5.2.1 Create Hadoop Cluster
10.2.5.2.2 Delete Hadoop Cluster
10.2.5.3 Advanced Topics with Hadoop
10.2.5.3.1 Hadoop Virtual Cluster with Spark and/or Pig
10.2.5.3.2 Word Count Example on Spark
10.3 SPARK
10.3.1 Spark Lectures ☁
10.3.1.1 Motivation for Spark
10.3.1.2 Spark RDD Operations
10.3.1.3 Spark DAG
10.3.1.4 Spark vs. other Frameworks
10.3.2 Installation of Spark ☁
10.3.2.1 Prerequisites
10.3.2.2 Installation of Java
10.3.2.3 Install Spark with Hadoop
10.3.2.4 Spark Environment Variables
10.3.2.5 Test Spark Installation
10.3.2.6 Install Spark With Custom Hadoop
10.3.2.7 Configuring Hadoop
10.3.2.8 Test Spark Installation
10.3.3 Spark Streaming ☁
10.3.3.1 Streaming Concepts
10.3.3.2 Simple Streaming Example
10.3.3.3 Spark Streaming For Twitter Data
10.3.3.3.1 Step 1
10.3.3.3.2 Step 2
10.3.3.3.3 Step 3
10.3.3.3.4 Step 4
10.3.3.3.5 Step 5
10.3.3.3.6 Step 6
10.3.4 User Defined Functions in Spark ☁
10.3.4.1 Resources
10.3.4.2 Instructions for Spark installation
10.3.4.2.1 Linux
10.3.4.3 Windows
10.3.4.4 MacOS
10.3.4.5 Instructions for creating Spark User Defined Functions
10.3.4.5.1 Example: Temperature conversion
10.3.4.5.1.1 Description about dataset
10.3.4.5.1.2 How to write a python program with UDF
10.3.4.5.1.3 How to execute a python spark script
10.3.4.5.1.4 Filtering and sorting
10.3.4.6 Instructions to install and run the example using docker
10.4 HADOOP ECOSYSTEM
10.4.1 ELASTIC MAPREDUCE
10.4.1.1 AWS Elastic MapReduce (AWS EMR) ☁
10.4.1.1.1 Introduction
10.4.1.1.2 Why EMR?
10.4.1.1.3 Understanding Clusters and Nodes
10.4.1.1.4 Prerequisites
10.4.1.1.5 Creating EMR Cluster Using CLI
10.4.1.1.5.1 Create Security Roles
10.4.1.1.5.2 Setting up authentication
10.4.1.1.5.3 Determine the applicable subnet
10.4.1.1.5.4 Create the EMR cluster
10.4.1.1.5.5 Check the status of your cluster
10.4.1.1.5.6 Terminate your cluster
10.4.1.1.6 Creating EMR Cluster Using AWS Web Console
10.4.1.1.6.1 Set up authentication
10.4.1.1.6.2 Create the EMR cluster
10.4.1.1.6.3 View status and terminate EMR cluster
10.4.1.1.6.4 Submit Work to a Cluster
10.4.1.1.6.5 Processing Data
10.4.1.1.7 AWS Storage
10.4.1.1.8 Create EMR in AWS
10.4.1.1.8.1 Create the buckets
10.4.1.1.8.2 Create Key Pairs
10.4.1.1.9 Create Step Execution – Hadoop Job
10.4.1.1.10 Create a Hive Cluster
10.4.1.1.10.1 Create a Hive Cluster - Screenshots
10.4.1.1.11 Create a Spark Cluster
10.4.1.1.11.1 Create a Spark Cluster - Screenshots
10.4.1.1.12 Run an example Spark job on an EMR cluster
10.4.1.1.12.1 Spark Job Description
10.4.1.1.12.2 Creating the S3 bucket
10.4.1.1.12.3 Copy files to S3
10.4.1.1.12.4 Execute the Spark job on a running cluster
10.4.1.1.12.5 Execute the Spark job while creating clusters
10.4.1.1.12.6 View the results of the Spark job
10.4.1.1.13 Conclusion
10.4.2 TWISTER
10.4.2.1 Twister2 ☁
10.4.2.1.1 Introduction
10.4.2.1.2 Twister2 API’s
10.4.2.1.2.1 TSet API
10.4.2.1.2.2 Task API
10.4.2.1.3 Operator API
10.4.2.1.3.1 Resources
10.4.2.2 Twister2 Installation ☁
10.4.2.2.1 Prerequisites
10.4.2.2.1.1 Maven Installation
10.4.2.2.1.2 OpenMPI Installation
10.4.2.2.1.3 Install Extras
10.4.2.2.1.4 Compiling Twister2
10.4.2.2.1.5 Twister2 Distribution
10.4.2.3 Twister2 Examples ☁
10.4.2.3.1 Submitting a Job
10.4.2.3.2 Batch WordCount Example
10.4.3 HADOOP RDMA ☁
10.4.3.1 Launching a Virtual Hadoop Cluster on Bare-metal InfiniBand Nodes with SR-IOV on Chameleon
10.4.3.2 Launching Virtual Machines Manually
10.4.3.3 Extra Initialization when Launching Virtual Machines
10.4.3.4 Important Note for Tearing Down Virtual Machines and Deleting Network Ports
11 CONTAINER
11.1 Introduction to Containers ☁
11.1.1 Motivation - Microservices
11.1.2 Motivation - Serverless Computing
11.1.3 Docker
11.1.4 Docker and Kubernetes
11.2 DOCKER
11.2.1 Introduction to Docker ☁
11.2.1.1 Docker Engine
11.2.1.2 Docker Architecture
11.2.1.3 Docker Survey
11.2.2 Running Docker Locally ☁
11.2.2.1 Installation for OSX
11.2.2.2 Installation for Ubuntu
11.2.2.3 Installation for Windows 10
11.2.2.4 Testing the Install
11.2.3 Dockerfile ☁
11.2.3.1 Specification
11.2.3.2 References
11.2.4 Docker Hub ☁
11.2.4.1 Create Docker ID and Log In
11.2.4.2 Searching for Docker Images
11.2.4.3 Pulling Images
11.2.4.4 Create Repositories
11.2.4.5 Pushing Images
11.2.4.6 Resources
11.2.5 Docker Compose ☁
11.2.5.1 Introduction
11.2.5.2 Installation
11.2.5.2.1 Install on MacOS
11.2.5.2.2 Install on Linux
11.2.5.2.3 Install on Windows 10
11.2.5.2.3.1 System Requirements
11.2.5.2.4 Test the installation
11.2.5.3 Docker Compose File Directives
11.2.5.3.1 Configuration
11.2.5.3.1.1 build
11.2.5.3.1.2 context
11.2.5.3.1.3 ARGS
11.2.5.3.1.4 command
11.2.5.3.1.5 depends_on
11.2.5.3.1.6 image
11.2.5.3.1.7 ports
11.2.5.3.1.8 volumes
11.2.5.4 Usages
11.2.5.4.1 Build A Service depending on MongoDB
11.3 DOCKER PAAS
11.3.1 Docker Clusters ☁
11.3.2 Docker Swarm ☁
11.3.2.1 Terminology
11.3.2.2 Creating a Docker Swarm Cluster
11.3.2.3 Create a Swarm Cluster with VirtualBox
11.3.2.4 Initialize the Swarm Manager Node and Add Worker Nodes
11.3.2.5 Deploy the application on the swarm manager
11.3.3 Docker and Docker Swarm on FutureSystems ☁
11.3.3.1 Getting Access
11.3.3.2 Creating a service and deploy to the swarm cluster
11.3.3.3 Create your own service
11.3.3.4 Publish an image privately within the swarm cluster
11.3.3.5 Exercises
11.3.4 Hadoop with Docker ☁
11.3.4.1 Building Hadoop using Docker
11.3.4.2 Hadoop Configuration Files
11.3.4.3 Virtual Memory Limit
11.3.4.4 hdfs Safemode leave command
11.3.4.5 Examples
11.3.4.5.1 Statistical Example with Hadoop
11.3.4.5.1.1 Base Location
11.3.4.5.1.2 Input Files
11.3.4.5.1.3 Compilation
11.3.4.5.1.4 Archiving Class Files
11.3.4.5.1.5 HDFS for Input/Output
11.3.4.5.1.6 Run Program with a Single Input File
11.3.4.5.1.7 Result for Single Input File
11.3.4.5.1.8 Run Program with Multiple Input Files
11.3.4.5.1.9 Result for Multiple Files
11.3.4.5.2 Conclusion
11.3.4.6 References
11.3.5 Docker Pagerank ☁
11.3.5.1 Use the automated script
11.3.5.2 Compile and run by hand
11.3.6 Apache Spark with Docker ☁
11.3.6.1 Pull Image from Docker Repository
11.3.6.2 Running the Image
11.3.6.2.1 Running interactively
11.3.6.2.2 Running in the background
11.3.6.3 Run Spark
11.3.6.3.1 Run Spark in Yarn-Client Mode
11.3.6.3.2 Run Spark in Yarn-Cluster Mode
11.3.6.4 Observe Task Execution from Running Logs of SparkPi
11.3.6.5 Write a Word-Count Application with Spark RDD
11.3.6.5.1 Launch Spark Interactive Shell
11.3.6.5.2 Program in Scala
11.3.6.5.3 Launch PySpark Interactive Shell
11.3.6.5.4 Program in Python
11.3.6.6 Docker Spark Examples
11.3.6.6.1 K-Means Example
11.3.6.6.2 Join Example
11.3.6.6.3 Word Count
11.3.6.7 Interactive Examples
11.3.6.7.1 Stop Docker Container
11.3.6.7.2 Start Docker Container Again
11.3.6.7.3 Remove Docker Container
11.4 KUBERNETES
11.4.1 Introduction to Kubernetes ☁
11.4.1.1 What are containers?
11.4.1.2 Terminology
11.4.1.3 Kubernetes Architecture
11.4.1.4 Minikube
11.4.1.4.1 Install minikube
11.4.1.4.2 Start a cluster using Minikube
11.4.1.4.3 Create a deployment
11.4.1.4.4 Expose the servi
11.4.1.4.5 Check running status
11.4.1.4.6 Call service api
11.4.1.4.7 Take a look from Dashboard
11.4.1.4.8 Delete the service and deployment
11.4.1.4.9 Stop the cluster
11.4.1.5 Interactive Tutorial Online
11.4.2 Using Kubernetes on FutureSystems ☁
11.4.2.1 Getting Access
11.4.2.2 Example Use
11.4.2.3 Exercises
11.5 Running Singularity Containers on Comet ☁
11.5.1 Background
11.5.2 Tutorial Contents
11.5.3 Why Singularity?
11.5.4 Hands-On Tutorials
11.5.5 Downloading & Installing Singularity
11.5.5.1 Download & Unpack Singularity
11.5.5.2 Configure & Build Singularity
11.5.5.3 Install & Test Singularity
11.5.6 Building Singularity Containers
11.5.6.1 Upgrading Singularity
11.5.7 Create an Empty Container
11.5.8 Import Into a Singularity Container
11.5.9 Shell Into a Singularity Container
11.5.10 Write Into a Singularity Container
11.5.11 Bootstrapping a Singularity Container
11.5.12 Running Singularity Containers on Comet
11.5.12.1 Transfer the Container to Comet
11.5.12.2 Run the Container on Comet
11.5.12.3 Allocate Resources to Run the Container
11.5.12.4 Integrate the Container with Slurm
11.5.12.5 Use Existing Comet Containers
11.5.13 Using Tensorflow With Singularity
11.5.14 Run the job
11.5.15 Resources ☁
11.5.15.1 Tutorialspoint
11.6 Exercises ☁
12 SERVERLESS
12.1 FaaS ☁
12.1.1 Introduction
12.1.2 Serverless Computing
12.1.3 Faas provider
12.1.4 Resources
12.1.5 Usage Examples
12.2 AWS Lambda ☁
12.2.1 AWS Lambda Features
12.2.2 Understanding Function limitations
12.2.2.1 Execution Time
12.2.2.2 Function size
12.2.3 Understanding the free Tier
12.2.4 Writing your first Lambda function
12.2.5 AWS Lambda Usecases
12.2.6 AWS Lambda Example
12.3 Apache OpenWhisk ☁
12.3.1 OpenWhisk Workflow
12.3.1.1 The Action and Nginx
12.3.1.2 Controller: The System’s Interface
12.3.1.3 CouchDB
12.3.1.4 Load Balancer
12.3.1.5 Kafka
12.3.1.6 Invoker
12.3.1.7 CouchDB again
12.3.2 Setting Up OpenWhisk Locally
12.3.2.1 Debugging quick-start
12.3.3 Hello World in OpenWhisk
12.3.4 Creating a custom action
12.4 Kubeless ☁
12.4.1 Introduction
12.4.2 Programming model
12.4.3 System Architecture
12.5 Microsoft Azure Function ☁
12.6 Google Cloud Functions ☁
12.6.1 Google Cloud Function Example
12.7 OpenFaaS ☁
12.7.1 OpenFaas Components and Architecture
12.7.1.1 API Gateway
12.7.1.2 Function Watchdog
12.7.1.3 OpenFaas CLI
12.7.1.4 Monitoring
12.7.2 OpenFaas in Action
12.7.2.1 Prerequisites
12.7.2.2 Single Node Cluster
12.7.2.3 Deploy OpenFaas
12.7.2.4 To Run OpenFaas
12.7.3 OpenFaaS Function with Python
12.8 OpenLambda ☁
12.8.1 Suggested Materials
12.8.2 Development
12.8.3 OpenLambda
12.8.4 Getting Started
12.8.4.1 Install Dependencies
12.8.4.2 Start a Test Cluster
12.8.5 Administration
12.8.5.1 Writing Handlers
12.8.5.2 Cluster Directory
12.8.6 Configuration
12.8.7 Architecture
13 MESSAGING
13.1 MQTT ☁
13.1.1 Introduction
13.1.2 Publish Subscribe Model
13.1.2.1 Topics
13.1.2.2 Callbacks
13.1.2.3 Quality of Service
13.1.3 Secure MQTT Services
13.1.3.1 Using TLS/SSL
13.1.3.2 Using OAuth
13.1.4 Integration with Other Services
13.1.5 MQTT in Production
13.1.6 Installation
13.1.6.1 MacOS install
13.1.6.2 MacOS Advanced Service install
13.1.6.3 Ubuntu install
13.1.6.4 Raspberry Pi Setup
13.1.6.4.1 Broker
13.1.6.4.2 Client
13.1.7 Server Usecase
13.1.8 IoT Use Case with a Raspberry PI
13.1.8.1 Requirements and Setup
13.1.8.2 Results
13.1.9 Conclusion
13.1.10 Exercises
13.2 Python Apache Avro ☁
13.2.1 Download, Unzip and Install
13.2.2 Defining a schema
13.2.3 Serializing
13.2.4 Deserializing
13.2.5 Resources
14 GO
14.1 Introduction to Go for Cloud Computing ☁
14.1.1 Organization of the chapter
14.1.2 References
14.2 Installation ☁
14.3 Editors Supporting Go ☁
14.4 Go Language ☁
14.4.1 Concurrency in Go
14.4.1.1 GoRoutines (execution)
14.4.1.2 Channels (communication)
14.4.1.3 Select (coordination)
14.5 Libraries ☁
14.6 Go CMD ☁
14.6.1 CMD
14.6.2 DocOpts
14.7 Go REST ☁
14.7.1 Gorilla
14.7.2 REST, RESTful
14.7.3 Router
14.7.4 Full code
14.8 OpenAPI ☁
14.8.1 Install from Homebrew
14.8.2 serve specification UI
14.8.3 validate a specification
14.8.4 Generate a Go OpenAPI server
14.8.5 generate a Go OpenAPI client
14.8.6 generate a spec from the source
14.8.7 generate a data model
14.8.8 other editors
14.9 Create an Echo service using Swagger and Go
14.9.1 Dependencies
14.9.2 Initialize a Golang project
14.9.3 Define APIs and generate code in Go
14.9.4 Implement the functionality
14.9.5 Run and test the server
14.9.6 References
14.10 Go Cloud ☁
14.10.1 Golang Openstack Client
14.10.2 OpenStack from Go
14.10.2.1 Gophercloud
14.10.2.1.1 Authentication
14.10.2.1.2 Virtual machines
14.10.2.1.3 Resources
14.11 Go Links ☁
14.11.1 Introductory Material
14.11.2 The GO Language
14.11.3 How popular is Go?
14.11.4 OpenAPI and Go
14.12 Exercises ☁
15 REFERENCES
1 PREFACE
Sat Nov 23 05:21:29 EST 2019 ☁
1.1 LEARNING OBJECTIVES ☁
Learning Objectives
Learn about how we distribute material as ePubs.
Learn how to create an ePub with our material from source.
Introduce elementary notations we use in the ePubs.
See who contributed to the ePubs.
1.2 EPUB READERS ☁
This document is distributed in ePub format. Every OS has a suitable ePub reader to view the document. Such readers can also be integrated into a Web browser so that when you click on an ePub it is automatically opened in your browser. As we use ePubs, the document can be scaled based on the user’s preference. If you ever see content that does not fit on a page, we recommend you zoom out to make sure you can see the entire content.
We have had good experiences with the following readers:
macOS: Books (https://www.apple.com/apple-books), which is a built-in ebook reader
Windows 10: Microsoft Edge (https://www.microsoft.com/en-us/windows/microsoft-edge), but it must be the newest version, as older versions have bugs. Alternatively, use calibre (https://calibre-ebook.com/)
Linux: calibre (https://calibre-ebook.com/)
If you have an iPad or tablet with enough memory, you may also be able to use them.
Sometimes you may want to adjust the zoom of your reader to increase or decrease it. Please adjust your zoom to a level that is comfortable for you. On macOS with a larger monitor, we found that zooming out multiple times results in very good rendering, allowing you to see the source code without horizontal scrolling.
1.3 CORRECTIONS ☁
The material collected in this document is managed in
https://github.com/cloudmesh-community/book/chapters
In case you see an error or would like to contribute your own section or chapter, you can do so in GitHub via pull requests.
The easiest way to fix an error is to read the ePub and click on the cloud symbol in a heading where you see the error. This will bring you to an editable document in GitHub. You can directly fix the error in the web browser and create a pull request there. Naturally, you need to be signed into GitHub before you can edit and create a pull request.
As a result, contributors and authors will be integrated automatically the next time we compile the material. Thus, even if you corrected a single spelling error, you will be acknowledged.
1.4 CONTRIBUTORS ☁
Contributors are sorted by the first letter of their combined first name and last name and, if not available, by their GitHub ID. Please note that the authors are identified through git logs in addition to some contributors added by hand. The git repository from which this document is derived contains more than the documents included in this document. Thus not everyone in this list may have directly contributed to this document. However, if you find someone missing who has contributed (they may not have used this particular git repository), please let us know. We will add you. The contributors that we are aware of include:
Anand Sriramulu, Ankita Rajendra Alshi, Anthony Duer, Arnav, Averill Cate, Jr, Bertolt Sobolik, Bo Feng, Brad Pope, Brijesh, Dave DeMeulenaere, De’Angelo Rutledge, Eliyah Ben Zayin, Eric Bower, Fugang Wang, Geoffrey C. Fox, Gerald Manipon, Gregor von Laszewski, Hyungro Lee, Ian Sims, IzoldaIU, Javier Diaz, Jeevan Reddy Rachepalli, Jonathan Branam, Juliette Zerick, Keith Hickman, Keli Fine, Kenneth Jones, Mallik Challa, Mani Kagita, Miao Jiang, Mihir Shanishchara, Min Chen, Murali Cheruvu, Orly Esteban, Pulasthi Supun, Pulasthi Supun Wickramasinghe, Pulkit Maloo, Qianqian Tang, Ravinder Lambadi, Richa Rastogi, Ritesh Tandon, Saber Sheybani, Sachith Withana, Sandeep Kumar Khandelwal, Sheri Sanders, Shivani Katukota, Silvia Karim, Swarnima H. Sowani, Tharak Vangalapat, Tim Whitson, Tyler Balson, Vafa Andalibi, Vibhatha Abeykoon, Vineet Barshikar, Yu Luo, ahilgenkamp, aralshi, azebrowski, bfeng, brandonfischer99, btpope, garbeandy, harshadpitkar, himanshu3jul, hrbahramian, isims1, janumudvari, joshish-iu, juaco77, karankotz, keithhickman08, kkp, mallik3006, manjunathsivan, niranda perera, qianqian tang, rajni-cs, rirasto, sahancha, shilpasingh21, swsachith, toshreyanjain, trawat87, tvangalapat, varunjoshi01, vineetb-gh, xianghangmi, zhengyili4321
1.5 NOTATION ☁
The material here uses the following notation. This is especially helpful if you contribute content, so we keep the content consistent.
If you would like to see the details on how to create them in the markdown documents, you will have to look at the file source by clicking on the cloud in the heading of the Notation section (Section 1.5). This will bring you to the markdown text, but you will still have to look at the raw content to see the details.
☁ or ![Github](images/github.png)
If you click on the ☁ or the GitHub image in a heading, you can go directly to the document in GitHub that contains the next content. This is convenient to fix errors or make additions to the content. The cloud will be automatically added upon inclusion of a new markdown file that includes a section header in its first line.
$
Content in bash is marked with verbatim text and a dollar sign:
    $ This is a bash text
[1]
References are indicated with a number and are included in the reference chapter [1]. Use it in markdown with [@las14cloudmeshmultiple]. References must be added to the refernces.bib file in BibTeX format.
:o2: or ![No](images/no.png)
Chapters marked with this emoji are not yet complete or have some issue that we know about. These chapters need to be fixed. If you would like to help us fix such a section, please let us know. Use it in markdown with :o2: or, if you would like to use the image, with ![No](images/no.png).
REST 36:02
Example for a video with the ![Video](images/video.png) emoji. Use it in markdown with [![Video](images/video.png) REST 36:02](https://youtu.be/xjFuA6q5N_U)
Slides 10
Example for slides with the ![Presentation](images/presentation.png) emoji. These slides may or may not include audio.
Slides 10
Slides without any audio. They may be faster to download. Use it in markdown with [![Presentation](images/presentation.png) Slides 10](TBD).
A set of learning objectives with the ![Learning](images/learning.png) emoji.
A section is released when it is marked with this emoji in the syllabus. Use it in markdown with ![Ok](images/ok.png).
Indicates opportunities for contributions. Use it in markdown with ![Question](images/question.png).
Indicates sections that are worked on by contributors. Use it in markdown with ![Construction](images/construction.png).
Sections are marked by the contributor with this emoji ![Smiley](images/smile.png) when they are ready to be reviewed.
Sections that need modifications are indicated with this emoji ![Comment](images/comment.png).
A warning that we need to look at in more detail ![Warning](images/warning.png)
Notes are indicated with a bulb ![Idea](images/idea.png)
Other emojis
Other emojis can be found at https://gist.github.com/rxaviers/7360908. However, note that emojis may not be viewable in other formats or on all platforms. We know that some emojis do not show in calibre, but they do show in macOS iBooks and MS Edge.
This is the list of emojis that can be converted to PDF. So if you would like a PDF, please limit your emojis to
:cloud: ☁, :o2:, :relaxed: ☺, :sunny: ☀, :baseball: ⚾, :spades: ♠, :hearts: ♥, :clubs: ♣, :diamonds: ♦, :hotsprings: ♨, :warning: ⚠, :parking:, :a:, :b:, :recycle: ♻, :copyright: ©, :registered: ®, :tm: ™, :bangbang: ‼, :interrobang: ⁉, :scissors: ✂, :phone: ☎
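The restriction to PDF-convertible emojis can be checked mechanically before a contribution is submitted. The following is a minimal sketch, not part of the book's tooling: the function name `unsupported_emojis` is hypothetical, and it assumes that a shortcode has the form `:name:` with a leading lowercase letter, so shortcodes such as `:+1:` are not detected.

```python
import re

# Shortcodes that the list above names as convertible to PDF.
PDF_SAFE_EMOJIS = {
    "cloud", "o2", "relaxed", "sunny", "baseball", "spades", "hearts",
    "clubs", "diamonds", "hotsprings", "warning", "parking", "a", "b",
    "recycle", "copyright", "registered", "tm", "bangbang", "interrobang",
    "scissors", "phone",
}

# Assumption: a shortcode is :name: and is not glued to a preceding word
# character, which avoids matching times such as 05:21:29.
SHORTCODE = re.compile(r"(?<!\w):([a-z][a-z0-9_+-]*):")


def unsupported_emojis(markdown_text):
    """Return the sorted shortcodes in the text that are not PDF-safe."""
    found = set(SHORTCODE.findall(markdown_text))
    return sorted(found - PDF_SAFE_EMOJIS)


if __name__ == "__main__":
    sample = "Done :cloud: but :rocket: and :smile: may not convert."
    print(unsupported_emojis(sample))
```

Running the check over each markdown file before creating a pull request would flag emojis that may be lost in the PDF version.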
1.5.1 Figures
Figures have a caption and can be referred to in the ePub simply with a number. We show such a reference pointer while referring to Figure 1.
Figure 1: Figure example
Figures must be written in the md as
    ![Figure example](images/code.png){#fig:code-example width=1in}
Note that the text must be in one line and must not be broken up even if it is longer than 80 characters. You can refer to them with @fig:code-example. Please note that in order for numbering to work, figure references must include the #fig: followed by a unique identifier. Please note that identifiers must be really unique and that identifiers such as #fig:cloud or similar simple identifiers are a poor choice and will likely not work. To check, please list all lines with an identifier such as
    $ grep -R "#fig:" chapters
and see if your identifier is truly unique.
1.5.2 Hyperlinks in the document
To create hyperlinks in the document other than images, we need to use proper markdown syntax in the source. This is achieved with a reference, for example in section headers. Let us discuss the reference header for this section, e.g. Notation. We have augmented the section header as follows:
    # Notation {#sec:notation}
Now we can use the reference in the text as follows:
    In @sec:notation we explain ...
It will be rendered as: In Section 1.5 we explain …
1.5.3 Equations
Equations can be written as
    $$a^2+b^2=c^2$${#eq:pythagoras}
which renders as
    a² + b² = c²    (1)
and can be referenced in the text with @eq:pythagoras. It will render as: As we see in Equation 1.
The equation number is optional. Inline equations just use one dollar sign and do not need an equation number:
    This is the Pythagoras theorem: $a^2+b^2=c^2$
Which renders as:
This is the Pythagoras theorem: a² + b² = c².
1.5.4 Tables
Tables can be placed in text as follows:
    : Sample Data Table {#tbl:sample-table}

    | x | y | z  |
    |---|---|----|
    | 1 | 2 | 3  |
    | 4 | 5 | 42 |
As usual, make sure the label is unique. When compiling, it will result in an error if labels are not unique. Additionally, there are several md table generators available on the internet that make creating tables more efficient.
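Since figure, section, equation, and table labels all have to be unique, the grep check described for figures can be extended to every label kind at once. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the label pattern is inferred from the `{#fig:...}`, `{#sec:...}`, `{#eq:...}`, and `{#tbl:...}` examples in this section, and the `chapters` directory is assumed to hold the markdown sources.

```python
import re
from collections import Counter
from pathlib import Path

# Matches labels such as {#fig:code-example width=1in} or {#sec:notation}.
LABEL = re.compile(r"\{#((?:fig|sec|eq|tbl):[^\s}]+)")


def find_labels(text):
    """Return every cross-reference label defined in a markdown string."""
    return LABEL.findall(text)


def duplicate_labels(texts):
    """Return the sorted labels defined more than once across all texts."""
    counts = Counter(label for text in texts for label in find_labels(text))
    return sorted(label for label, n in counts.items() if n > 1)


def check_directory(root="chapters"):
    """Scan all .md files below root (assumed layout) for duplicate labels."""
    texts = [p.read_text(encoding="utf-8") for p in Path(root).rglob("*.md")]
    return duplicate_labels(texts)


if __name__ == "__main__":
    for label in check_directory():
        print("duplicate label:", label)
```

An empty result suggests that the labels are unique; any label that is printed should be renamed before the material is compiled.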
1.6 UPDATES ☁
As all documents are managed in GitHub, the list of updates is documented in the commit history at
https://github.com/cloudmesh-community/book/commits/master
In case you take a lecture with us, we recommend that you download a new version of the ePub every week. This way you typically stay up to date. You can check the commit history and identify whether the version of your ePub is older than the committed version; if so, we recommend that you download a new version.
We typically will not make announcements to the class, as the GitHub commit history is sufficient and you are responsible for monitoring it as part of your class activities.
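The comparison between the version of your ePub and the latest commit can be sketched in a few lines. This is only an illustration under assumptions: it treats the time stamp printed at the start of the Preface as the ePub version, assumes fixed UTC offsets for the zone abbreviations, and expects the commit time as an ISO 8601 UTC string such as the GitHub REST API returns. The function names and the commit time in the usage note are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed fixed UTC offsets (in hours) for zone abbreviations in the stamp.
ZONE_OFFSETS = {"EST": -5, "EDT": -4, "UTC": 0}


def parse_version_stamp(stamp):
    """Parse a stamp like 'Sat Nov 23 05:21:29 EST 2019' into naive UTC."""
    parts = stamp.split()
    zone = parts.pop(4)  # the zone abbreviation is the fifth field
    local = datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")
    return local - timedelta(hours=ZONE_OFFSETS[zone])


def parse_commit_date(iso_utc):
    """Parse an ISO 8601 UTC time such as '2019-11-25T14:00:00Z'."""
    return datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%SZ")


def is_outdated(epub_stamp, latest_commit_utc):
    """Return True if the ePub was built before the latest commit."""
    return parse_version_stamp(epub_stamp) < parse_commit_date(latest_commit_utc)
```

For example, `is_outdated("Sat Nov 23 05:21:29 EST 2019", "2019-11-25T14:00:00Z")` evaluates to `True`, which would indicate that a newer ePub should be downloaded.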
2OVERVIEW☁
LearningObjectives
GainanoverviewwhatcurrentlyisinthisbookReviewthehighlevelgoalsBeawarethatthisbookisnotcompleteandisworkedonaswespeakBeawaretocheckoutthebookonaweeklybasistostayuptodateBe aware that additionalmaterial is distributed in separate books such asLinux,Python,andWritinginMarkdown.Beawarethatbooksyoumaypurchasemayalreadybeoutdatedbythetimeyouorderthem.
In thisbookweprovideanumberofchapters thatwillallowyoutoeasilygetknowledgeincloudcomputingontheoreticalandpracticallevels.
Althoughthefollowingwasoriginallycoveredinthisbook,wedecidedtosplitoutitscontentsastomakethecorecloudengineeringbooksmaller.Incaseyoutakeoneofourclassesusingthebook,weexpectthatyoupickupthematerialcoveredalsoby theseadditionalbooks.Pleasebeaware thatsomeof theclassmaterialisbasedonPythonandLinux.Youwillneednoknowledgeofthemasyoucanpickitupwhilereadingthisbook.
* Cloud Computing: https://laszewski.github.io/book/cloud/
* Linux for Cloud Computing: https://laszewski.github.io/book/linux/
* Python for Cloud Computing: https://laszewski.github.io/book/python/
* Scientific Writing with Markdown: https://laszewski.github.io/book/writing/
The book is organized as follows:
Definition of Cloud Computing
We will start with the definition of what cloud computing is and motivate why it is important to know more than technologies such as AI, ML, or databases. We present you with evidence that clouds are highly relevant to today's technologies. Furthermore, we see a trend to utilize AI and ML services in the cloud. Technologies such as virtual machines, containers, and Function as a Service are essential to the repertoire of a modern cloud or data engineer. There is more than ML… ☺
Data Center
This chapter explains why we need cloud data centers, what a cloud data center looks like, and what environmental impact such data centers have.
Architecture
This chapter will introduce you to the basic architectural features and designs of cloud computing. We will discuss architectures for IaaS and contrast them with other architectures. We will discuss the NIST definition of the cloud and the Cloud Security Alliance Reference Architecture. We will discuss the multi-cloud architecture introduced by cloudmesh as well as the Big Data Reference Architecture.
REST
This chapter will introduce you to a way of defining services in the cloud that you can easily access via language-independent client APIs. It will introduce you to the fundamental concepts of REST. More importantly, we will introduce you to OpenAPI, which allows you to specify REST services via a specification document so that you can create APIs and clients from the document automatically. We will show you how to do that with Flask.
Using a very popular service, GitHub, we will show you how to easily interface with REST services in Python.
GraphQL
In this chapter we will introduce you to GraphQL, which allows you to access data through a query language. It allows clients to easily formulate queries that retrieve the desired data. Restrictions on the queries can be formulated to download only what is needed. Other features include a type system. GitHub has added a GraphQL interface in addition to its REST service. You will have the opportunity to explore GraphQL while interfacing with GitHub.
Hypervisors
Virtualization is one of the important technologies that started the cloud revolution. It provides the basic underlying principles for the development and adoption of clouds. The concept, although old and already used in the early days of computing, has recently been exploited to achieve better utilization of servers in data centers, but also on local desktops.
In this chapter we introduce you to the basic concepts and distinguish the various forms of virtualization.
We list virtualization frameworks such as Libvirt, QEMU, KVM, Xen, and Hyper-V. Depending on your hardware, you will be encouraged to experiment with one or more of them.
IaaS
In the IaaS chapter we will review many of the services offered by providers such as AWS, Azure, and Google, as well as OpenStack, which is used by some academic clouds such as Chameleon Cloud.
In addition, we will introduce you to elementary command line tools and programs to access this infrastructure.
In this section we will also provide you with information about multicloud management with cloudmesh, which makes it extremely easy to switch between and use services from multiple clouds.
It is important to note that the appendix contains very useful information that augments this section. This includes a more detailed list of services for some IaaS providers as well as information on how to use Chameleon Cloud, which we have adapted for this chapter.
Map/Reduce
In this chapter we discuss the background of MapReduce along with Hadoop and its core components. We will also introduce you to Spark in this section.
You will be shown how you can use these systems on a single resource so you can explore them more easily, but we will also let you know how to install them on a cluster in principle.
We conclude this section with some important Map/Reduce frameworks used as part of the larger Map/Reduce ecosystem, such as AWS Elastic Map/Reduce (AWS EMR). This also includes a discussion about Twister2, which is a version of Map/Reduce that could perform even faster than Spark.
In fact, we have two sections here that need to be delineated a bit better, which we hope we can do with your help.
Container
In the container chapter we will introduce you to the basic concepts of a container and delineate it from the virtual machines we introduced earlier. We will start the chapter with an introduction to Docker and then show you how to manage clusters capable of running many containers with the help of Docker Swarm and Kubernetes. To showcase its use with other PaaS offerings and applications, we even show you how to run Hadoop with Docker as well as how to conduct a PageRank analysis. Kubernetes will be discussed in its own section.
As many academic data centers run queuing systems, we will also showcase Singularity, which allows you to use containers within a batch queuing system.
You will help us improve this section if you elect to conduct a project on Comet.
We conclude the section by letting you know how to run TensorFlow via Singularity.
Serverless Computing
Recently a new paradigm in cloud computing has been introduced. Instead of using virtual machines or containers, functions with limited resource requirements are specified that can then be executed on function-capable execution services hosted by cloud providers.
We will introduce you to this concept and show you some examples of FaaS services and frameworks.
Messaging Services
Many devices in the cloud need to communicate with each other. In this chapter we look into how we can provide alternatives to REST services that provide messaging capabilities. We will focus on MQTT, which is often used to connect cloud edge devices with each other and with the cloud.
Go
Go is a programming language used by Google and has most famously been used to implement Kubernetes. In this chapter we introduce you to the elementary features of Go and also take a closer look at how we can define REST services, use OpenAPI, and interface with clouds.
Cloud AI Services
As part of the class we will be exploring AI services that are hosted in the cloud and offered as a service. If interested, you will be able to use them in your projects. As part of this class you will also be developing AI services that can be hosted in the cloud and reused by others. By using cross-platform specifications, clients for Java, Python, Scala, Go, and other programming languages will be automatically created for you. This will allow others to reuse your services.
3 DEFINITION OF CLOUD COMPUTING ☁
Learning Objectives
* Compare different definitions of cloud computing.
* Review the history of cloud computing.
* Identify trends.
* The current hot job is data engineer, which is sought after more than data scientist (a new trend). You have chosen the right course ☺
* Be TALLL to be successful in cloud computing.
Videos:
* Definition of Cloud Computing 2019: https://youtu.be/KaQte-2elVo
3.1 DEFINING THE TERM CLOUD COMPUTING
In this presentation we review three definitions of cloud computing. This includes the definitions by
* NIST
* Wikipedia
* Gartner
3.2 HISTORY AND TRENDS
We review some of the historical aspects that led to cloud computing and especially look into more recent trends. These trends motivate enhancements to the traditional service model, which includes Infrastructure, Platform, and Software as a Service. These enhancements especially target Function and Container as a Service.
3.3 JOB AS A CLOUD/DATA ENGINEER
We look at some job-related trends that especially focus on the newest hot job description, called Data Engineer. This is motivated by the observation that current job offerings for data engineers stand at 13% versus 1% for data scientists. As this class is targeted towards bringing the engineering component to data scientists, computer scientists, and application developers, it is ideally suited for increasing your marketability.
3.4 YOU MUST BE THAT TALLL
We close this class with Gregor's TALLL principle to succeed in cloud computing:
You must be that TALLL to survive in Cloud Computing and Big Data
This principle includes the following characteristics:
Trend Awareness (TA)
We need to be aware not only of what is currently a trend, but also of what future trends will be.
Longevity Planning (L)
We need to be able to reproduce our services and results (e.g., can we still reproduce them in six months?).
Leap Detection (L)
We need to be able to deal with technology leaps.
Learning Willingness (L)
We need to constantly learn to keep up, as technology changes every six months.
Naturally, this principle can be applied to other disciplines.
4 DATA CENTER
4.1 DATA CENTER ☁
Learning Objectives
* What is a data center?
* What are important metrics?
* What is the difference between a cloud data center and a traditional data center?
* What are examples of cloud data centers?
4.1.1 Motivation: Data
Before we go into more details of a data center, we would like to motivate why we need them. We start by looking at the amount of data that has recently been created, which provides one of many motivational aspects. Not all data will or should be stored in data centers. However, a significant amount of data will be in such centers.
4.1.1.1 How much data?
One of the issues we have is comprehending how much data is created. It is hard to imagine and put into perspective how much total data is created over a year, a month, a week, a day, or even just an hour. Instead, to visualize the amount of data produced, we often find graphics easier to comprehend that show how much data is generated in a minute. Such depictions usually include examples of data generated as part of popular cloud services or the internet in general.
One such popular depiction is Data Never Sleeps (see Figure 2). It has been produced a number of times over the years and is now at version 7.0, released in 2019. If you identify a newer version, please let us know.
Observations for 2019: It is worthwhile to study this image in detail and identify some of the data that you can relate to services you use. It may also prompt you to study other services that are mentioned. For 2019 we observe that a staggering ~4.5 Mil Google searches are executed every minute, which is slightly lower than the number of videos watched on YouTube. 18 Mil text messages are sent every minute. Naturally, the numbers are averages over time.
Figure 2: Data Never Sleeps [2]
In contrast, in 2017 we observed: 3.8 Mil Google searches are executed every minute. Surprisingly, the Weather Channel receives over 18 Mil forecast requests, which is even higher than the 12 Mil text messages sent every minute. YouTube certainly serves a significant number of users with 4.3 Mil videos watched every minute.
Figure 3: Data Never Sleeps [3]
A different source publishes what is happening on the internet in a minute; we have been able to locate a version from 2018 (see Figure 4). While some data seem the same, others are slightly different. For example, this graph has a lower count for Google searches, while the number of text messages sent is significantly higher in contrast to Figure 3.
Figure 4: Internet Minute 2018 [4]
While reviewing the image from the previous year by the same author, we find not only increases, but also declines. Looking at Facebook showcases a loss of 73,000 logins per minute. This loss is substantial. We can see that Facebook services are being replaced by other services that are more popular with the younger generation, who tend to pick up new services quickly (see Figure 5).
Figure 5: Internet Minute 2017-2018 [4]
It is also interesting to compare such trends over a longer period of time (see Figure 6, Figure 7). An example is provided by looking at Google searches
http://www.internetlivestats.com/google-search-statistics/
and visualized in Figure 6.
Figure 6: Google searches over time
Figure 7: Big data trend. 2012 [5]
When looking at the trends, many predict an exponential growth in data. This trend is continuing.
4.1.2 Cloud Data Centers
A data center is a facility that hosts information technology related to servers and data serving a large number of customers. Data centers evolved from the need for large rooms, as the original computers in the early days of the compute revolution filled entire rooms. Once multiple computers were added to such facilities, supercomputer centers were created for research purposes. With the introduction of the internet and the offering of services such as Web hosting, large business-oriented server rooms were created. The need for increased facilities was further accelerated by the development of virtualization and by servers being rented to customers in shared facilities. As Web hosting is still important but has been taken over by cloud data centers, the terms internet data center and cloud data center are no longer used to distinguish them. Instead, today we simply use the term data center. There may still be an important difference for research data centers offered in academia and industry that focus on providing computationally potent clusters for numerical computation. Such data centers are typically governed around a smaller number of users who are part of either an organization or a virtual organization. However, we see that even in the research community data centers not only host supercomputers, but also Web server infrastructure and these days even private clouds that support the organizational users. In the case of the latter we speak about supporting the long tail of science.
The latter is driven by the 80%-20% rule: 20% of the users use 80% of the compute power. This means that the top 20% of scientists are served by the leadership-class supercomputers in the nation, while the rest are served by other servers or by cloud offerings through research and public clouds.
4.1.3 Data Center Infrastructure
Due to the data and server needs in the cloud and in research, such data centers may look very different. Some focus on large-scale computational resources, some on commodity hardware offered to the community. Their size also varies greatly. While a supercomputing center as part of a university was one of the largest such data centers two decades ago, such centers are now dwarfed by the centers deployed by industry to serve the long tail of customers.
In general a data center will have the following components:
* Facility: The entire data center will be hosted in a building. The building may have specific requirements related to security, environmental concerns, or even the integration into the local community, for example by providing heat to surrounding residences.
* Support infrastructure: This building will include a significant amount of support infrastructure that addresses, for example, continuous power supply, air conditioning, and security. For this reason you find in such centers
  * Uninterruptible Power Sources (UPS)
  * Environmental Control Units
  * Physical Security Systems
* Information Technology Equipment: Naturally, the facility will host the IT equipment, including the following:
  * Servers
  * Network Services
  * Disks
  * Data Backup Services
* Operations staff: The facility will need to be staffed with the various groups that support such data centers. This includes
  * IT Staff
  * Security and Facility Staff
  * Support Infrastructure Staff
  With regard to the number of people serving such a facility, it is clear that through automation it is quite low. According to [6], proper data center staffing is key to a reliable operation (see Figure 8).
  According to Figure 8, operational sustainability comprises three elements: management and operations, building characteristics, and site location [6].
Figure 8: Data center Staff Impact [6]
Another interesting observation is the root cause of incidents in a data center. Everyone has probably experienced some outage, so it is important to identify where outages come from in order to prevent them. As we see in Figure 9, not every error is caused by an operational issue. External, installation, design, and manufacturer issues together account for the largest share of data center incidents. According to the Uptime Institute Abnormal Incident Reports (AIRs) database, the root cause of 39% of data center incidents falls into the operational area [6].
Figure 9: Data center outage [6]
4.1.4 Data Center Characteristics
Next we identify a number of characteristics when looking at different data centers.
* Variation in Size: Data centers range in size from small edge facilities to megascale or hyperscale facilities filling large warehouses.
* Variation in cost per server: Although many data centers standardize their components, specialized services may be offered not on a 1K server, but on a 50K server.
* Variation in Infrastructure: Servers in centers serve a variety of needs and motivate different infrastructure. Use cases include Web servers, e-mail, machine learning, pleasantly parallel problems, and traditional supercomputing jobs.
* Energy Cost: Data centers use a lot of energy. The energy cost varies per region. The motivation to reduce energy use and cost is also driven by environmental awareness, not only among the operators, but also in the community in which such centers operate.
* Reliability: Although through operational efforts the data center can be made more reliable, failures can still happen. Examples are
  * https://www.zdnet.com/article/microsoft-south-central-u-s-datacenter-outage-takes-down-a-number-of-cloud-services/
  * https://www.datacenterknowledge.com/archives/2011/08/07/lightning-in-dublin-knocks-amazon-microsoft-data-centers-offline
  * https://techcrunch.com/2012/10/29/hurricane-sandy-attacks-the-web-gawker-buzzfeed-and-huffington-post-are-down/
Hence data center IaaS advantages include
* Reduced operational cost
* Increased reliability
* Increased scalability
* Increased flexibility
* Increased support
* Rapid deployment
* Decreased management: Outsourcing expertise that is not related to the core business
Data center disadvantages include
* Loss of control of the hardware
* Loss of control of the data
* The model favors many users
* Software to control the infrastructure is not accessible
* Variations in performance due to sharing
* Integration requires effort beyond login
* Failures can have a humongous impact
4.1.5 Data Center Metrics
One of the most important factors in ensuring smooth operation and offering of services is to employ metrics that can meaningfully inform the operations. Having metrics allows the staff to monitor and adapt to dynamic situations, but also to plan operations.
4.1.5.1 Data Center Energy Costs
One of the easiest metrics to monitor for a data center is the cost of the energy used to operate all of the equipment. Energy is one of the largest costs a data center incurs during its operation, as all of the servers, networking, and cooling equipment require power 24/7. For electricity, billing is usually measured in terms of kilowatt-hours (kWh) and kilowatts (kW). Depending on circumstances, there may also be costs for public purpose programs, cost recovery, and stranded costs, but they are beyond the scope of this book.
To provide a quick understanding, it is best to understand the relation between kilowatt-hours and kilowatts. kWh is typically referred to as consumption while kW is referred to as demand, and it is important to understand how these two concepts relate to each other. The easiest analogy is to think of kilowatts (demand) as the size of a water pipe, while kilowatt-hours (consumption) are how much water has passed through the pipe. If a server requires 1.2 kW to operate then, after an hour has passed, it will have consumed 1.2 kWh. However, if the server operates at 1.2 kW for 30 minutes and then goes idle and drops to 0.3 kW for another 30 minutes, then the total energy consumed will be:
$$ \text{kWh} = 0.3 \cdot \frac{30}{60} + 1.2 \cdot \frac{30}{60} = 0.75 \tag{2} $$
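The same calculation can be sketched in a few lines of Python. This is only an illustration: the function name `energy_kwh` and the representation of a load profile as a list of (kW, minutes) pairs are assumptions made for this example and do not come from any metering tool.

```python
# Illustrative sketch: energy (kWh) consumed by a load that changes over time.
# The function name and the (kW, minutes) profile format are assumptions
# made for this example.

def energy_kwh(profile):
    """Sum power (kW) times duration (hours) over all intervals."""
    return sum(kw * minutes / 60 for kw, minutes in profile)

# One server: 30 minutes at 1.2 kW, then 30 minutes idle at 0.3 kW.
server_profile = [(1.2, 30), (0.3, 30)]
print(round(energy_kwh(server_profile), 2))  # 0.75
```

Representing the load as intervals keeps the distinction between demand (the kW value of each interval) and consumption (the sum over time) explicit.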
Energy costs for a data center, then, are composed of two things: charges for energy and charges for demand. The energy charge is the total kWh consumed by the data center multiplied by the cost per kWh. Demand is somewhat more complicated: it is based on the highest average power draw measured in any 15-minute period. Taking the previous example, if a data center has 1,000 servers, the total energy consumption would be 750 kWh in the hour, but the demand charge would be based on 1,200 kW (or 1.2 MW).
The costs, then, are how the utility company recoups its expenses: the charge per kWh recoups the generation cost, while the kW charge recoups the cost of transmission and distribution (T&D). Typically, the demand charge is much higher and will depend on utility constraints: if a utility is challenged on the T&D front, expect these costs to be $6-$10/kW or more. If the assumed cost per kWh is $0.12 and the cost per kW is $8, the cost to run our servers for a month would be:
$$ \text{kWh charge} = 0.75 \cdot 24 \cdot 30 \cdot 0.12 \cdot 1000 = 64{,}800 \tag{3} $$
$$ \text{kW charge} = 1.2 \cdot 8 \cdot 1000 = 9{,}600 \tag{4} $$
This totals $74,400. It is important to note that reducing demand charges can have a tremendous payback: had the servers simply drawn a steady 750 kW over the course of the hour, our demand charge would have dropped from $9,600 to $6,000 while the energy costs remained the same. This is also why server virtualization can have a positive impact on energy costs: by having fewer servers running at a higher utilization, the demand charge will tend to level itself out as, on average, each server will be more fully utilized. For example, it is better to pay for 500 servers at 100% utilization than for 1,000 servers at 50% utilization, even though the amount of work done is the same, because if the 1,000 servers all operate at 100% utilization for even a brief amount of time in a month, the demand charge for the data center will be much higher.
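To make the interplay of energy and demand charges easier to explore, the following minimal sketch computes a monthly bill. The rates ($0.12 per kWh and $8 per kW) are the assumed values from the example; the function `monthly_bill` and its parameters are hypothetical names introduced here, and real utility tariffs are usually more involved.

```python
# Illustrative sketch of the two-part bill described in the text. The rates
# ($0.12 per kWh, $8 per kW) are the assumed values from the example; the
# function and parameter names are made up for this illustration.

def monthly_bill(avg_kw, peak_kw, hours=24 * 30, rate_kwh=0.12, rate_kw=8.0):
    """Return (energy charge, demand charge, total) in dollars."""
    energy_charge = avg_kw * hours * rate_kwh   # consumption part (kWh)
    demand_charge = peak_kw * rate_kw           # peak demand part (kW)
    return energy_charge, demand_charge, energy_charge + demand_charge

# 1,000 servers averaging 0.75 kW each, peaking at 1.2 kW each.
spiky = monthly_bill(avg_kw=750, peak_kw=1200)
# The same energy, but drawn at a steady 750 kW.
flat = monthly_bill(avg_kw=750, peak_kw=750)

print([round(x) for x in spiky])  # [64800, 9600, 74400]
print([round(x) for x in flat])   # [64800, 6000, 70800]
```

Comparing the two results shows that only the demand part of the bill changes when the same amount of energy is drawn more evenly.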
4.1.5.2 Data Center Carbon Footprint
Scientists worldwide have identified a link between carbon emissions and global warming. As the energy consumption of a data center is substantial, it is prudent to estimate the overall carbon emission. Schneider Electric (formerly APC) has provided a report on how to estimate the carbon footprint of a data center [7]. Although this report is already a bit older, it still provides valuable information. It defines key terms such as
Carbon dioxide emissions coefficient (carbon footprint):
With the increasing demand for data, bandwidth, and high-performance systems, there is a substantial amount of power consumption. This leads to a high amount of greenhouse gas emissions into the atmosphere, of the kind released by basic activities like driving a vehicle or running a power plant.
“The measurement includes power generation plus transmission and distribution losses incurred during delivery of the electricity to its point of use.”
Data centers in total used 91 billion kilowatt-hours (kWh) of electrical energy in 2013, and they were projected to use 139 billion kWh by 2020. Currently, data centers consume up to 3 percent of all global electricity production while producing 200 million metric tons of carbon dioxide. As the world moves towards the cloud, more and more data center capacity is needed, leading to more power consumption.
Peaker plant:
Peaking power plants, also known as peaker plants, and occasionally just peakers, are power plants that generally run only when there is a high demand, known as peak demand, for electricity. Because they supply power only occasionally, the power supplied commands a much higher price per kilowatt-hour than base load power. Peak load power plants are dispatched in combination with base load power plants, which supply a dependable and consistent amount of electricity, to meet the minimum demand. Peaker plants are generally gas turbines that burn natural gas, which still causes substantial CO2 emissions. A peaker plant may operate many hours a day, or it may operate only a few hours per year, depending on the condition of the region's electrical grid. Because of the cost of building an efficient power plant, if a peaker plant is only going to be run for a short or highly variable time, it does not make economic sense to make it as efficient as a base load power plant. In addition, the equipment and fuels used in base load plants are often unsuitable for use in peaker plants because the fluctuating conditions would severely strain the equipment. For these reasons, nuclear, geothermal, waste-to-energy, coal, and biomass plants are rarely, if ever, operated as peaker plants.
Avoided emissions:
Emissions avoidance is the most effective carbon management strategy over a multi-decadal timescale to achieve atmospheric CO2 stabilization and a subsequent decline. This prevents, in the first place, stable underground carbon deposits from entering either the atmosphere or less stable carbon pools on land and in the oceans.
Carbon offsets based on energy efficiency rely on technical efficiencies to reduce energy consumption and therefore reduce CO2 emissions. Such improvements are often achieved by introducing more energy-efficient lighting, cooking, heating, and cooling systems. These are real emission reduction strategies and have created valid offset projects.
This type of carbon offset provides perhaps the simplest options that will ease the adoption of low-carbon practice. When these practices become generally accepted (or compulsory), they will no longer qualify as offsets and further efficiencies will need to be promoted.
CO2 (carbon dioxide, or carbon):
Carbon dioxide is the main cause of the greenhouse effect. It is emitted in huge amounts into our atmosphere, with a life cycle of almost 100 years. Data centers cause emissions during the manufacturing process of all the components that populate a data center (servers, UPS, building shell, cooling, etc.), during operation of the data center (in terms of electricity consumed), during maintenance (i.e., replacement of consumables like batteries, capacitors, etc.), and during the disposal of the components at the end of the life cycle. Until now, power plants have been allowed to dump unlimited amounts of carbon pollution into the atmosphere; no rules were in effect that limited their emissions of carbon dioxide, the primary driver of global warming. Now, for the first time, the EPA has finalized new rules, or standards, that will reduce carbon emissions from power plants. Known as the Clean Power Plan, these historic standards represent the most significant opportunity in years to help curb the growing consequences of climate change.
The data center will have a total carbon profile that includes the many different aspects of a data center contributing to carbon emissions. This includes manufacturing, packaging, transportation, storage, operation of the data center, and decommissioning. Thus it is important to notice that we need to consider not only the operation but also the construction and decommissioning phases.
4.1.5.3 Data Center Operational Impact
One of the main operational impacts is the cost and emissions of a data center caused by running and cooling the servers in the data center. Naturally, this depends on the type of fuel that is used to produce the energy. The actual carbon impact of using electricity certainly depends on the type of power plant that is used to provide it. These energy costs and the distribution of where the energy comes from can often be looked up by geographical region on the internet or obtained from the local energy provider. Municipal government organizations may also have such information. Tools such as the Indiana State Profile and Energy Use [8] (https://www.eia.gov/state/?sid=IN) may provide valuable information to derive such estimates. Locating a data center where energy is cheap is a key factor. To estimate both costs in terms of price and carbon emission, Schneider provides a convenient carbon estimate calculator based on energy consumption:
https://www.schneider-electric.com/en/work/solutions/system/s1/data-center-and-network-systems/trade-off-tools/data-center-carbon-footprint-comparison-calculator/tool.html
http://it-resource.schneider-electric.com/digital-tools/calculator-data-center-carbon
If we calculate the total cost, we naturally need to add all costs arising from the build and teardown phases as well as operational upgrades.
4.1.5.4 Power Usage Effectiveness
One of the frequently used measurements in data centers is the power usage effectiveness, or PUE for short. It is a measurement that identifies how much energy is used for the computing equipment versus other energy costs such as air conditioning.
Formally we define it as follows:
PUE is the ratio of the total amount of energy used by a computer data center facility to the energy delivered to the computing equipment.
PUE was published in 2016 as a global standard under ISO/IEC 30134-2:2016 (https://www.iso.org/standard/63451.html).
The inverse of PUE is the data center infrastructure efficiency (DCIE).
The best possible value of PUE is 1.0. Any real data center must have a higher value, as offices and other costs will surely arise, as we can see from the formula
$$ \mathrm{PUE} = \frac{\text{Total Facility Energy}}{\text{IT Equipment Energy}} $$
$$ \mathrm{PUE} = 1 + \frac{\text{Non-IT Facility Energy}}{\text{IT Equipment Energy}} $$
According to the PUE calculator at
https://www.42u.com/measurement/pue-dcie.htm
the following ratings are given:
| PUE | DCIE | Level of Efficiency |
|-----|------|---------------------|
| 3.0 | 33% | Very Inefficient |
| 2.5 | 40% | Inefficient |
| 2.0 | 50% | Average |
| 1.5 | 67% | Efficient |
| 1.2 | 83% | Very Efficient |
PUE is a very popular metric, as it is relatively easy to calculate and makes it easy to compare data centers with each other.
This metric also comes with some drawbacks:
* It does not account for climate-based differences, for example that the energy used to cool a data center in colder climates is less than in warmer climates. However, this may actually be a good side effect, as it encourages locations that need less cooling and therefore have lower energy costs.
* It favors large data centers with many shared servers over small data centers, where other operational costs may become relevant.
* It does not take into consideration recycled energy used, for example, to heat other buildings outside of the data center.
Hence it is prudent not to look just at the PUE but also at other metrics that contribute to the overall cost and energy usage of the total ecosystem in which the data center is located.
Already in 2008, Google reported the efficiency of its six data centers as 1.21 and Microsoft as 1.22, which at that time were considered very efficient. However, over time these targets have shifted, and today's data centers achieve much lower values. The Green IT Cube in Darmstadt, Germany, even reported 1.082. According to Wikipedia, an unnamed Fortune 500 company achieved a PUE of 1.06 in 2017 with 30,000 SuperMicro blades.
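As a small illustration of the PUE definition and its inverse, DCIE, consider the following sketch. The function names and the energy figures (1,500 kWh total, 1,000 kWh for IT equipment) are hypothetical values chosen for this example and are not measurements from any real facility.

```python
# Illustrative sketch of PUE and DCIE. The energy figures are hypothetical.

def pue(total_facility_energy, it_equipment_energy):
    """Power usage effectiveness: total energy over IT equipment energy."""
    return total_facility_energy / it_equipment_energy

def dcie(total_facility_energy, it_equipment_energy):
    """Data center infrastructure efficiency: the inverse of PUE."""
    return it_equipment_energy / total_facility_energy

total_kwh = 1500.0   # hypothetical total facility energy
it_kwh = 1000.0      # hypothetical IT equipment energy
non_it_kwh = total_kwh - it_kwh

print(pue(total_kwh, it_kwh))                 # 1.5
print(1 + non_it_kwh / it_kwh)                # 1.5, the equivalent second form
print(round(100 * dcie(total_kwh, it_kwh)))   # 67 (percent)
```

Both forms of the PUE formula give the same value, and the resulting DCIE of 67% corresponds to a PUE of 1.5.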
Exercises
E.PUE.1: Lowest PUE you can find
What is the lowest PUE you can find? Provide details about the system as well as the date when the PUE was reported.
4.1.5.5 Hot-Cold Aisle
To understand hot-cold aisles, one must take a brief foray into the realm of physics and energy, specifically into how a temperature gradient tries to equalize. The most important formula to know is the heat transfer Equation 5.
$$ q = h_c A (t_a - t_s) \tag{5} $$
Here, q is the amount of heat transferred in a given amount of time. For this example, we will calculate it as W/hour as that is, conveniently, how energy is billed. Air moving at a moderate speed will transfer approximately 8.47 Watts per square foot per hour for each degree of temperature difference. A 1U server is 19 inches wide and about 34 inches deep. Multiplying the two values gives us a cross section of 646 square inches, or about 4.48 square feet. Plugging these values into Equation 5 gives us:
$$ q = 8.47 \cdot 4.48 \cdot (t_a - t_s) \tag{6} $$
This begins to point us towards why hot-cold aisles are important. If we introduce cold air from the AC system into the same aisle that the servers are exhausting into, the air will mix and begin to average out. For example, if our servers are producing exhaust at 100 F and our AC unit provides 65 F air at the same rate, then the average air temperature will become 82.5 F (assuming balanced air pressure). This has a deleterious effect on our server cooling, as warmer air takes heat away from warm surfaces more slowly than cooler air:
$$ 1{,}328.1 = 8.47 \cdot 4.48 \cdot (100 - 65) $$
$$ 664.0 = 8.47 \cdot 4.48 \cdot (100 - 82.5) $$
From the previous listing, we can see that a 35 degree delta allows the center to dissipate about 1,300 Watts of waste heat from a 1U server, while a 17.5 degree delta allows us to dissipate only 664 Watts. If a server is consuming more than 664 Watts, it will continue to get warmer and warmer until it eventually reaches a temperature differential high enough to create an equilibrium (or reaches a thermal throttle and begins to reduce performance).
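The effect of mixing hot and cold air can be reproduced with a short sketch. It assumes the coefficient (8.47) and the rounded cross section (4.48 square feet) used in the text; the function names are hypothetical and the equal-flow averaging is the simplification used in the example, not a full thermal model.

```python
# Illustrative sketch of the heat transfer example. The constants are the
# values used in the text; the function names are made up for this example.

H_C = 8.47         # heat transfer coefficient used in the text
AREA_SQFT = 4.48   # 19 in x 34 in cross section, rounded as in the text

def heat_dissipated(t_surface, t_air, h_c=H_C, area=AREA_SQFT):
    """Heat removed for a given temperature difference (degrees F)."""
    return h_c * area * (t_surface - t_air)

def mixed_air_temp(t_exhaust, t_supply):
    """Average temperature when equal air flows mix (simplifying assumption)."""
    return (t_exhaust + t_supply) / 2

separated = heat_dissipated(100, 65)        # cold air reaches the server unmixed
mixed = heat_dissipated(100, mixed_air_temp(100, 65))  # air mixes to 82.5 F

print(round(separated))  # 1328
print(round(mixed))      # 664
```

Halving the temperature difference halves the heat that can be removed, which is the motivation for keeping hot and cold air streams apart.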
To combat this, engineers developed the idea of designating alternating aisles as either hot or cold. All servers in a given aisle are then oriented such that the AC system provides cool air into the cold aisle, where it is drawn in by the server, which then exhausts it into the hot aisle, where the ventilation system removes it from the room. This has the benefit of maximizing the temperature delta between the provided air and the server's processor(s), reducing the quantity of air that must be provided in order to cool the server and improving overall system efficiency.
See Figure 10 to understand how the hot-cold aisle configuration is set up in a data center.
Figure 10: Hot Cold Aisle [9]
4.1.5.5.1Containment
While modern data centers employ highly sophisticated mechanisms to be asenergy efficient as possible. One such mechanism which can be seen as aimprovement on top of the Hot-Cold isle arrange is to use either hot islecontainmentorcold islecontainment.Usingacontainmentsystemcanremovetheissuewithfreeflowingair.
As the name somewhat implies in cold air containment, the data centers isdesigned so that only cold air goes into the cold isle, thismakes sure that thesystem only draws in cold air for cooling purposes. Conversely in hot islecontainmentdesign,thehotisleiscontainedsothatthehotaircollectedinthehotisleisdrawnoutbythecoolingsystemandsothatthecoldairdoesnotflowintothehotisles[10].
4.1.5.5.1.1 Water Cooled Doors
Another good way of reducing the energy consumption is to install water cooled doors directly at the rack as shown in Figure 11. Cooling can even be actively controlled so that in case of idle servers less energy is spent on cooling. There are many vendors that provide such cooling solutions.
Figure 11: Active Rear Door link
4.1.5.6 Workload Monitoring
4.1.5.6.1 Workload of HPC in the Cloud
Clouds and especially university data centers do not just provide virtual machines but also provide traditional supercomputer services. This includes the NSF sponsored XSEDE project. As part of this project the "XDMoD auditing tool provides, for the first time, a comprehensive tool to measure both utilization and performance of high-end cyberinfrastructure (CI), with initial focus on XSEDE. Several case studies have shown its utility for providing important metrics regarding resource utilization and performance of TeraGrid/XSEDE that can be used for detailed analysis and planning as well as improving operational efficiency and performance. Measuring the utilization of high-end cyberinfrastructure such as XSEDE helps provide a detailed understanding of how a given CI resource is being utilized and can lead to improved performance of the resource in terms of job throughput or any number of desired job characteristics.
Detailed historical analysis of XSEDE usage data using XDMoD clearly demonstrates the tremendous growth in the number of users, overall usage, and scale" [11].
Having access to a detailed metrics analysis allows users and center administrators, as well as project managers, to better evaluate the use and utilization of such large facilities, justifying their existence (see Figure 12).
https://www.mainlinecomputer.com/t/product-lines/cabinets-and-racks/rear-door-heat-exchangers/chilled-doorr-high-density-rack-cooling-system/
Figure 12: XDMoD: XSEDE Metrics on Demand
Additional information is available at
https://open.xdmod.org/7.5/index.html
4.1.5.6.2 Scientific Impact Metric
Gregor von Laszewski and Fugang Wang are providing a scientific impact metric to XDMoD and XSEDE. It is a framework that (a) integrates publication and citation data retrieval, (b) allows scientific impact metrics generation at different aggregation levels, and (c) provides correlation analysis of impact metrics based on publication and citation data with resource allocation for a computing facility. This framework is used to conduct a scientific impact metrics evaluation of XSEDE, and to carry out extensive statistical analysis correlating XSEDE allocation size to the impact metrics aggregated by project and Field of Science. This analysis not only helps to provide an indication of XSEDE's scientific impact, but also provides insight regarding maximizing the return on investment in terms of allocation by taking into account Field of Science or project based impact metrics. The findings from this analysis can be utilized by the XSEDE resource allocation committee to help assess and identify projects with higher scientific impact. Through the general applicability of the novel metrics we invented, it can also help provide metrics regarding the return on investment for XSEDE resources, or campus based HPC centers [12].
4.1.5.6.3 Clouds and Virtual Machine Monitoring
Although no longer in operation in its original form, FutureGrid [13] pioneered the extensive monitoring and publication of its virtual machine and project usage. We are not aware of a current system that provides this level of detail. However, efforts as part of XSEDE within the XDMoD project are underway at this time but are not yet integrated.
FutureGrid provided access to all virtual machine information, as well as usage across projects. An archived portal view is available at FutureGrid Cloud Metrics [13].
http://archive.futuregrid.org/metrics/html/results/2014-Q3/reports/rst/india-All.html
Figure 13: FutureGrid Cloud Metric
FutureGrid offered multiple clouds including clouds based on OpenStack, Eucalyptus, and Nimbus. Nimbus and Eucalyptus are systems that are no longer used in the community. OpenStack is the only viable solution, in addition to the cloud offerings by Comet that do not use OpenStack (see Figure 13).
FutureGrid could monitor all of them and published its results in its Metrics portal. Monitoring the VMs is an important activity as it can identify VMs that may no longer be used (the user has forgotten to terminate them), and excessive usage by a user or project can be detected in early stages.
We like to emphasize several examples where such monitoring is helpful:
Assume a student participates in a class. Metrics and logs allow us to identify students that do not use the system as asked for by the instructors. For example, it is easy to identify if they logged on and used VMs. Furthermore, the length of running a VM based on the tasks to be completed can be compared against other student members.
Let us assume a user with willful ignorance does not shut down VMs although they are not used, because research clouds are offered to us for free. In fact, this situation happened to us recently while using another cloud where such monitoring capabilities were not available to us (on Jetstream). The user single-handedly used up the entire allocation that was supposed to be shared with 30 other users in the same project. All accounts of all users were quasi deactivated as the entire project they belonged to was deactivated. Due to allocation review processes it took about 3 weeks to reactivate full access.
In commercial clouds you will be charged money. Therefore, it is less likely that you forget to shut down your machine.
In case you use GitHub carelessly and post your cloud passwords or any other passwords in it, you will find that within five minutes your cloud account will be compromised. There are individuals on the network that cleverly mine GitHub for such security lapses and will use your passwords if you indeed have stored them in it. In fact, GitHub's deletion of a file does not delete the history, so for a non-expert, deleting the password from GitHub is not sufficient. You will have to delete the file and rewrite the history, and in any case you will definitely need to reset the password. Monitoring the public cloud usage in the data center is important not only in your region but also in other regions, as the password is valid there as well and intruders could hijack and start services in regions that you have never used.
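The first two examples can be illustrated with a minimal Python sketch that flags VMs which have been running for a long time with almost no CPU use and that sums consumed core hours per user. The record layout, field names, and thresholds are assumptions made for illustration; they do not correspond to the FutureGrid or XDMoD data formats.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class VmRecord:
    """Hypothetical usage record; the field names are illustrative assumptions."""
    name: str
    owner: str
    started: datetime
    avg_cpu_percent: float  # average CPU utilization since start


def find_idle_vms(records, now, min_age=timedelta(days=7), max_cpu=5.0):
    """Return VMs that have run at least min_age with very low CPU use."""
    return [
        r for r in records
        if now - r.started >= min_age and r.avg_cpu_percent <= max_cpu
    ]


def core_hours_by_owner(records, now, cores_per_vm=1):
    """Sum consumed core hours per owner to spot users draining an allocation."""
    totals = {}
    for r in records:
        hours = (now - r.started).total_seconds() / 3600.0
        totals[r.owner] = totals.get(r.owner, 0.0) + hours * cores_per_vm
    return totals


if __name__ == "__main__":
    now = datetime(2019, 1, 31)
    records = [
        VmRecord("vm-a", "alice", datetime(2019, 1, 1), 1.2),
        VmRecord("vm-b", "bob", datetime(2019, 1, 30), 0.5),
        VmRecord("vm-c", "carol", datetime(2019, 1, 10), 63.0),
    ]
    for vm in find_idle_vms(records, now):
        print(f"possibly forgotten: {vm.name} (owner {vm.owner})")
    print(core_hours_by_owner(records, now))
```

In this sample only `vm-a` is flagged: `vm-b` is idle but was started recently, and `vm-c` is old but busy. Run periodically, a report like this could warn a project before a single user exhausts a shared allocation.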
In addition to FutureGrid, we like to point out Comet (see other sections). It is an exception for VM monitoring as it uses a regular batch queuing system to manage the jobs. Monitoring of the jobs is conducted through existing HPC tools.
4.1.5.6.4 Workload of Containers
Monitoring tools for containers such as for Kubernetes are listed at:
https://kubernetes.io/docs/tasks/debug-application-cluster/resource-usage-monitoring/
Such tools can be deployed alongside Kubernetes in the data center, but will likely have restricted access. They are intended for those who operate such services, for example in Kubernetes. We will discuss this in future sections in more detail.
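As a rough sketch of how readings from such tools might be aggregated, the following Python snippet parses text shaped like the table printed by `kubectl top pods`, where CPU is reported in millicores and memory with binary suffixes. The sample rows, pod names, and helper names are invented for illustration, and only the `m`, `Ki`, `Mi`, and `Gi` suffixes are handled.

```python
# Hypothetical sample, modeled on the tabular output of `kubectl top pods`.
SAMPLE_TOP_OUTPUT = """\
NAME    CPU(cores)   MEMORY(bytes)
web-1   250m         128Mi
web-2   100m         64Mi
db-0    1500m        2Gi
"""

_MEMORY_FACTORS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '250m' (millicores) or '2' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000.0
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '128Mi' to bytes."""
    for suffix, factor in _MEMORY_FACTORS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain number of bytes


def summarize(top_output: str):
    """Return (total cores, total bytes) over all rows below the header."""
    total_cpu, total_mem = 0.0, 0
    for line in top_output.strip().splitlines()[1:]:
        _name, cpu, mem = line.split()
        total_cpu += parse_cpu(cpu)
        total_mem += parse_memory(mem)
    return total_cpu, total_mem


if __name__ == "__main__":
    cores, mem_bytes = summarize(SAMPLE_TOP_OUTPUT)
    print(f"total CPU: {cores:.2f} cores, total memory: {mem_bytes / 1024 ** 2:.0f} Mi")
```

An operator with access to the cluster could feed real command output into `summarize`; users without such access would have to rely on whatever metrics the provider exposes.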
4.1.6 Example Data Centers
In this section we give some data center examples while looking at some of the major cloud providers.
4.1.6.1 AWS
AWS describes the security of its data centers in four layers [14]:
Perimeter Layer
Infrastructure Layer
Data Layer
Environmental Layer
The global infrastructure [15] as of January 2019 includes 60 Availability Zones within 20 geographic Regions. Plans exist to add 12 Availability Zones and four additional Regions in Bahrain, Hong Kong SAR, Sweden, and a second AWS GovCloud Region in the US (see Figure 14).
https://aws.amazon.com/compliance/data-center/perimeter-layer/
https://aws.amazon.com/compliance/data-center/infrastructure-layer/
https://aws.amazon.com/compliance/data-center/data-layer/
https://aws.amazon.com/compliance/data-center/environmental-layer/
https://aws.amazon.com/about-aws/global-infrastructure/
Figure 14: AWS regions [15]
Amazon strives to achieve high availability through multiple availability zones, improved continuity with replication between regions, meeting compliance and data residency requirements, as well as providing geographic expansion. See Figure 14.
The regions and number of availability zones are as follows:
US East: N. Virginia (6), Ohio (3)
US West: N. California (3), Oregon (3)
Asia Pacific: Mumbai (2), Seoul (2), Singapore (3), Sydney (3), Tokyo (4), Osaka-Local (1)
Canada: Central (2)
China: Beijing (2), Ningxia (3)
Europe: Frankfurt (3), Ireland (3), London (3), Paris (3)
South America: São Paulo (3)
GovCloud: AWS GovCloud (US-West) (3)
New Regions (coming soon): Bahrain, Hong Kong SAR, China, Sweden, AWS GovCloud (US-East)
4.1.6.2 Azure
Azure claims to have more global regions [16] than any other cloud provider.
https://azure.microsoft.com/en-us/global-infrastructure/regions/
They motivate this with their goal of bringing applications to users around the world. The goal is similar to that of other commercial hyper-scale providers: preserving data residency and offering comprehensive compliance and resilience. As of Aug 29, 2018 Azure supports 54 regions worldwide. These regions can currently be accessed by users in 140 countries (see Figure 15). Not every service is offered in every region, as the service to region matrix shows:
https://azure.microsoft.com/en-us/global-infrastructure/services/
Figure 15: Azure regions [16]
4.1.6.3 Google
From Google [17] we find that on Aug. 29th Google has the following data center locations (see Figure 16):
North America: Berkeley County, South Carolina; Council Bluffs, Iowa; Douglas County, Georgia; Jackson County, Alabama; Lenoir, North Carolina; Mayes County, Oklahoma; Montgomery County, Tennessee; The Dalles, Oregon
South America: Quilicura, Chile
Asia: Changhua County, Taiwan; Singapore
Europe: Dublin, Ireland; Eemshaven, Netherlands; Hamina, Finland; St Ghislain, Belgium
https://www.google.com/about/datacenters/inside/locations/index.html
Figure 16: Google data centers [17]
Each data center is advertised with a special environmental aspect such as a unique cooling system or wildlife on the premises. Google's data centers support its service infrastructure and allow hosting as well as other cloud services to be offered to its customers.
Google highlights its efficiency strategy and methods here:
https://www.google.com/about/datacenters/efficiency/
They summarize that their efficiency efforts are based on:
Measuring the PUE
Managing airflow
Adjusting the temperature
Using free cooling
Optimizing the power distribution
Figure 17: PUE data for all large-scale Google data centers
The PUE [18] data for all large-scale Google data centers is shown in Figure 17.
An important lesson from Google is the PUE boundary: the reported efficiency differs depending on how closely the measurement boundary is drawn around the IT infrastructure, as opposed to the entire data center building. This indicates that it is important to look at any provider's definition of PUE in order to avoid reporting numbers that are not comparable between vendors or that are not all-encompassing.
Figure 18: Google data center PUE measurement boundaries [18]
Figure 18 shows the Google data center PUE measurement boundaries. The average PUE [18] for all Google data centers is 1.12, although Google notes that it could boast a PUE as low as 1.06 when using narrower boundaries.
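The effect of the measurement boundary can be illustrated with a small Python sketch that uses the basic definition of PUE as total facility energy divided by IT equipment energy. All energy figures are hypothetical and were chosen only so that the two results reproduce the 1.06 and 1.12 quoted above; they are not Google's measurements, and the split of the overhead into two groups is an assumption.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy over IT energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


if __name__ == "__main__":
    # Hypothetical annual energy figures in kWh, for illustration only.
    it_load = 1_000_000.0
    overhead_inside_boundary = 60_000.0   # e.g. cooling near the IT equipment
    overhead_outside_boundary = 60_000.0  # e.g. substations, offices, losses

    narrow = pue(it_load + overhead_inside_boundary, it_load)
    wide = pue(it_load + overhead_inside_boundary + overhead_outside_boundary,
               it_load)
    print(f"narrow boundary PUE: {narrow:.2f}")
    print(f"wide boundary PUE:   {wide:.2f}")
```

The same facility yields two different numbers depending on which overheads are counted, which is why PUE values from different providers should only be compared when their boundaries match.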
https://www.google.com/about/datacenters/efficiency/internal/
As a consequence, Google defines its PUE in detail in Equation 7.
$$PUE = \frac{E_{SIS} + E_{ITS} + E_{TX} + E_{HV} + E_{LV} + E_{F}}{E_{ITS} - E_{CRAC} - E_{UPS} - E_{LV} + E_{Net1}} \qquad (7)$$
where the abbreviations stand for
$E_{SIS}$ = Energy consumption for supporting infrastructure power substations feeding the cooling plant, lighting, office space, and some network equipment
$E_{ITS}$ = Energy consumption for IT power substations feeding servers, network, storage, and computer room air conditioners (CRACs)
$E_{TX}$ = Medium and high voltage transformer losses
$E_{HV}$ = High voltage cable losses
$E_{LV}$ = Low voltage cable losses
$E_{F}$ = Energy consumption from on-site fuels including natural gas & fuel oils
$E_{CRAC}$ = CRAC energy consumption
$E_{UPS}$ = Energy loss at uninterruptible power supplies (UPSes) which feed servers, network, and storage equipment
$E_{Net1}$ = Network room energy fed from type 1 unit substation
For more details see [18].
4.1.6.4 IBM
IBM maintains almost 60 data centers, which are placed globally in 6 regions and 18 availability zones. IBM targets businesses while offering local access to its centers to allow for low latency. IBM states that through this localization users can decide where and how data and workloads are placed and can address availability, fault tolerance, and scalability. As IBM is business oriented it also stresses its certified security.
More information can be obtained from:
https://www.ibm.com/cloud/data-centers/
A special service offering is provided by Watson
https://www.ibm.com/watson/
which focuses on AI based services. It includes PaaS services for deep learning, but also services that are offered to the healthcare and other communities as SaaS.
4.1.6.5 XSEDE
XSEDE is an NSF sponsored large distributed set of clusters, supercomputers, data services, and clouds, building a “single virtual system that scientists can use to interactively share computing resources, data and expertise”. The Web page of XSEDE is located at
https://www.xsede.org/
Primary compute resources are listed in the resource monitor at
https://portal.xsede.org/resource-monitor
For cloud computing the following systems are of special importance, although selected others may also host container based systems while using Singularity (see Figure 19):
Comet virtual clusters
Jetstream OpenStack
Figure 19: XSEDE distributed resource infrastructure
4.1.6.5.1 Comet
The Comet machine is a larger cluster and offers bare metal provisioning based on KVM and SLURM. Thus it is a unique system that can run traditional supercomputing jobs, such as MPI based programs, at the same time as jobs that utilize virtual machines. With its availability of >46000 cores it provides one of the largest NSF sponsored cloud environments. Through its ability to provide bare metal provisioning and access to InfiniBand between all virtual machines, it is an ideal machine for exploring performance oriented virtualization techniques.
Comet has about 3 times more cores than Jetstream.
4.1.6.5.2 Jetstream
Jetstream is a machine that specializes in offering a user friendly cloud environment. It utilizes an environment called Atmosphere that targets inexperienced scientific cloud users. It also offers an OpenStack environment that is used by Atmosphere and is, for classes such as ours, the preferred access method. More information about the system can be found at
https://dcops.iu.edu/
4.1.6.6 Chameleon Cloud
Chameleon cloud is a configurable experimental environment for large-scale cloud research. It offers OpenStack as a service, including some more advanced services that allow experimentation with the infrastructure.
https://www.chameleoncloud.org/
An overview of the hardware can be obtained from
https://www.chameleoncloud.org/hardware/
4.1.6.7 Indiana University
Indiana University has a data center in which many different systems are housed. This includes not only Jetstream, but also many other systems. The systems include production, business, and research clusters and servers. See Figure 20.
Figure 20: IU Data Center
On the research cluster side it offers Karst and Carbonate:
https://kb.iu.edu/d/bezu (Karst)
https://kb.iu.edu/d/aolp (Carbonate)
One of the special systems located in the data center and managed by the Digital Science Center is called Futuresystems, which provides a great resource for the advanced students of Indiana University focusing on data engineering. While systems such as Jetstream and Chameleon cloud specialize in production ready cloud environments, Futuresystems allows researchers to experiment with state-of-the-art distributed systems environments supporting research. It is available with Comet and thus could also serve as an on-ramp to using larger scale resources on Comet while experimenting with the setup on Futuresystems.
Such an offering is logical as researchers in the data engineering track want to further develop systems such as Hadoop, Spark, or container based distributed environments, and not use the tools that are released for production, as they do not allow improvements to the infrastructure. Futuresystems is managed and offered by the Digital Science Center.
Hence IU offers very important and needed services:
Karst for traditional supercomputing
Jetstream for production use with focus on virtual machines
Futuresystems for research experiment environments with access to bare metal
4.1.6.8 Shipping Containers
A few years ago data centers built from shipping containers were very popular, including at several major cloud providers. Such providers, among them Microsoft and Google, have found that containers are not the best way to develop centers at scale. The current trend, however, is to build mega or hyperscale data centers.
https://www.datacenterknowledge.com/archives/2016/04/20/microsoft-moves-away-from-data-center-containers
https://blogs.technet.microsoft.com/msdatacenters/2013/04/22/microsofts-itpac-a-perfect-fit-for-off-the-grid-computing-capacity/
4.1.7 Server Consolidation
One of the driving factors in cloud computing and the rise of large scale data centers is the ability to use server virtualization to place more than one server on the same hardware. Formerly the services were hosted on their own servers. Today they are managed on the same hardware although they look to the customer like separate servers.
As a result we find the following advantages:
reduction of administrative and operations cost: As we reduce the number of physical servers and use each of them to host multiple virtual servers, management, space, power, and maintenance costs are reduced.
better resource utilization: Through load balancing strategies servers can be better utilized, for example by increasing the load on a server so that resource idling is avoided.
increased reliability: As virtualized servers can be snapshotted and mirrored, these features can be utilized in strategies to increase reliability in case of failure.
standardization: As the servers are deployed in large scale, the infrastructure is implicitly standardized based on server, network, and disk, making maintenance and replacements easier. This also includes the software that is running on such servers (OS, platform, and may even include applications).
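To make the utilization argument concrete, the following Python sketch consolidates several lightly loaded servers onto as few physical hosts as possible. It uses the first-fit decreasing heuristic for bin packing, which is one common choice and not a method prescribed by the text; the load values (in percent of one host's capacity) are invented for illustration.

```python
def consolidate(loads, capacity=100):
    """Pack server loads onto hosts using the first-fit decreasing heuristic.

    loads: load of each former standalone server, in percent of one host.
    Returns a list of hosts, each a list of the loads placed on it.
    """
    hosts = []
    for load in sorted(loads, reverse=True):
        if load > capacity:
            raise ValueError(f"load {load} exceeds host capacity {capacity}")
        for host in hosts:
            if sum(host) + load <= capacity:
                host.append(load)  # fits on an already used host
                break
        else:
            hosts.append([load])   # no host had room, so open a new one
    return hosts


if __name__ == "__main__":
    # Hypothetical loads of seven formerly separate servers.
    loads = [20, 50, 40, 70, 10, 30, 80]
    placement = consolidate(loads)
    print(f"{len(loads)} servers consolidated onto {len(placement)} hosts")
    for i, host in enumerate(placement, start=1):
        print(f"host {i}: {host} -> {sum(host)}% utilized")
```

In this sample the seven servers fit onto three fully utilized hosts. A production scheduler would also have to consider memory, network, and failure domains, which this sketch ignores.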
4.1.8 Data Center Improvements and Consolidation
Due to the immense number of servers in data centers, as well as the increased workload on these servers, the energy consumption of data centers is large, not only to run the servers but also to provide the necessary cooling. Thus it is important to revisit the impact such data centers have on the energy consumption. One of the studies that looked into this is from 2016 and is published by LBNL [19]. In this study the data center electricity consumption back to 2000 is analyzed while using previous studies and historical shipment data. Forecasts with different assumptions are contrasted up to 2020.
https://cloudfront.escholarship.org/dist/prd/content/qt84p772fc/qt84p772fc.pdf
Figure 21 depicts “an estimate of total U.S. data center electricity use (servers, storage, network equipment, and infrastructure) from 2000-2020”.
In “2014 the data centers in the U.S. consumed an estimated 70 billion kWh” or “about 1.8% of total U.S. electricity consumption”. More recent studies find an increase of only about 4% from 2010-2014. This is a large deviation from the 24% that was originally predicted several years ago. The study finds that the predicted energy use would be approximately 73 billion kWh in 2020.
Figure 21: Energy Forecast [19]
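The figures quoted from the study can be put into perspective with a few lines of Python. The sketch only uses the three numbers given in the text (70 billion kWh in 2014, a share of about 1.8%, and 73 billion kWh projected for 2020); the derived values are back-of-the-envelope estimates and not results reported by the study.

```python
DC_2014_BILLION_KWH = 70.0   # estimated U.S. data center use in 2014
DC_SHARE_2014 = 0.018        # about 1.8% of total U.S. consumption
DC_2020_BILLION_KWH = 73.0   # projected data center use in 2020


def implied_total(dc_use: float, share: float) -> float:
    """Total consumption implied by a part and its share of the whole."""
    return dc_use / share


def relative_growth(old: float, new: float) -> float:
    """Relative change from old to new, as a fraction."""
    return (new - old) / old


if __name__ == "__main__":
    total = implied_total(DC_2014_BILLION_KWH, DC_SHARE_2014)
    growth = relative_growth(DC_2014_BILLION_KWH, DC_2020_BILLION_KWH)
    print(f"implied total U.S. consumption in 2014: {total:.0f} billion kWh")
    print(f"projected data center growth 2014-2020: {growth:.1%}")
```

The quoted share implies a total U.S. consumption of roughly 3,900 billion kWh in 2014, and the projection for 2020 corresponds to growth of only about 4% over six years.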
It is clear that the original prediction of large energy consumption motivated a trend in industry to provide more energy efficient data centers. However, if such energy efficiency efforts were not conducted or encouraged, we would see a completely different scenario.
The following scenarios are identified that will significantly impact the prediction:
improved management: increases energy efficiency through operational or technological changes with minimal investment. Strategies include improving the least efficient components.
best practices: increases the energy efficiency gains that can be obtained through the widespread adoption of the most efficient technologies and best management practices applicable to each data center type. This scenario focuses on maximizing the efficiency of each type of data center facility.
hyperscale data centers: the infrastructure will be moved from smaller data centers to larger hyperscale data centers.
4.1.9 Project Natick
To reduce energy consumption in data centers and reduce the cost of cooling, Microsoft has developed Project Natick. To tackle this problem Microsoft has built an underwater data center. Another benefit of this project is that a data center can be deployed in large bodies of water to serve customers residing in that area, so it helps to reduce latency by reducing the distance to users and therefore increasing data transfer speed.
The project was executed in two phases.
Phase 1 was executed between August and November 2015. In this phase Microsoft was successfully able to deploy and operate a vessel underwater. The vessel was able to tackle cooling issues and the effect of biofouling as well. Biofouling refers to the fouling of pipes and underwater surfaces by organisms