Post on 10-Feb-2018
ECEN/CSCI 5593: Advanced Computer Architecture (ACA) Course Syllabus
Instructor: Dan Connors   E-Mail: dconnors@colorado.edu
Website: Desire2Learn: https://learn.colorado.edu
I. Course Overview
Advanced Computer Architecture (ACA) covers advanced topics in computer architecture, focusing on
multicore, graphics processing unit (GPU), and heterogeneous SoC multiprocessor architectures and their
implementation issues (an architect's perspective). A range of levels is explored, from deep submicron
CMOS characteristics through microarchitecture, compiler optimization, parallel programming, run-time
optimization, performance analysis & tuning, and fault tolerance, to power-aware computing techniques.
The objective of the course is to provide in-depth coverage of current and emerging trends in computer
architecture, focusing on performance and the hardware/software interface. The course emphasis is on
analyzing fundamental issues in architecture design and their impact on application performance. To
enable a better understanding of the concepts, hands-on assignments are used to explore issues in
multicore and GPU architecture systems. Students have options for exploring their own interests in
custom projects and assignments.
New recorded video lectures in Spring 2017
New projects in Spring 2017: Students work in groups of up to two people on projects related to the
acceleration and performance tuning of machine learning, computer vision, and deep learning. Students
taking the course can investigate projects with access to NVIDIA, Xilinx, and Raspberry Pi resources:
NVIDIA Jetson TX1 (http://www.nvidia.com/object/jetson-tx1-module.html) is the world's leading AI
computing platform for GPU-accelerated parallel processing in the mobile embedded systems market. Its
high-performance, low-energy computing for deep learning and computer vision makes Jetson the ideal
solution for compute-intensive embedded projects. Jetson TX1 is a supercomputer on a module that's the
size of a credit card. It features the new NVIDIA Maxwell™ architecture:
• GPU: 1 TFLOP/s, 256 cores
• CPU: 64-bit ARM® A57 CPUs
• Memory: 4 GB LPDDR4, 25.6 GB/s
• Project potential: Drones & Unmanned Aerial Vehicles (UAVs), Autonomous Robotic Systems, Mobile
Medical Imaging, Intelligent Video Analytics (IVA)
PYNQ - Python Productivity for Xilinx Zynq Programmable Hardware
http://www.pynq.io/
PYNQ is an open-source project from Xilinx that makes it easy to design embedded systems with Zynq All
Programmable Systems on Chips (APSoCs). Using the Python language and libraries, designers can exploit
the benefits of programmable logic and microprocessors in Zynq to build more capable and exciting
embedded systems.
PYNQ users can now create high-performance embedded applications with
• parallel hardware execution
• hardware-accelerated algorithms
• real-time signal processing
• high-bandwidth IO and low-latency control
Zynq-7000 All Programmable SoC Features
• Dual ARM® Cortex™-A9 MPCore™ with CoreSight™
• 32 KB Instruction, 32 KB Data per processor L1 Cache
• 512 KB unified L2 Cache, 256 KB On-Chip Memory, 630 KB of fast block RAM
• 85K logic cells (13,300 logic slices, each with four 6-input LUTs and 8 flip-flops)
Raspberry Pi – ARM Linux-based Embedded System (https://www.raspberrypi.org)
• Low Transistor Count
• Low Power Consumption / Heat Production
• Used in most mobile devices: phones and small digital devices
• Raspberry Pi has similar requirements to mobile devices
II. Course Prerequisites
This course requires an understanding of processor design, specifically computer organization and the
instruction set architecture (ISA): ECEN 4593 (Computer Organization) or an equivalent first course in
computer organization and design. Students should already understand at least one computer instruction
set and know how to design a control unit, arithmetic unit, memory (cache and virtual), and various
input/output interfaces.
III. Course Outline
1. Introduction to Computer Design and Quantitative Principles of Architecture Performance Analysis
• Technology and computer trends
• Measuring computer system performance
• Benchmarks and metrics
2. Instruction Set Principles and Examples
• Classification of Instruction Set Architectures (ISA) – RISC, CISC, VLIW, EPIC
• Predicated execution and compiler-controlled speculation
3. Advanced Microarchitecture and Instruction-Level Parallelism
• Superscalar and pipeline operation
• Instruction-Level Parallelism (ILP)
• Dynamic instruction scheduling (Tomasulo, scoreboarding, reservation station design)
• Overcoming control hazards - branch prediction (2-bit, two-level)
• Compiler optimization and analysis
4. Memory-Hierarchy Design
• Multi-level cache design issues
• Performance evaluation
• Memory prefetching techniques
5. Thread-Level Parallelism
• Multicore systems
• Thread control models (fine-grained, coarse-grained, hyper-threading)
6. Data-Level Parallelism
• Vector processing
• Graphics Processing Units (GPU)
• NVIDIA architecture models – Fermi, Tesla, Kepler, Maxwell, Pascal
• CUDA/OpenCL programming
7. Performance Tuning and Analysis of Modern Applications
• Run-time optimization
• Binary instrumentation
• Hardware performance monitoring
• Performance tuning
8. Architecture Implementation Issues and Analysis
• Power - Dynamic Voltage Frequency Scaling (DVFS), Energy-Delay Product (EDP)
• Architecture physical layer concepts including device & layout, manufacturing constraints,
architectures, defect tolerance, and design variability.
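As a rough illustration of the power topics in item 8, the sketch below uses the standard first-order
dynamic power model (P = a * C * V^2 * f) and the usual definition of the Energy-Delay Product (energy
times delay). It is not course material; the function names and the sample values are hypothetical.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """First-order dynamic (switching) power in watts: a * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


def energy_delay_product(power_w, delay_s):
    """Energy-Delay Product in joule-seconds: (power * delay) * delay."""
    energy_j = power_w * delay_s
    return energy_j * delay_s


if __name__ == "__main__":
    # Hypothetical DVFS comparison: lowering voltage and frequency together
    # cuts power sharply, but the longer run time can still raise EDP.
    work_cycles = 2e9
    for volts, hertz in [(1.0, 2e9), (0.8, 1.5e9)]:
        power = dynamic_power(1e-9, volts, hertz)
        delay = work_cycles / hertz
        print(volts, hertz, power, energy_delay_product(power, delay))
```

Running the script prints the power and EDP for each hypothetical operating point, which is the kind of
trade-off DVFS analysis weighs.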
Course Schedule
WEEK 1 - Introduction, Instruction Set Architecture, and Pipelines
Topics:
• Description of architecture, micro-architecture and instruction set architectures.
• Pipelining Review - basic concept of pipeline and two different types of hazards.
• Pipeline CPI
• Processor Pipeline Hazards
• Computer Architecture & Tech Trends
• Processor Speed, Cost, Power
• Measuring Performance
• Benchmark Standards
• Iron Law of Performance
• Moore's Law
• Amdahl's Law
• Lhadma's Law
• Gustafson's Law
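The performance laws named in the Week 1 topics can be written down in a few lines. The sketch below
uses the textbook forms of the Iron Law, Amdahl's Law, and Gustafson's Law; it is an illustration only,
and the function and parameter names are hypothetical rather than taken from the course.

```python
def iron_law_time(instructions, cpi, clock_hz):
    """Iron Law: time = instruction count * cycles per instruction / clock rate."""
    return instructions * cpi / clock_hz


def amdahl_speedup(parallel_fraction, n_processors):
    """Amdahl's Law: speedup of a fixed-size workload on n processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


def gustafson_speedup(parallel_fraction, n_processors):
    """Gustafson's Law: scaled speedup when the workload grows with n."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * n_processors


if __name__ == "__main__":
    # With 90% parallel work, Amdahl's speedup saturates below 10x,
    # while Gustafson's scaled speedup keeps growing with n.
    for n in (2, 10, 100):
        print(n, amdahl_speedup(0.9, n), gustafson_speedup(0.9, n))
```

The two speedup functions answer different questions: Amdahl's holds the problem size fixed, while
Gustafson's assumes the parallel part of the problem scales with the processor count.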
WEEK 2 - Control Hazards
Topics:
• Misprediction Penalties
• Branch Prediction Techniques
• Two-level Correlation Predictors: PAg, GAg
• Hybrid Predictors
• Return Address Stack
• Loop Prediction
• Understanding Code Execution and Coding Practices for Branch Prediction
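To make the branch prediction topics concrete, here is a minimal sketch of a table of 2-bit saturating
counters, the simplest dynamic predictor covered under this heading. The class name, the table size, the
indexing by the low bits of the branch address, and the sample trace are all assumptions chosen for
illustration; real predictors differ in indexing and initialization.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, table_bits=4):
        self.mask = (1 << table_bits) - 1
        # Assumption: every counter starts at 1 (weakly not taken).
        self.counters = [1] * (1 << table_bits)

    def predict(self, pc):
        return self.counters[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(trace, predictor):
    """Run (pc, taken) pairs through the predictor and count wrong guesses."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


if __name__ == "__main__":
    # A loop branch taken 9 times and then falling through, executed 3 times.
    loop_trace = ([(0x40, True)] * 9 + [(0x40, False)]) * 3
    print(count_mispredictions(loop_trace, TwoBitPredictor()))
```

On this trace the predictor mispredicts once while warming up and then only on each loop exit, because a
single not-taken outcome moves the counter from strongly taken to weakly taken without flipping the
prediction.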
WEEK 3 and WEEK 4 – Base Cache Memory, Dynamic Execution and Superscalar Model
Topics:
• Cache memory characteristics
• Instruction Level Parallelism (ILP)
• Out-of-order execution - common methods used to improve the performance of out-of-order
processors, including register renaming and memory disambiguation.
• Common issues for superscalar architecture.
• Kinds of architectures for out-of-order processors.
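The cache memory characteristics topic above can be illustrated with a very small simulator. The sketch
below models a direct-mapped cache and counts hits and misses for a list of byte addresses; the function
name, the default geometry (4 lines of 16 bytes), and the address trace are hypothetical and chosen only
to keep the example readable.

```python
def simulate_direct_mapped(addresses, num_lines=4, line_bytes=16):
    """Return (hits, misses) for a direct-mapped cache over byte addresses."""
    tags = [None] * num_lines  # one stored tag per cache line; None means empty
    hits = 0
    misses = 0
    for addr in addresses:
        block = addr // line_bytes   # which memory block the byte lives in
        index = block % num_lines    # the single line that block may occupy
        tag = block // num_lines     # identifies which block is in that line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag        # fill the line, evicting any old block
    return hits, misses


if __name__ == "__main__":
    # Addresses 0 and 64 map to the same line and evict each other (conflict misses).
    trace = [0, 4, 8, 64, 0, 16]
    print(simulate_direct_mapped(trace))
```

Addresses 4 and 8 hit because they share a line with address 0 (spatial locality), while the later
access to address 0 misses because address 64 replaced it in the same line.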
WEEK 5 and WEEK 6 – VLIW, EPIC, and ILP Compiler Optimizations for Architectures
Topics:
• Traditional Compiler Optimization: Peephole, Loop Unrolling, Inter-procedural, and Inlining
• Compiler Optimization for Instruction Level Parallelism (ILP) and Profile-Directed Techniques
• Out-of-order execution - common methods used to improve the performance of out-of-order
processors, including register renaming and memory disambiguation.
WEEK 7 - Multicore Architectures and Vector/Multimedia Instruction Sets
Topics:
• Simultaneous multithreaded (SMT) architectures
• SMT Architecture Alternatives
• SMT architecture: OS impact and adaptive architectures
• Multi-core Architectures
• Single Instruction Multiple Data (SIMD)
• Intel Architecture Development: MMX, SSE
• Inline Assembly and Assembly Intrinsics
WEEK 8 thru WEEK 13 – Graphics Processing Unit (GPU) Architecture
Topics:
• NVIDIA CUDA / GPU Programming Model
• GPU Hardware and Parallel Communication
• GPU Fundamental Parallel Algorithms
• Optimizing GPU Programs
• The Frontiers and Future of GPU Computing
• OpenCL – Open Computing Language
• Mobile GPU System Architecture Exploration: NVIDIA TX1
WEEK 14 – Runtime Optimization and Compilation
Topics:
• Dynamic compilation and Code Translations
IV. Learning Outcomes
A student who has successfully completed this course should be able to:
1. Analyze various performance characteristics of a computer system.
2. Apply digital design techniques to the microarchitecture construction of a processor.
3. Translate assembly language programs to/from high-level language codes and algorithms.
4. Analyze hardware & software trade-offs to design the instruction set architecture (ISA) interface.
5. Understand advanced issues in design of computer processors, caches, and memory.
6. Analyze performance trade-offs in computer design.
7. Apply knowledge of processor design to improve performance in algorithms and software systems.
8. Acquire experience with tools for statistical analysis of instruction set trade-offs.
9. Gain the ability to develop parallel GPGPU solutions using CUDA and OpenCL.
V. Required Text and Materials
Hennessy and Patterson, Computer Architecture - A Quantitative Approach, 4th or later Edition (ISBN-13:
978-0123704900, ISBN-10: 0123704901, Edition: 4th) - this is the main textbook for the class.
VI. Assessment & Assignments
Assignments: The following programming assignments are scheduled:
• Pin – Binary instrumentation tool to analyze program behaviors
o Choice of branch prediction or cache design simulation.
• CUDA programming - Vector addition
• CUDA programming - Histogram generation
• CUDA programming - Image filtering
Reading Assignments: There are several technical papers (conference proceedings, journal articles, and
technical reports) assigned through the semester. Reading technical papers in the field of computer
architecture is imperative to understanding future directions in the field. Assignments will require
students to write brief overviews or answer technical questions about the papers assigned. Subject
matter from the reading assignments is likely to be covered in exams.
Final Exam: There will be a take-home final exam that covers the concepts of the course. The exam
problems are closely related to the lectures, homework assignments, and assigned readings. The final
exam will be cumulative, covering all subject topics.
Final Project: There will be a project for you to work on as an individual or in a group of two people.
The project will count as 15% of your grade, and will be a significant amount of work. The assignment
is to extend the semester project or to analyze some interesting data or new architecture feature.
Students may write a survey paper as a second option to the project. The project will be divided into
several milestones, one checkpoint being a presentation of work. Details about the project and schedule
will be announced later in the semester.
Basis for Final Grade
Students' grades will be assessed based on their completed homework, quizzes, project, in-class exams,
and the final exam. Homework assignments are designed to provide active learning for the student by
exercising the various topics covered by the course. Exams will be designed to assess the student's
ability to master the different topic areas, and their aptitude in each of the learning outcomes. The
percentage assigned to each assessment method is given in Table 1.
Table 1. Grade Assessment
Assessment                    % of Final Grade
Reading Assignments           10%
Assignments & Checkpoints     40%
Project                       20%
Final Exam                    30%
Total                         100%
Course Policies
Late Work Policy: Homework assignments must be turned in at the beginning of class; otherwise they will
be considered late. A student's score will be reduced by a 20% penalty for submitting work one second
to 24 hours late.
Student Honor Code: Students should be familiar with the College of Engineering and Applied Sciences
student honor code. All honor code rules will be adhered to in this class.
Appointments: Students are encouraged to make at least one appointment with the professor during the
semester. Appointments can be made by email. Exploring research opportunities, expressing concerns,
offering suggestions, and seeking advice are among the welcome topics.