
MIT International Journal of Computer Science & Information Technology, Vol. 2, No. 1, Jan. 2012, pp. (19-24) ISSN 2230-7621 © MIT Publications


Multi-Core Paradigm

4. Dual-core and quad-core chips can increase the performance of many software applications by almost 100%.
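How close a program gets to that figure depends on how much of its work can actually run in parallel. The sketch below uses Amdahl's law, which is not part of the original text and is brought in here only as a standard back-of-the-envelope model; the function name and the sample parallel fractions are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup predicted by Amdahl's law (an assumed model, not from the paper)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at the same speed; only the parallel part is divided among cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative parallel fractions; real applications must be measured.
    for fraction in (0.50, 0.90, 0.99):
        for cores in (2, 4):
            gain = (amdahl_speedup(fraction, cores) - 1.0) * 100.0
            print(f"parallel fraction {fraction:.2f}, {cores} cores: +{gain:.0f}%")
```

Under this model a dual-core chip approaches a 100% gain only when nearly all of the work is parallel, which is why the claim is limited to "many" applications rather than all of them.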

WHY MULTI-CORE?

Multi-core processing offers an immediate and cost-effective technology for resolving today’s processor design issues:

1. The new 22 nm manufacturing process has enabled manufacturers to develop processors that deliver sophisticated technology to customers interested in performance, flexibility and value.

2. Multi-core processors will help to break through today’s single-core performance limitations and provide the capacity to tackle tomorrow’s more advanced software.

3. Current operating systems such as Windows, Linux, and Solaris are now capable of benefiting from multi-core processors.

4. Multi-core processors offer the best platform for enabling logical, incremental performance increases as market performance demands increase.

INTRODUCTION

Since the introduction of the microprocessor in 1971, advancements in processor performance have been closely correlated with advancements in the clock frequencies at which these microprocessors operate. Recently that correlation has begun to change. Most processor manufacturers now believe there are easier and more cost-effective ways of increasing system performance. Several factors account for the decreased value of frequency as a lever for enhanced performance:

1. The gap between processor cycle time and RAM cycle time has grown so much that any benefit from increased clock frequency is annulled when cache misses occur.

2. An increase in frequency requires more input power, which in turn makes the chip harder to cool.

3. The recent shift from 32 nm to 22 nm process fabrication doubled the transistor budget available to chip designers. The 22 nm circuits include both SRAM memory and logic circuits to be used on 22 nm microprocessors. The chip packs 2.9 billion transistors.
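The first factor can be made concrete with a small calculation. The sketch below is a simplified model, not taken from the paper: it assumes that time per instruction is the sum of a core component that scales with clock frequency and a memory-stall component that does not. All numbers (base CPI, miss rate, memory latency) are hypothetical values chosen for illustration.

```python
def time_per_instruction_ns(clock_ghz: float, base_cpi: float,
                            misses_per_instruction: float,
                            memory_latency_ns: float) -> float:
    """Average time per instruction under a simple, assumed stall model."""
    core_time = base_cpi / clock_ghz  # cycles divided by cycles-per-nanosecond
    stall_time = misses_per_instruction * memory_latency_ns  # unaffected by clock speed
    return core_time + stall_time


def speedup_from_clock(old_ghz: float, new_ghz: float, **workload: float) -> float:
    """Speedup obtained by raising the clock from old_ghz to new_ghz."""
    return (time_per_instruction_ns(old_ghz, **workload)
            / time_per_instruction_ns(new_ghz, **workload))


if __name__ == "__main__":
    # Hypothetical workload: 1 cycle per instruction, 1 miss per 100 instructions,
    # 100 ns to reach RAM on a miss.
    workload = dict(base_cpi=1.0, misses_per_instruction=0.01, memory_latency_ns=100.0)
    print(f"2 GHz -> 4 GHz speedup: {speedup_from_clock(2.0, 4.0, **workload):.2f}x")
```

With these assumed figures, doubling the clock yields only about a 1.2x speedup, because the memory stall time dominates and does not shrink with frequency.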

ABSTRACT

One thing that never changes in everyday computing is the demand for faster speed and more performance; we are never satisfied, even with new technologies. Every new technology and performance advance in processors leads to the next level of performance demands from consumers. The performance criterion is not just speed but also smaller and more powerful devices, longer battery life, quieter desktop PCs and, in business, better price/performance per watt and lower cooling costs. Enthusiasts want improvements in productivity, security, multitasking, data protection, games, and many other capabilities.

General consumers, too, will now get their hands on greater performance than ever before, which will significantly expand the utility of their home PCs and digital computing systems. Multi-core processors may also offer more performance without increasing power requirements, which translates to greater performance per watt.

Merging two or more powerful computing cores on a single processor opens up a world of new possibilities. The next generation of software applications will be developed for multi-core processors because of the performance and efficiency they can deliver compared to single-core processors. Whether these applications help professional animation companies produce more realistic movies faster, for less money and in less time, or create breakthrough ways to make a PC more natural and intuitive, the widespread availability of hardware using multi-core technology will change the computing universe.

Vikas Bhatnagar, CS&IT Department, UPTU University, MIT, Moradabad

e-mail: [email protected]

Kavita Chaudhary, CS&IT Department, UPTU University, MIT, Moradabad

Sarita Chaudhary, CS&IT Department, UPTU University, MIT, Moradabad




5. Multi-threaded software applications are already positioned to take advantage of multi-core processing.

INTEL’S FIRST MULTI-CORE DESIGN

Intel’s initial entries in the multi-core market were Smithfield, Paxville, Dempsey and Presler, all of which took an ad hoc approach to dual-core design. These chip packages contain two independent processors, sometimes located on a single large die and sometimes on two smaller dice. Communication between the cores is accomplished over the external front-side bus (FSB) that connects both cores to the North Bridge. Figure 1 illustrates the most highly developed of these approaches, Dempsey with the Blackford chipset, which provides a separate FSB for each CPU socket in the system. This simplistic approach to dual-core architecture greatly shortened Intel’s development schedules and allowed it to bring its initial dual-core products to market in just a couple of months.

Figure 1: Intel Dual Core with Blackford Chipset.

But there were some big deficiencies. First, the FSB, already a performance inhibitor in single-core processors, was now shared by both cores, creating a bottleneck, as shown in Figure 2. Second, inter-processor cache snooping, a feature that ensures the coherency of cached data, must be performed over the FSB. Cache snooping thus places an incremental load on the FSB, which in turn constrains the performance of cache snooping and degrades memory access latency.
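The two deficiencies can be illustrated with a rough estimate of how much bus bandwidth each core is left with. The sketch below is a deliberately crude model and an assumption of this example: it treats snoop traffic as a fixed fraction of bus capacity and divides the remainder evenly between the cores. The bandwidth and snoop figures are hypothetical, not measurements of any Intel part.

```python
def per_core_bandwidth_gbs(bus_bandwidth_gbs: float, cores: int,
                           snoop_fraction: float) -> float:
    """Data bandwidth left per core on a shared bus (assumed even-split model)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0.0 <= snoop_fraction < 1.0:
        raise ValueError("snoop_fraction must be in [0, 1)")
    usable = bus_bandwidth_gbs * (1.0 - snoop_fraction)  # capacity not spent on snooping
    return usable / cores


if __name__ == "__main__":
    # Hypothetical 8 GB/s bus; 10% of it assumed lost to snoop traffic in the dual-core case.
    single = per_core_bandwidth_gbs(8.0, cores=1, snoop_fraction=0.0)
    dual = per_core_bandwidth_gbs(8.0, cores=2, snoop_fraction=0.10)
    print(f"single core: {single:.1f} GB/s, each of two cores: {dual:.1f} GB/s")
```

Even under these mild assumptions, each core of the dual-core package sees less than half of the bandwidth a single core had to itself.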

Figure 2: FSB Bottleneck

Another design that came into existence in the multi-core arena is illustrated in Figure 3. It duplicates most elements of the Dothan processor but shares the caches and system interface functions across both cores. Since the two cores share a common L2 cache, they never need to go off-chip to ensure cache coherency. Since the two cores also share a single FSB interface that serves the requirements of both, bus loading characteristics are simplified from a circuit design perspective, which facilitates higher-speed FSB operation.

Figure 3: Different L2 Cache with different FSB

Figure 4: Architecture of Intel Core 2 Duo (a) and AMD Athlon 64 X2 (b) at the lower level.

MOBILE PROCESSOR TECHNOLOGY

Intel Core micro-architecture innovations include:

• Intel® Wide Dynamic Execution

• Intel® Intelligent Power Capability



• Intel® Advanced Smart Cache

• Intel® Smart Memory Access

• Intel® Advanced Digital Media Boost

Figure 5 depicts the Intel Core i7 Mobile Processor architecture.

Figure 5: Intel’s Core i7 Mobile Processor.

NEW INTEL® SSE4 INSTRUCTIONS

The Penryn family includes the Intel® Streaming SIMD Extensions 4 (SSE4) instructions. Intel SSE4 instructions are the most significant media instruction set architecture advancement since 2001. This new instruction set extends the Intel® 64 instruction set architecture to take better advantage of Intel’s next-generation 45 nm silicon manufacturing process and to expand the performance and capabilities of Intel® Architecture. Intel SSE4 instructions deliver further performance gains for SIMD (single instruction, multiple data) software and will enable Penryn microprocessors to deliver superior performance and energy efficiency to a broad range of 32- and 64-bit software. Applications that will benefit include those involving graphics, video encoding and processing, 3-D imaging, and gaming. The instructions will also benefit high-performance applications such as audio, image and data compression algorithms, as well as many more. The Penryn family’s implementation of Intel SSE4 will improve performance by:

1. Adding support for two different vector 32-bit integer multiply operations.

2. Introducing 8-bit unsigned min/max operations, plus 16-bit and 32-bit signed and unsigned versions.

3. Introducing features that improve the compiler’s ability to vectorize integer and single-precision code more efficiently:

• Blends, tests, rounds, and sign/zero extensions are straightforward replacements for existing lengthy operations.

• Inserts and extracts are building blocks for gathers (lookups), scatters, strided loads, and strided stores.

4. Adding highly specialized operations that can provide significant application-level gains in:

• Video encode acceleration functions.

• Floating-point dot product operations (important in gaming and 3D content creation).

• Streaming load instructions (important for video processing, imaging, and applications that share data between the graphics processor and the processor).
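To show what a few of the operation classes listed above compute, the sketch below models their lane-wise behaviour on four-element vectors in plain Python. It is a scalar model of the semantics only: the function names are hypothetical, they are not Intel intrinsics, and no performance benefit is implied. Real code would use the compiler intrinsics or assembly instructions documented by Intel.

```python
from typing import List, Sequence


def _wrap_int32(value: int) -> int:
    """Keep the low 32 bits and reinterpret them as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def packed_mul_low32(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Lane-wise 32-bit multiply that keeps the low 32 bits of each product."""
    return [_wrap_int32(x * y) for x, y in zip(a, b)]


def packed_max(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Lane-wise maximum of two vectors."""
    return [max(x, y) for x, y in zip(a, b)]


def blend(a: Sequence, b: Sequence, mask: int) -> List:
    """Take b[i] where bit i of mask is set, otherwise a[i]."""
    return [y if (mask >> i) & 1 else x for i, (x, y) in enumerate(zip(a, b))]


def dot_product(a: Sequence[float], b: Sequence[float], mask: int = 0xFF) -> List[float]:
    """Masked 4-lane dot product.

    The high four mask bits choose which lane products are summed; the low
    four bits choose which output lanes receive the sum (others are zero).
    """
    total = sum(x * y for i, (x, y) in enumerate(zip(a, b)) if (mask >> (4 + i)) & 1)
    return [total if (mask >> i) & 1 else 0.0 for i in range(4)]


if __name__ == "__main__":
    print(packed_mul_low32([1, 2, 3, 0x10000], [4, 5, 6, 0x10000]))
    print(packed_max([1, 9, 3, 7], [4, 2, 8, 6]))
    print(blend([1, 2, 3, 4], [10, 20, 30, 40], 0b0101))
    print(dot_product([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0], 0xF1))
```

Each function processes all four lanes with one logical operation, which is the property that lets a compiler replace a scalar loop with a single SIMD instruction.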

In 2008, as the following “tock”, came Intel’s next brand-new micro-architecture, codenamed Nehalem.

Nehalem is a truly dynamic and design-scalable micro-architecture, which enables it to deliver both performance on demand and optimal price/performance/energy efficiency for each type of platform.

Nehalem’s dynamic scalability delivers performance on demand through:

1. Dynamically managed cores, threads, cache, interfaces, and power.

2. Leveraging leading 4-instruction issue Intel Core micro-architecture technology (Intel Core micro-architecture’s ability to process up to 4 instructions per clock cycle on a sustained basis, as compared to 3 instructions per clock cycle or less for other processors).

3. Simultaneous multi-threading (Intel Hyper-Threading Technology) to enhance performance and energy efficiency.

4. Innovative new Intel® SSE4 and ATA instruction set additions.

5. Superior multi-level shared cache.

6. Leadership system and memory bandwidth.

7. Performance-enhanced dynamic power management.

Nehalem’s design scalability will enable optimal price/performance/energy efficiency for each market segment through:

1. New system architecture for next-generation Intel processors and platforms.

2. Scalable performance for one to sixteen (or more) threads and one to eight (or more) cores.

3. Scalable and configurable system interconnects and integrated memory controllers.

4. High-performance integrated graphics engine for client platforms.

Just when everybody thought the ad hoc approach to multi-core design had come to an end, Intel came up with Clovertown, its quad-core server processor (Figures 6 and 7).



Figure 6: Clovertown as two Woodcrest dice crammed into a single package.

Figure 7: Intel Quad-core Clovertown Server Processor.

Intel’s archrival AMD did not lag behind Intel’s development program: it developed the Opteron, which allows both cores to share the on-board memory controller and HyperTransport links, as shown in Figure 8.

Figure 8: AMD Athlon Architecture.

An IDC report, however, says that once the chip manufacturing process reaches about 16 nm, processors will no longer be able to control the flow of electrons as they move through the transistors. This means that transistors will eventually reach a size where chipmakers can no longer make them smaller. Ever smaller and denser transistors on a chip generate more heat, causing processing errors. Multi-core processors, however, can improve computing power and limit some of the problems that shrinking transistors are causing.

In a system without Pacifica technology, the x86 processor hardware contains no virtualization capabilities. When creating a virtual machine in this type of system, the virtualization software must manage the resources between the host OS and the guest OS. Because this extra layer causes additional overhead and complexity, application performance suffers.

With Pacifica running on an AMD dual- or multi-core processor, there would be fewer layers and less complexity, improving application performance. Pacifica would use a hypervisor as its virtualization software, which would manage the virtual machines. The hypervisor would also track the availability of physical hardware, letting applications take advantage of the hardware as it becomes available.

Figure 9: Virtualization on AMD 64 with Hypervisor.

TILERA TILE64

Tilera has developed a multi-core chip with 64 homogeneous cores set up in a grid, shown in Figure 9. An application that is written to take advantage of these additional cores will run far faster than if it were run on a single core. Imagine having a project to finish, but instead of having to work on it alone you have 64 people to work for you. Each processor has its own L1 and L2 cache, for a total of 5 MB on-chip, and a switch that connects the core into the mesh network rather than a bus or interconnect. The TILE64 also includes on-chip memory and I/O controllers. As in the CELL processor, unused tiles (cores) can be put into a sleep mode to further decrease power consumption. The TILE64 uses a 3-way VLIW (very long instruction word) pipeline to deliver 12 times the instructions of a single-issue, single-core processor. When VLIW is combined with the MIMD (multiple instruction, multiple data) processors, multiple operating systems can be run simultaneously and advanced multimedia applications such as video conferencing and video-on-demand can be run efficiently.
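The mesh organisation described above can be sketched as follows. The example assumes an 8 x 8 grid (64 tiles numbered row by row) and simple dimension-ordered routing, in which a message first travels along the row and then along the column. Both the tile numbering and the routing policy are assumptions made for illustration; the paper does not specify how the TILE64 switches route traffic.

```python
from typing import List, Tuple

GRID_SIDE = 8  # assumed 8 x 8 layout for 64 tiles


def tile_coords(tile_id: int, side: int = GRID_SIDE) -> Tuple[int, int]:
    """Map a tile number to (x, y) grid coordinates, numbering tiles row by row."""
    if not 0 <= tile_id < side * side:
        raise ValueError("tile_id out of range")
    return tile_id % side, tile_id // side


def xy_route(src: int, dst: int, side: int = GRID_SIDE) -> List[int]:
    """Tiles visited from src to dst, moving along X first and then along Y."""
    x, y = tile_coords(src, side)
    dst_x, dst_y = tile_coords(dst, side)
    path = [src]
    while x != dst_x:  # horizontal leg
        x += 1 if dst_x > x else -1
        path.append(y * side + x)
    while y != dst_y:  # vertical leg
        y += 1 if dst_y > y else -1
        path.append(y * side + x)
    return path


def hop_count(src: int, dst: int, side: int = GRID_SIDE) -> int:
    """Number of switch-to-switch hops between two tiles."""
    return len(xy_route(src, dst, side)) - 1


if __name__ == "__main__":
    print(xy_route(0, 9))
    print("corner to corner hops:", hop_count(0, 63))
```

Unlike a shared bus, where every transfer competes for one medium, messages in a mesh occupy only the links along their own path, so transfers between different pairs of tiles can proceed at the same time.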

Figure 9: Tilera Architecture

LONG-TERM BENEFITS OF MULTI-CORE PROCESSORS

Multi-core computers have the ability to run today’s applications as well as tomorrow’s more complex applications, which means that the hardware will retain its value over time.

The growing complexity of software, as well as the desire of users to run multiple applications at the same time, will accelerate widespread adoption of multi-core processor-based systems. This will give commercial applications the ability to handle large amounts of data and more users faster and more efficiently, while consumers will experience richer features and more functionality, especially for applications like digital media and digital content creation.

Next-generation software applications will require the performance capacity provided by multi-core processors. Software destined to break barriers in the user experience, like voice recognition and/or artificial intelligence (AI), will be possible with multi-core processors.

PCs will take on expanded roles: thanks to their increased performance capacity, multi-core processor-based PCs will be leveraged for new tasks, including serving as the hub for digital entertainment in the home.

MULTI-CORE PROCESSOR ADOPTION

AMD is taking a lead in promoting software-pricing practices based on a per-processor, not a per-core, model. Consistent with its long-standing tradition of championing technologies that truly benefit customers, AMD is working hard to ensure multi-core technology is available to customers who want the best-performing computer systems. AMD’s efforts paid off in October 2004, when Microsoft announced that its server software, currently licensed on a per-processor model, will continue to be licensed on a per-processor pricing model. This policy set an important precedent that other software vendors are likely to follow, and it helps ensure that the multi-core computer universe will be cost-effective.

Key drivers for multi-core processors in server/workstation environments include:

Reliance on x86 architecture as the backbone of corporate IT networks is placing performance demands on today’s servers to run a growing list of complex applications.

Data centers’ performance requirements are growing while at the same time budget and logistical concerns deter physical expansion within many enterprises.

Methods such as server consolidation and virtualization to better utilize existing resources have become appealing options to curtail costs.

Multithreaded applications are expected to be adopted more broadly in the future. The need for multiprocessor systems is growing in new areas.

Security has become a critical issue, requiring new classes of software applications and technologies that are uniquely served by multi-core processors.

An increasingly effective approach to providing additional platform security is to leverage the power of virtualization technology to segregate trusted applications from untrusted ones.

Increased performance without increased power consumption is a critical need.

Corporate IT managers also remain resolute in their need to add performance without increasing the physical footprint for hardware. Multi-core processor solutions address these needs by providing increased performance without increasing power or physical space requirements.

CONCLUSION

1. Processor bus architecture has plenty of bandwidth to support multi-core processors.

2. Memory speed cannot keep up with bus capabilities.

3. Multiple ways to work around memory speed limitations

• Write applications to be multi-threaded

• Shift start times of identical threads

• Increase cache size



4. Multi-core processors provide multiple execution cores in a single processor package.

5. Larger caches and shared caches improve performance by reducing latency to frequently used data.

6. Choose a memory implementation that maximizes data transfer.

7. Today’s bus architecture is a high-speed interface with plenty of bandwidth for multi-core processors.
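The recommendation in point 3 to write applications to be multi-threaded can be sketched as follows: the work is split into independent chunks, each chunk is handed to a worker thread, and the partial results are combined. The function names and the sum-of-squares workload are hypothetical. Note also that in the standard CPython interpreter, CPU-bound threads do not execute on several cores at once, so this sketch illustrates the decomposition pattern rather than a guaranteed speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def chunked(data: Sequence[int], parts: int) -> List[Sequence[int]]:
    """Split data into at most `parts` contiguous chunks of nearly equal size."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(chunk: Sequence[int]) -> int:
    """Work done independently by each thread on its own chunk."""
    return sum(value * value for value in chunk)


def parallel_sum_of_squares(data: Sequence[int], workers: int = 4) -> int:
    """Distribute chunks over a thread pool and combine the partial sums."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = pool.map(sum_of_squares, chunked(data, workers))
        return sum(partial_sums)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10)), workers=3))
```

Because each chunk is processed without touching the others, the threads share no mutable state and need no locks, which is the property that lets such code scale with the number of cores on a runtime that schedules threads in parallel.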

