
How multithreaded architecture works in DB2 9.5: An overview

Shashank Kharche, Staff Software Engineer, IBM

Summary: New multithreaded capabilities were introduced in DB2® 9.5 for Linux®, UNIX®, and Windows®, code-named "Viper 2." Learn how these capabilities affect you if you regularly monitor processes or threads, if you need to understand how much memory your database is using, or if you want to simplify mission-critical tasks such as backup, restore, and roll forward. You'll also learn how these changes affect configuration parameters and gain an understanding of the new technology in DB2 9.5.


Date: 17 Jul 2008, Level: Intermediate


Introduction

In order to understand the new multithreaded capabilities in DB2 9.5, this article starts with a look at the DB2 process model. The entire DB2 process model is controlled by Base System Utilities (BSUs). BSUs allocate memory for the instance and database, intercept and handle signals, and handle exceptions sent to DB2. Figure 1 shows the old DB2 process model for the Linux and UNIX platforms.


Figure 1. Old DB2 process model on Linux and UNIX

Figure 2 illustrates the new process model on Linux and UNIX.

Figure 2. New DB2 process model on Linux and UNIX

Communication between database servers, clients, and applications is handled by a framework: the process model that all DB2 servers use. It ensures that database files used internally do not interfere with user or database applications.

Engine dispatchable units (EDUs) perform tasks such as processing database application requests, reading database log files, and flushing log records from the log buffer to the log files on disk. Typically, the DB2 server runs a separate EDU for each task. Before DB2 9.5, most EDUs were implemented as processes on UNIX and Linux but as threads on Windows. In DB2 9.5 the process model is uniform: EDUs are threads on Linux, UNIX, and Windows.

Here are some of the advantages of the new model:

The new memory model is simpler and more easily configured. See the following entries in the DB2 Information Center:

o Configuring memory and memory heaps

o Memory configuration has been simplified

This model saves resources: significantly fewer system file descriptors are used. The most obvious distinction between processes and threads is that all threads of a process share the same memory space and system-defined facilities, such as open file handles (file descriptors), shared memory, process synchronization primitives, and the current directory. Because all threads in a process share the same file descriptors, each agent no longer needs to maintain its own file descriptor table.

Performance is enhanced: operating systems can generally switch context between threads of the same process faster than between different processes, because there is no need to switch address spaces. And because global memory is shared and almost no new memory must be allocated, creating a thread is simpler and faster than creating a process, which is expensive in processor cycles and memory usage.

There are more automatic and dynamically configurable parameters, so less work is required from the DBA. This is covered in the Process model configuration simplification section of this article.

The process model is the same now across all three platforms: Linux, UNIX, and Windows.
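The shared-address-space point holds for any threaded program. Below is a minimal Python sketch, not DB2 code: a few threads, given DB2 EDU names purely as labels, run inside one process. It assumes Python 3.8 or later (for threading.get_native_id()); run_edus and EDU_NAMES are illustrative names. Every thread reports the same process ID, appends to the same in-memory list, and writes through a file descriptor that only the main thread opened.

```python
import os
import tempfile
import threading

# Illustrative only: the threads borrow DB2 EDU names but do no DB2 work.
EDU_NAMES = ["db2agent", "db2ipccm", "db2aiothr", "db2alarm"]


def run_edus(names):
    """Run one thread per name inside this single process and report what
    each thread observed: (name, process ID, kernel thread ID)."""
    seen = []                    # one list in one address space, shared by all threads
    lock = threading.Lock()      # serializes the appends and writes below

    with tempfile.TemporaryFile() as f:
        fd = f.fileno()          # opened once, by the main thread only

        def edu(name):
            with lock:
                # All threads see the same PID; each has its own kernel thread ID.
                seen.append((name, os.getpid(), threading.get_native_id()))
                # The descriptor table belongs to the process, so this thread can
                # write through `fd` without opening the file itself.
                os.write(fd, (name + "\n").encode())

        threads = [threading.Thread(target=edu, args=(n,), name=n) for n in names]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        # Read back everything written through the shared descriptor.
        os.lseek(fd, 0, os.SEEK_SET)
        data = b""
        while chunk := os.read(fd, 4096):
            data += chunk
        written = data.decode().split()
    return seen, written


if __name__ == "__main__":
    seen, written = run_edus(EDU_NAMES)
    for name, pid, tid in seen:
        print(f"{name:10} pid={pid} kernel_tid={tid}")
    print("lines written through the shared fd:", sorted(written))
```

This mirrors what the ps listings later in this article show for DB2 9.5: one db2sysc PID, with many kernel thread IDs inside it.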

Monitoring threads with db2pd and mapping them to ps output

Before DB2 9.5, on UNIX and Linux you could list all active DB2 EDUs with the ps system command or the db2_local_ps command. In DB2 9.5, however, those commands no longer list the EDU threads within the db2sysc process. As a result, DB2 users and DBAs who use an operating system command to look at the processes running on the system will see a single db2sysc process instead of one process per EDU. This is an administrative change to expect from a DBA perspective.

$ ps -fu db2ins10
     UID     PID    PPID   C    STIME    TTY  TIME CMD
db2ins10 1237176 2109662   0   Feb 28      -  0:12 db2acd 0
db2ins10 1921136 2109662   0   Feb 28      -  0:14 db2sysc 0
db2ins10 2101494 1941686   0 14:22:34  pts/1  0:00 -ksh
db2ins10 2420958 2101494   0 15:25:33  pts/1  0:00 ps -fu db2ins10

On AIX: To view all threads of the db2sysc process (PID = 1921136):

Listing 1. View all threads of the db2sysc process on an AIX system

$ ps -mo THREAD -p 1921136

    USER     PID    PPID     TID ST CP PRI SC            WCHAN      F TT BND COMMAND
db2ins10 1921136 2109662       -  A  0  60 26                *  40401  -   - db2sysc 0
       -       -       - 1273899  S  0  60  1 f1000100403674b0 410400  -   - -
       -       -       - 1327331  Z  0  60  1                - c00001  -   - -
       -       -       - 1392805  Z  0  60  1                - c00001  -   - -
       -       -       - 1601705  Z  0  60  1                - c00001  -   - -
       -       -       - 1814627  Z  0  60  1                - c00001  -   - -
       -       -       - 1851457  S  0  60  1 f1000004f010de00 410400  -   - -
       -       -       - 1961987  Z  0  60  1                - c00001  -   - -
       -       -       - 1974311  Z  0  60  1                - c00001  -   - -
       -       -       - 2023571  S  0  60  1 f100010041b401b0 410400  -   - -
       -       -       - 2068591  Z  0  60  1                - c00001  -   - -
       -       -       - 2179161  Z  0  60  1                - c00001  -   - -
       -       -       - 2187515  Z  0  60  1                - c00001  -   - -
       -       -       - 2216003  S  0  60  1                - 400400  -   - -
       -       -       - 2412647  Z  0  60  1                - c00001  -   - -
       -       -       - 2551911  Z  0  60  1                - c00001  -   - -
       -       -       - 2592969  Z  0  60  1                - c00001  -   - -
       -       -       - 2621455  S  0  60  1 f1000100407f7e30 410400  -   - -
       -       -       - 2658531  S  0  60  1                - 418400  -   - -
       -       -       - 3031171  Z  0  60  1                - c00001  -   - -
       -       -       - 3457047  Z  0  60  1                - c00001  -   - -
       -       -       - 3899477  Z  0  60  1                - c00001  -   - -
       -       -       - 4157609  Z  0  60  1                - c00001  -   - -
       -       -       - 4390991  S  0  60  1                - 400400  -   - -
       -       -       - 4636819  Z  0  60  1                - c00001  -   - -
       -       -       - 5628153  S  0  60  1                - 400400  -   - -
       -       -       - 6783009  Z  0  60  1                - c00001  -   - -

On Linux:

To view all threads of the db2sysc process (PID = 1921136):

$ ps -lLfp 1921136
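On Linux, ps gets its per-thread information from procfs, where each thread of a process appears as a directory under /proc/<pid>/task. As a minimal sketch, assuming a Linux host with /proc mounted (kernel_thread_ids and thread_name are illustrative helpers, not DB2 tools), this Python script lists the kernel thread IDs of a process, such as the db2sysc PID reported by ps, together with the short name the kernel records for each thread, which may or may not match the EDU name.

```python
import os
import sys


def kernel_thread_ids(pid):
    """Sorted kernel thread IDs of process `pid`, read from procfs (Linux only)."""
    task_dir = f"/proc/{pid}/task"
    if not os.path.isdir(task_dir):
        raise OSError(f"{task_dir} not found (not Linux, or no such process)")
    return sorted(int(tid) for tid in os.listdir(task_dir))


def thread_name(pid, tid):
    """Short name the kernel records for one thread (its 'comm' file)."""
    with open(f"/proc/{pid}/task/{tid}/comm") as f:
        return f.read().strip()


if __name__ == "__main__":
    # Usage: python3 threads.py <pid>; defaults to this script's own process.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for tid in kernel_thread_ids(pid):
        print(tid, thread_name(pid, tid))
```

The printed thread IDs should correspond to the Kernel TID column of db2pd -edu, but that correspondence is assumed here for Linux; the sample output in this article comes from AIX.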

DB2 9.5 also makes the DBA's life easier: db2pd has been improved to list processes and threads. With the -edu option, the db2pd command lists all active EDU threads, and it works on UNIX, Linux, and Windows systems.

Listing 2. View all EDU threads of the db2sysc process with db2pd

$ db2pd -edu

Database Partition 0 -- Active -- Up 1 days 01:05:54

List of all EDUs for database partition 0

db2sysc PID: 1921136
db2wdog PID: 2109662
db2acd  PID: 1237176

EDU ID    TID     Kernel TID   EDU Name             USR        SYS
===================================================================================
1801      1801    2216003      db2agent (idle) 0    0.706935   1.071737
1543      1543    5628153      db2resync 0          0.002641   0.004271
1286      1286    1851457      db2ipccm 0           0.082388   0.044037
1029      1029    2023571      db2licc 0            0.000211   0.001055
772       772     4390991      db2thcln 0           0.000244   0.000105
515       515     2621455      db2aiothr 0          2.740874   6.287562
2         2       1273899      db2alarm 0           0.274076   0.408226
258       258     2658531      db2sysc 0            2.085981   1.379128
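The Kernel TID column is what connects db2pd to ps: in this sample, every Kernel TID in Listing 2 also appears, in state S, in the TID column of Listing 1, while the zombie (Z) threads have no EDU entry. As a minimal sketch, the Python below parses db2pd -edu data rows (pasted from Listing 2) and labels thread IDs taken from ps. The regular expression assumes the column layout shown above, where the EDU Name column can contain spaces; parse_edus and label_threads are illustrative helpers, not DB2 tools, and other platforms or fix packs may format the output differently.

```python
import re

# Data rows from Listing 2 (db2pd -edu), pasted as sample input.
DB2PD_EDU_ROWS = """\
1801 1801 2216003 db2agent (idle) 0 0.706935 1.071737
1543 1543 5628153 db2resync 0 0.002641 0.004271
1286 1286 1851457 db2ipccm 0 0.082388 0.044037
1029 1029 2023571 db2licc 0 0.000211 0.001055
772 772 4390991 db2thcln 0 0.000244 0.000105
515 515 2621455 db2aiothr 0 2.740874 6.287562
2 2 1273899 db2alarm 0 0.274076 0.408226
258 258 2658531 db2sysc 0 2.085981 1.379128
"""

# EDU ID, TID, Kernel TID, EDU Name (may contain spaces), USR, SYS
ROW = re.compile(r"^(\d+)\s+(\d+)\s+(\d+)\s+(.+?)\s+([\d.]+)\s+([\d.]+)$")


def parse_edus(text):
    """Map kernel TID -> EDU name for every data row of db2pd -edu output."""
    edus = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:  # headers, separators, and blank lines do not match
            edus[int(m.group(3))] = m.group(4)
    return edus


def label_threads(ps_tids, edus):
    """Pair each TID reported by ps with its EDU name, or None if db2pd lacks it."""
    return {tid: edus.get(tid) for tid in sorted(ps_tids)}


if __name__ == "__main__":
    edus = parse_edus(DB2PD_EDU_ROWS)
    # TIDs from Listing 1: two sleeping (S) threads and one zombie (Z) thread.
    for tid, name in label_threads({2216003, 1273899, 1327331}, edus).items():
        print(tid, name or "(not an active EDU)")
```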


How much memory is used by DB2?

There are a few ways to check memory usage:

o db2pd -dbptnmem

o db2 get snapshot for applications on sample

o select * from table(admin_get_dbp_mem_usage())

o db2mtrk -a and db2mtrk -p


Note the following:

o db2pd shows an accurate representation of the shared memory hierarchy, but it still can't report private memory allocations.

o db2mtrk can report private memory allocations, but it is weaker in other areas.

o Private memory usage is not as interesting as it used to be.

o The high-level reporting of db2pd -dbptnmem may be sufficient.

Using db2pd

Listing 3. Example of db2pd

$ db2pd -dbptnmem

Database Partition 0 -- Active -- Up 1 days 01:11:27

Database Partition Memory Controller Statistics

Controller Automatic: Y
Memory Limit:         13994636 KB
Current usage:        76608 KB
HWM usage:            332736 KB
Cached memory:        16064 KB

Individual Memory Consumers:

Name             Mem Used (KB)  HWM Used (KB)  Cached (KB)
========================================================
DBMS-db2ins10    46784          46784          10048
FMP_RESOURCES    22528          22528          0
PRIVATE          7296           7296           6016

Field descriptions:

Controller Automatic is set to Y if the INSTANCE_MEMORY configuration parameter is set to AUTOMATIC. This means that the database manager automatically determines the upper bound on memory consumption.

Memory Limit is the DB2 server's upper bound of memory that can be consumed. It is the value of the INSTANCE_MEMORY configuration parameter.

Current usage is the amount of memory the server is currently consuming.

HWM usage is the high-water mark (HWM), or peak memory usage, since the database partition was activated (when the db2start command was run).

Cached memory is how much of the current usage is not currently being used, but is cached for performance reasons for future memory requests.

Individual Memory Consumers section:

All registered "consumers" of memory within the DB2 server are listed with the amount of the total memory they are consuming.

Name: A brief, distinguishing name of a "consumer" of memory. Examples include:

o APPL-<dbname> for application memory consumed for a database <dbname>

o DBMS-xxx for global database manager memory requirements

o FMP_RESOURCES for memory required to communicate with db2fmps

o PRIVATE for miscellaneous private memory requirements

o FCM_RESOURCES for Fast Communication Manager resources

o LCL-<pid> for memory segment used to communicate with local applications

o DB-<dbname> for database memory consumed for a database <dbname>

Mem Used (KB): How much memory is currently allotted to that consumer.

HWM Used (KB): High-water mark, or peak, memory that the consumer has used.

Cached (KB): Of the Mem Used (KB), the amount of memory that is not currently being used but is immediately available for future memory allocations.
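The numbers in Listing 3 are internally consistent: the Mem Used column sums to the Current usage value (46784 + 22528 + 7296 = 76608 KB), and the Cached column sums to Cached memory (10048 + 0 + 6016 = 16064 KB). A small Python sketch that parses the consumer rows (copied from the listing) and performs that check follows. It assumes the four-column layout shown above; parse_consumers and check_totals are illustrative names, and whether the totals match exactly on every system is an assumption, so treat a mismatch as a prompt to look closer rather than as an error.

```python
# Copied from Listing 3 (db2pd -dbptnmem); all values are in KB.
CURRENT_USAGE_KB = 76608
CACHED_MEMORY_KB = 16064

CONSUMER_ROWS = """\
Name             Mem Used (KB)  HWM Used (KB)  Cached (KB)
========================================================
DBMS-db2ins10    46784          46784          10048
FMP_RESOURCES    22528          22528          0
PRIVATE          7296           7296           6016
"""


def parse_consumers(text):
    """Return {name: (mem_used_kb, hwm_used_kb, cached_kb)}.

    Assumes each data row is a name followed by exactly three integers;
    the header and the ===== separator are skipped because they do not fit.
    """
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and all(p.isdigit() for p in parts[1:]):
            rows[parts[0]] = tuple(int(p) for p in parts[1:])
    return rows


def check_totals(rows, current_kb, cached_kb):
    """Compare consumer sums with the controller totals; return (used, cached, ok)."""
    used = sum(r[0] for r in rows.values())
    cached = sum(r[2] for r in rows.values())
    return used, cached, (used == current_kb and cached == cached_kb)


if __name__ == "__main__":
    used, cached, ok = check_totals(parse_consumers(CONSUMER_ROWS),
                                    CURRENT_USAGE_KB, CACHED_MEMORY_KB)
    print(f"sum Mem Used={used} KB, sum Cached={cached} KB, matches controller: {ok}")
    print(f"in use, not cached: {used - cached} KB")
```

Subtracting cached memory from current usage (76608 - 16064 = 60544 KB) gives the memory actually in use at that moment.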

Using db2 get snapshot

Listing 4. Example of db2 get snapshot

$ db2 get snapshot for applications on sample

Memory usage for application:

  Memory Pool Type                           = Application Heap
     Current size (bytes)                    = 65536
     High water mark (bytes)                 = 65536
     Configured size (bytes)                 = 1048576

Agent process/thread ID                      = 6463
Agent Lock timeout (seconds)                 = -1
Memory usage for agent:

  Memory Pool Type                           = Other Memory
     Current size (bytes)                    = 196608
     High water mark (bytes)                 = 196608
     Configured size (bytes)                 = 16710107136

Using SQL

Listing 5. Example of using SQL

$ db2 "select * from table(admin_get_dbp_mem_usage())"

DBPARTITIONNUM MAX_PARTITION_MEM    CURRENT_PARTITION_MEM PEAK_PARTITION_MEM
-------------- -------------------- --------------------- --------------------
             0          14330507264             340590592            340852736

1 record(s) selected.

Using db2mtrk

Listing 6. Example of db2mtrk -a

$ db2mtrk -a
Tracking Memory on: 2008/02/29 at 15:51:00

Application Memory for database: SAMPLE
   appshrh
   128.0K

Memory for application 546
   apph       other
   64.0K      192.0K





Listing 7. Example of db2mtrk -p

$ db2mtrk -p
Tracking Memory on: 2008/02/29 at 15:51:37

Memory for agent 6463
   other
   192.0K

Memory for agent 6206
   other
   192.0K

Memory for agent 5949
   other
   320.0K



Note: By default, INSTANCE_MEMORY is set to AUTOMATIC, which allows the instance to use up to a percentage of physical RAM, ranging from 75 percent on smaller systems to 95 percent on larger systems. This limit covers all local database partitions of a single instance.

$ db2 get dbm cfg show detail | grep INSTANCE_MEMORY
 Size of instance shared memory (4KB)  (INSTANCE_MEMORY) = AUTOMATIC(3498659)          AUTOMATIC(3498659)

You cannot permanently set different INSTANCE_MEMORY values for different database partitions. For DB2 Express licenses, the upper bound on INSTANCE_MEMORY is further restricted to at most 4 GB of memory (1,048,576 pages of 4 KB). DB2 Workgroup licenses are restricted to at most 16 GB of memory (4,194,304 pages of 4 KB). Attempts to update the INSTANCE_MEMORY configuration parameter to values larger than these limits fail with an SQL5130N return code that specifies the range allowed for the license. Other license types have no additional restrictions. You cannot set INSTANCE_MEMORY to more than the amount of physical RAM.
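INSTANCE_MEMORY is expressed in 4 KB pages, which is why the same limit appears as 3498659 in the configuration, as 13994636 KB in db2pd -dbptnmem (Listing 3), and as 14330507264 bytes in MAX_PARTITION_MEM (Listing 5). The Python sketch below does those conversions and applies the license rules listed above. check_instance_memory is a hypothetical pre-check for illustration only, not a DB2 API; DB2 itself enforces the limits and reports SQL5130N.

```python
PAGE_KB = 4  # INSTANCE_MEMORY is specified in 4 KB pages

# License caps from the text, in 4 KB pages; other license types have no extra cap.
LICENSE_CAP_PAGES = {
    "express": 1_048_576,    # 4 GB
    "workgroup": 4_194_304,  # 16 GB
}


def pages_to_kb(pages):
    return pages * PAGE_KB


def pages_to_bytes(pages):
    return pages_to_kb(pages) * 1024


def check_instance_memory(pages, license_type, ram_bytes):
    """Hypothetical pre-check of a proposed INSTANCE_MEMORY value (not a DB2 API).

    Returns None if the value passes the rules described above, otherwise a
    short reason. DB2 itself rejects license violations with SQL5130N.
    """
    cap = LICENSE_CAP_PAGES.get(license_type.lower())
    if cap is not None and pages > cap:
        return f"exceeds the {license_type} limit of {cap} pages"
    if pages_to_bytes(pages) > ram_bytes:
        return "exceeds physical RAM"
    return None


if __name__ == "__main__":
    pages = 3498659  # AUTOMATIC(3498659) from the dbm cfg output above
    print(pages_to_kb(pages), "KB")        # compare with Memory Limit in Listing 3
    print(pages_to_bytes(pages), "bytes")  # compare with MAX_PARTITION_MEM in Listing 5
    print(check_instance_memory(2_000_000, "Express", 64 * 1024**3))
```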


Get rid of DPF backup and recovery problems

Each partition gets a different timestamp

On previous DB2 versions:

$ db2_all "db2 backup db test"

Backup successful. The timestamp for this backup image is : 20080304124529

eva88: db2 backup db test completed ok



In DB2 9.5, by contrast, the BACKUP command is enhanced to accept a list of database partitions, which provides a single system view: one command backs up all the partitions, and the backup image gets a single timestamp.

$ db2 backup db test on all dbpartitionnums

Part  Result
----  -------------------------------
0000  DB20000I  The BACKUP DATABASE command completed successfully.
0010  DB20000I  The BACKUP DATABASE command completed successfully.

Backup successful. The timestamp for this backup image is : 20080304135942

How do you determine what log files are required during roll-forward?

$ db2 rollforward db test to 2008-03-01 and stop
SQL1275N  The stoptime passed to roll-forward must be greater than or equal to "2008-03-04-12.45.54.000000 UTC", because database "TEST" on node(s) "0,1" contains information later than the specified time.

$ db2 rollforward db test to 2008-03-04-12.45.54.000000 and stop
DB20000I  The ROLLFORWARD command completed successfully.

The example above shows that if the point in time (PIT) specified in the ROLLFORWARD command is too early, the roll-forward fails with SQL1275N, and the error message tells you the earliest valid PIT. You might consider taking the backup with the INCLUDE LOGS option instead. However, in a DPF database, BACKUP with INCLUDE LOGS fails with SQL2032N, so you cannot use this option.

In DB2 9.5, by contrast, you can use the TO END OF BACKUP clause of the ROLLFORWARD command to roll forward all partitions of a partitioned database to the minimum recovery time. The minimum recovery time is the earliest point in time during a roll-forward at which the database is consistent (that is, the objects listed in the database catalogs match the objects that physically exist on disk). Determining this point in time manually is difficult, particularly for a partitioned database; the TO END OF BACKUP option makes it easy.

$ db2 rollforward db test to end of backup and stop
DB20000I  The ROLLFORWARD command completed successfully.
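Conceptually, a partitioned database is consistent only when every partition is, so the minimum recovery time for the whole database is the latest of the per-partition minimums; any partition whose minimum is later than the requested stop time blocks the roll-forward, which is the situation SQL1275N reports. The Python below sketches that reasoning only, not how DB2 computes it: the per-partition times are hypothetical input (partition 0 reuses the time from the SQL1275N message above, partition 1 is invented), and parse_pit/format_pit handle the yyyy-mm-dd-hh.mm.ss.nnnnnn format used by ROLLFORWARD. With TO END OF BACKUP, DB2 does this bookkeeping for you.

```python
from datetime import datetime

# Point-in-time format used by ROLLFORWARD and SQL1275N, e.g. 2008-03-04-12.45.54.000000
PIT_FORMAT = "%Y-%m-%d-%H.%M.%S.%f"


def parse_pit(text):
    """Parse a full DB2-style point-in-time string into a datetime."""
    return datetime.strptime(text, PIT_FORMAT)


def format_pit(dt):
    return dt.strftime(PIT_FORMAT)


def minimum_recovery_time(per_partition):
    """Conceptual model only: the database is consistent once every partition
    is, so the earliest usable stop time is the latest per-partition minimum.

    per_partition: {partition number: earliest consistent time}. This is
    hypothetical input; DB2 tracks these values internally.
    """
    return max(per_partition.values())


def partitions_blocking(stoptime, per_partition):
    """Partitions that would make a roll-forward to `stoptime` fail (cf. SQL1275N)."""
    return sorted(p for p, t in per_partition.items() if t > stoptime)


if __name__ == "__main__":
    # Partition 0 uses the time from the SQL1275N message; partition 1 is invented.
    parts = {0: parse_pit("2008-03-04-12.45.54.000000"),
             1: parse_pit("2008-03-04-12.45.41.000000")}
    print(partitions_blocking(parse_pit("2008-03-01-00.00.00.000000"), parts))
    print(format_pit(minimum_recovery_time(parts)))
```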



What's important about user limits?

User limits (ulimit) set or show restrictions on the resources that a shell and the processes it starts can use. It is good practice to set some of these limits, for example to stop a faulty shell script from starting unlimited copies of itself, or to stop users on the system from starting processes that run forever. But what should they be set to? Here are a few considerations for the various resource limits:

o data and nofiles: At db2start, the data and nofiles limits are set to unlimited.

o stack: The stack limit is irrelevant, because DB2 creates its own stack space, sized by the AGENT_STACK_SZ database manager configuration parameter. On 64-bit UNIX the default is 4 MB (minimum 1 MB, maximum 128 MB); on 32-bit Linux the default is 1 MB (minimum 64 KB, maximum 4 MB).

o MAXFILOP: This limit on open database files applies per database per partition, with new high defaults of about 32K on 32-bit and about 64K on 64-bit systems.

o core: DB2 uses the current ulimit setting, or 8 GB on AIX if the limit is set to unlimited; in other words, DB2 overrides an unlimited core limit. To get a core file larger than 8 GB, explicitly set the core limit to a value larger than 8 GB, but not to unlimited.
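Python's standard resource module exposes the same getrlimit() limits that ulimit shows, so a quick check of the limits above can be scripted. This is a minimal sketch for UNIX and Linux; current_limits is an illustrative name, and effective_core_cap_aix only encodes the 8 GB AIX rule described above. It is hypothetical, not part of DB2.

```python
import resource

# The limits discussed above, keyed by the usual ulimit name and flag.
LIMITS = {
    "data (ulimit -d)": resource.RLIMIT_DATA,
    "nofiles (ulimit -n)": resource.RLIMIT_NOFILE,
    "stack (ulimit -s)": resource.RLIMIT_STACK,
    "core (ulimit -c)": resource.RLIMIT_CORE,
}

EIGHT_GB = 8 * 1024**3


def show(value):
    """Render one limit; getrlimit uses RLIM_INFINITY for 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """(soft, hard) for each limit of this process. Values are in bytes, except
    nofiles, which is a count; the shell's ulimit may display other units."""
    return {name: tuple(show(v) for v in resource.getrlimit(res))
            for name, res in LIMITS.items()}


def effective_core_cap_aix(soft_core_bytes):
    """Hypothetical model of the AIX rule above (not a DB2 API): an unlimited
    core limit is treated as 8 GB; any explicit value is used as-is."""
    return EIGHT_GB if soft_core_bytes == resource.RLIM_INFINITY else soft_core_bytes


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name:22} soft={soft:>12} hard={hard:>12}")
```

Run it from the instance owner's shell (for example, as db2ins10) to see the limits that shell passes on to the processes it starts.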


Process model configuration simplification

In this section you will see how the configuration parameters behave differently in DB2 9.5. Take note of the default values and ranges, as they are different from those in earlier releases.


Figure 3. Configuration parameters

If you have performance-critical unfenced external stored procedures (SPs) or user-defined functions (UDFs), make sure they are thread-safe. On migration, all external unfenced SPs and UDFs become fenced. For data integrity, unfenced SPs and UDFs should by convention already be thread-safe, but this cannot be enforced, and running a non-thread-safe SP or UDF in a multithreaded process can cause unpredictable problems. Therefore, as part of your migration procedure, create a script that converts the routines you have verified as thread-safe back to unfenced.
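A minimal Python sketch of such a migration helper follows, assuming you already have a list of external routines that you have reviewed and confirmed to be thread-safe. The VERIFIED_THREADSAFE entries and all helper names are hypothetical. The script only generates a CLP script; the NOT FENCED THREADSAFE clause combination is an assumption to confirm against the ALTER PROCEDURE and ALTER FUNCTION entries in the SQL Reference for your DB2 level before you run anything.

```python
# Hypothetical inventory: (kind, schema, specific name) of external routines that
# you have reviewed and confirmed to be thread-safe. Replace with your own list.
VERIFIED_THREADSAFE = [
    ("PROCEDURE", "PAYROLL", "CALC_TAX_SP"),
    ("FUNCTION", "PAYROLL", "ROUND_CENTS_UDF"),
]


def quote_ident(name):
    """Return a delimited identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def alter_statements(routines):
    """Build one ALTER statement per routine to make it unfenced again.

    ASSUMPTION: the clause list 'NOT FENCED THREADSAFE' for ALTER PROCEDURE /
    ALTER FUNCTION; confirm it in the SQL Reference for your DB2 level.
    """
    stmts = []
    for kind, schema, specific in routines:
        if kind not in ("PROCEDURE", "FUNCTION"):
            raise ValueError(f"unsupported routine kind: {kind!r}")
        stmts.append(f"ALTER SPECIFIC {kind} {quote_ident(schema)}.{quote_ident(specific)} "
                     "NOT FENCED THREADSAFE;")
    return stmts


def write_script(path, routines):
    """Write the statements, one per line, each terminated by a semicolon."""
    with open(path, "w") as f:
        f.write("\n".join(alter_statements(routines)) + "\n")


if __name__ == "__main__":
    for stmt in alter_statements(VERIFIED_THREADSAFE):
        print(stmt)
```

For example, write the file with write_script("unfence.sql", VERIFIED_THREADSAFE), review it, and then run it with db2 -tvf unfence.sql while connected to the database.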

Before moving on, take a quick look at the newly introduced threads and processes:

db2thcln (thread stack cleanup): Recycles resources when an EDU terminates (UNIX-only).

db2aiothr (aio collector thread): Manages asynchronous I/O requests for the database partition (UNIX-only).

db2alarm (alarm thread): Notifies EDUs when their requested timer has expired (UNIX-only).

db2vend (fenced vendor process): Executes vendor code on behalf of an EDU, for instance to execute the user-exit program for log archiving (UNIX-only).

db2extev (external event handler thread): The same as SIGUSR2.

db2acd: A health monitor process.

Finally, does moving to DB2 9.5 affect your existing applications? No, not at all. This internal change does not affect applications; in fact, it is largely transparent from both an administration and an application programming perspective.


Acknowledgements

Special thanks to Amar Thakkar and Samir Kapoor for their technical review of this article.

Resources

Learn

IBM DB2 9.5 Information Center for Linux, UNIX and Windows : Find information describing how to use the DB2 family of products and features, as well as related WebSphere® Information Integration products and features.

IBM DB2 Express-C 9.5 : Download DB2 Express-C 9.5, a no-charge version of DB2 Express 9 database server.

IBM DB2 Training and Certification : Find award winning instructors, industry leading software, hands-on labs, and more.

DB2 for Linux, UNIX, and Windows Forum : Share questions, thoughts, and ideas with other DB2 users and developers.

Stay current with developerWorks technical events and webcasts.

developerWorks Information Management zone : Learn more about DB2. Find technical documentation, how-to articles, education, downloads, product information, and more.

Get products and technologies

Build your next development project with IBM trial software, available for download directly from developerWorks.



Download IBM product evaluation versions and get your hands on application development tools and middleware products from DB2, Lotus®, Rational, Tivoli®, and WebSphere.

Discuss

Participate in the discussion forum.

Participate in developerWorks blogs and get involved in the developerWorks community.

About the author

Shashank Kharche is a staff software engineer with the IBM Australia Development Lab in Sydney, Australia. He is an IBM certified DB2 administrator. Shashank currently works as part of the Down Systems Division, Asia Pacific region, and has widespread experience in DB2 database and the diagnosis and resolution of critical problems. He has published several technotes for IBM. He holds a Bachelor's degree in Computer Science and Engineering. You can reach him at [email protected].

Simultaneous multithreading

From Wikipedia, the free encyclopedia


Simultaneous multithreading, often abbreviated as SMT, is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads of execution to better utilize the resources provided by modern processor architectures.


Details

Multithreading is similar in concept to preemptive multitasking but is implemented at the thread level of execution in modern superscalar processors.

Simultaneous multithreading (SMT) is one of the two main implementations of multithreading, the other form being temporal multithreading. In temporal multithreading, only one thread of instructions can execute in any given pipeline stage at a time. In simultaneous multithreading, instructions from more than one thread can be executing in any given pipeline stage at a time. This is done without great changes to the basic processor architecture: the main additions needed are the ability to fetch instructions from multiple threads in a cycle, and a larger register file to hold data from multiple threads. The number of concurrent threads can be decided by the chip designers, but practical restrictions on chip complexity have limited the number to two for most SMT implementations, though there have been as many as 8 threads per core in, for example, the UltraSPARC T2.

Because the technique is really an efficiency solution and there is inevitable increased conflict on shared resources, measuring or agreeing on the effectiveness of the solution can be difficult. Some researchers have shown that the extra threads can be used to proactively seed a shared resource like a cache, to improve the performance of another single thread, and claim this shows that SMT is not just an efficiency solution. Others use SMT to provide redundant computation, for some level of error detection and recovery.

However, in most current cases, SMT is about hiding memory latency, increasing efficiency, and increasing throughput of computations per amount of hardware used.


Taxonomy

In processor design, there are two ways to increase on-chip parallelism with fewer resources: one is the superscalar technique, which tries to increase instruction-level parallelism (ILP); the other is the multithreading approach, which exploits thread-level parallelism (TLP).

Superscalar means executing multiple instructions at the same time while chip-level multithreading (CMT) executes instructions from multiple threads within one processor chip at the same time. There are many ways to support more than one thread within a chip, namely:

Interleaved multithreading: Interleaved issue of multiple instructions from different threads, also referred to as temporal multithreading. It can be further divided into fine-grain or coarse-grain multithreading, depending on how frequently issue switches between threads. Fine-grain multithreading, such as in a barrel processor, issues instructions from a different thread after every cycle, while coarse-grain multithreading switches to another thread only when the currently executing thread causes a long-latency event (such as a page fault). Coarse-grain multithreading is more common because it requires fewer context switches between threads. For example, Intel's Montecito processor uses coarse-grain multithreading, while Sun's UltraSPARC T1 uses fine-grain multithreading. For processors that have only one pipeline per core, interleaved multithreading is the only possible way, because such a core can issue at most one instruction per cycle.

Simultaneous multithreading (SMT): Issue multiple instructions from multiple threads in one cycle. The processor must be superscalar to do so.

Chip-level multiprocessing (CMP or multicore): integrates two or more processors into one chip, each executing threads independently

Any combination of multithreaded/SMT/CMP

The key factor in distinguishing them is how many instructions the processor can issue in one cycle and how many threads those instructions come from. For example, Sun Microsystems' UltraSPARC T1 (known as "Niagara" until its November 14, 2005 release) is a multicore processor that uses fine-grain multithreading rather than simultaneous multithreading, because each core can issue only one instruction at a time.
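The distinction can be made concrete with a toy issue model. In the Python sketch below, a core has a fixed number of issue slots per cycle and each thread can supply at most ilp ready instructions per cycle; interleaved_cycles issues from only one thread per cycle in round-robin order, while smt_cycles fills the slots from several threads in the same cycle. The model ignores caches, pipeline depth, and latencies entirely, and the function names and parameters are illustrative.

```python
def interleaved_cycles(work, width, ilp):
    """Fine-grain interleaved multithreading: each cycle, only one thread issues.

    work  -- instructions left per thread, e.g. [6, 6]
    width -- issue slots per cycle on the core
    ilp   -- most instructions a single thread can supply in one cycle
    Threads take turns round robin, skipping threads that have finished.
    """
    assert width >= 1 and ilp >= 1
    remaining = list(work)
    cycles, nxt = 0, 0
    while any(remaining):
        while remaining[nxt] == 0:            # skip finished threads
            nxt = (nxt + 1) % len(remaining)
        remaining[nxt] -= min(width, ilp, remaining[nxt])
        nxt = (nxt + 1) % len(remaining)
        cycles += 1
    return cycles


def smt_cycles(work, width, ilp):
    """Simultaneous multithreading: each cycle, free slots are filled from
    several threads (fixed priority, thread 0 first, in this toy model)."""
    assert width >= 1 and ilp >= 1
    remaining = list(work)
    cycles = 0
    while any(remaining):
        slots = width
        for t in range(len(remaining)):
            take = min(slots, ilp, remaining[t])
            remaining[t] -= take
            slots -= take
        cycles += 1
    return cycles


if __name__ == "__main__":
    for ilp in (1, 2):
        print(f"ilp={ilp}: interleaved={interleaved_cycles([6, 6], 2, ilp)} "
              f"smt={smt_cycles([6, 6], 2, ilp)}")
```

For two threads of 6 instructions on a 2-wide core with ilp=1, the interleaved model takes 12 cycles and the SMT model takes 6, because SMT fills the slot that a single thread leaves empty. With ilp=2 both take 6 cycles: one thread already fills the core, so SMT has nothing to add, which echoes the point that SMT cannot help when a shared resource (here, issue width) is the bottleneck.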

Historical implementations

While multithreading CPUs have been around since the 1950s, simultaneous multithreading was first researched by IBM in 1968. The first major commercial microprocessor developed with SMT was the Alpha 21464 (EV8). This microprocessor was developed by DEC in coordination with Dean Tullsen of the University of California, San Diego, and Susan Eggers and Hank Levy of the University of Washington. The microprocessor was never released, since the Alpha line of microprocessors was discontinued shortly before HP acquired Compaq, which had in turn acquired DEC. Dean Tullsen's work was also used to develop the "Hyper-threading" (or "HTT") versions of the Intel Pentium 4 microprocessors, such as the "Northwood" and "Prescott".


Modern commercial implementations

The Intel Pentium 4 was the first modern desktop processor to implement simultaneous multithreading, starting with the 3.06 GHz model released in 2002; Intel has since introduced it into a number of its processors. Intel calls the functionality Hyper-Threading Technology (HTT), and it provides a basic two-thread SMT engine. Intel claims up to a 30% speed improvement compared with an otherwise identical, non-SMT Pentium 4. The improvement is very application dependent, and some programs actually slow down slightly when HTT is turned on, because of increased contention for resources such as bandwidth, caches, TLBs, and re-order buffer entries. This is generally the case for poorly written data access routines that cause high-latency intercache transactions (cache thrashing) on multiprocessor systems. Programs written before multiprocessor and multicore designs were prevalent commonly did not optimize cache access, because on a single-CPU system there is only a single cache, which is always coherent with itself. On a multiprocessor system, each CPU or core typically has its own cache, which is interlinked with the caches of the other CPUs or cores to maintain cache coherency. If thread A accesses memory location [00] and thread B then accesses memory location [01], this can cause an intercache transaction, because on all modern processors a cache line is far larger than 2 bytes, so both locations fall in the same line.
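The [00]/[01] case is plain address arithmetic: two addresses share a cache line when dividing each by the line size gives the same quotient, and two threads writing such addresses from different cores keep moving the line between caches (often called false sharing). A small Python sketch, assuming a 64-byte line (common, but an assumption here), together with the usual remedy of padding per-thread data so that each item starts on its own line; the helper names are illustrative.

```python
def cache_line(address, line_size=64):
    """Index of the cache line containing byte `address` (line_size in bytes)."""
    return address // line_size


def same_line(a, b, line_size=64):
    """True if two byte addresses fall in the same cache line."""
    return cache_line(a, line_size) == cache_line(b, line_size)


def padded_offsets(count, item_size, line_size=64):
    """Byte offsets that give each of `count` per-thread items its own cache
    line(s): the stride is item_size rounded up to a whole number of lines."""
    stride = -(-item_size // line_size) * line_size   # ceiling division
    return [i * stride for i in range(count)]


if __name__ == "__main__":
    print(same_line(0x00, 0x01))   # the [00]/[01] case from the text
    print(padded_offsets(4, 8))    # four 8-byte counters, one per line
```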

The latest MIPS architecture designs include an SMT system known as "MIPS MT". MIPS MT provides for both heavyweight virtual processing elements and lighter-weight hardware microthreads. RMI, a Cupertino-based startup, is the first MIPS vendor to provide a processor SOC based on 8 cores, each of which runs 4 threads. The threads can be run in fine-grain mode, where a different thread can be executed each cycle. The threads can also be assigned priorities.

The IBM POWER5, announced in May 2004, comes as either a dual core DCM, or quad-core or oct-core MCM, with each core including a two-thread SMT engine. IBM's implementation is more sophisticated than the previous ones, because it can assign a different priority to the various threads, is more fine-grained, and the SMT engine can be turned on and off dynamically, to better execute those workloads where an SMT processor would not increase performance. This is IBM's second implementation of generally available hardware multithreading.

Although many people reported that Sun Microsystems' UltraSPARC T1 (known as "Niagara" until its 14 November 2005 release) and the upcoming processor codenamed "Rock" (to be launched ~2009 [1]) are implementations of SPARC focused almost entirely on exploiting SMT and CMP techniques, Niagara does not actually use SMT. Sun refers to these combined approaches as "CMT", and the overall concept as "Throughput Computing". The Niagara has 8 cores, but each core has only one pipeline, so it actually uses fine-grained multithreading. Unlike SMT, where instructions from multiple threads share the issue window each cycle, the processor uses a round robin policy to issue instructions from the next active thread each cycle. This makes it more similar to a barrel processor. Sun Microsystems' Rock processor is different: it has more complex cores that have more than one pipeline.

The Intel Atom, released in 2008, is the first Intel product to feature SMT (marketed as Hyper-Threading) without supporting instruction reordering, speculative execution, or register renaming. Intel reintroduced Hyper-Threading with the Nehalem microarchitecture, after its absence on the Core microarchitecture.

Disadvantages

Simultaneous multithreading cannot improve performance if any of the shared resources is a limiting bottleneck. In fact, some applications run slower when simultaneous multithreading is enabled. Critics argue that it places a considerable burden on software developers, who have to test whether simultaneous multithreading helps or hurts their application in various situations and insert extra logic to turn it off if it decreases performance. Current operating systems lack convenient API calls for this purpose, and for preventing processes with different priorities from taking resources from each other [2].
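The "extra logic" mentioned above can take several forms. One option on Linux is to keep a process off sibling hardware threads by pinning it to one logical CPU per physical core. The sketch below does this with the kernel's `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` files and Python's `os.sched_getaffinity` / `os.sched_setaffinity`. It does not switch SMT off, because other processes can still run on the sibling threads. The helper names are hypothetical.

```python
import os
from typing import Dict, Iterable, List, Set


def parse_cpu_list(text: str) -> List[int]:
    """Parse a Linux CPU list such as "0,4" or "0-1,8-9" into CPU numbers."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def one_cpu_per_core(siblings: Dict[int, str]) -> Set[int]:
    """Keep the lowest-numbered usable logical CPU of each physical core.

    `siblings` maps each usable CPU to its thread_siblings_list text;
    CPUs absent from the mapping are treated as unusable.
    """
    chosen: Set[int] = set()
    for cpu, text in siblings.items():
        usable = [c for c in parse_cpu_list(text) if c in siblings] or [cpu]
        chosen.add(min(usable))
    return chosen


def read_siblings(cpus: Iterable[int]) -> Dict[int, str]:
    """Read thread_siblings_list from sysfs for each CPU (Linux only)."""
    result: Dict[int, str] = {}
    for cpu in cpus:
        path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
        with open(path) as f:
            result[cpu] = f.read()
    return result


def pin_one_thread_per_core() -> Set[int]:
    """Restrict the calling process to one logical CPU per core (Linux only)."""
    allowed = os.sched_getaffinity(0)
    target = one_cpu_per_core(read_siblings(allowed))
    os.sched_setaffinity(0, target)
    return target


if __name__ == "__main__":
    # Hypothetical 2-core, 2-way SMT topology: cores {0, 2} and {1, 3}.
    sample = {0: "0,2", 1: "1,3", 2: "0,2", 3: "1,3"}
    print(sorted(one_cpu_per_core(sample)))
```

Calling `pin_one_thread_per_core()` at startup would apply this to the current process. Whether it actually helps depends on the workload, which is exactly the testing burden the critics describe, so it should be measured rather than assumed.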

There is also a security problem with simultaneous multithreading: it has been demonstrated that one application can steal a cryptographic key from another application running on the same processor by monitoring its cache use [3].

See also

Temporal multithreading, another implementation of hardware multithreading

Thread (computer science), the fundamental software entity scheduled by the operating system kernel to execute on a CPU or processor (core)

Symmetric multiprocessing, where the system (or partition of a larger computer hardware platform) contains more than one CPU or processor (core) and where the operating system kernel is not limited to which of the available CPUs (cores) a given thread can be scheduled to execute on

References

1. http://www.theregister.co.uk/2007/12/14/sun_rock_delays/

2. How good is hyperthreading?

3. Hyper-Threading Considered Harmful

L.E. Shar and E.S. Davidson, "A Multiminiprocessor System Implemented through Pipelining", Computer, February 1974

D.M. Tullsen, S.J. Eggers, and H.M. Levy, "Simultaneous Multithreading: Maximizing On-Chip Parallelism", In 22nd Annual International Symposium on Computer Architecture, June 1995

D.M. Tullsen, S.J. Eggers, J.S. Emer, H.M. Levy, J.L. Lo, and R.L. Stamm, "Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor", In 23rd Annual International Symposium on Computer Architecture, May 1996

External links

SMT news articles and academic papers

SMT research at the University of Washington: http://www.cs.washington.edu/research/smt/


Timeline of multithreading technologies

Queue-based primitives for a multithreaded architecture

Source: Proceedings of the Seventeenth Euromicro Conference on Software and Hardware: Specification and Design, Vienna, Austria. Pages: 113-116. Year of publication: 1992. ISSN: 0165-6074.

Author: Xiaoming Fan

Publisher: Elsevier North-Holland, Inc., Amsterdam, The Netherlands


INDEX TERMS

Primary Classification: D. Software

D.4 OPERATING SYSTEMS

D.4.1 Process Management

Subjects: Multiprocessing/multiprogramming/multitasking

Additional Classification: C. Computer Systems Organization

C.1 PROCESSOR ARCHITECTURES


General Terms: Design, Measurement, Performance

This article has also been published in: Microprocessing and Microprogramming
