
Fault Comprehension for Concurrent Programs

Sangmin Park
Georgia Institute of Technology, Atlanta, GA, USA
[email protected]

Abstract—Concurrency bugs are difficult to find because they occur with specific memory-access orderings between threads. Traditional bug-finding techniques for concurrent programs have focused on detecting raw memory accesses representing the bugs, and they do not identify memory accesses that are responsible for the same bug. To address these limitations, we present an approach that uses memory-access patterns and their suspiciousness scores, which indicate how likely they are to be buggy, and clusters the patterns responsible for the same bug. The evaluation of our prototype shows that our approach is effective in handling multiple concurrency bugs and in clustering patterns for the same bugs, which improves understanding of the bugs.

I. INTRODUCTION

With the recent advancement of multi-core systems, concurrency has become more and more popular in software development. A recent survey shows that at Microsoft, two-thirds of developers handle concurrency issues, and more than half of these developers had to deal with concurrency bugs at least monthly [4]. Besides productivity loss, concurrency bugs often result in serious problems, such as Nasdaq's Facebook IPO glitch, which resulted in a loss of 40 million dollars [1], [2].

Concurrency bugs are difficult to find and understand because they occur with specific memory-access orders between threads. Traditional bug-finding techniques for concurrent programs (e.g., data-race detectors [14], [18] and atomicity-violation detectors [5], [11], [12]) have focused on detecting the memory accesses representing the bugs, but do not identify memory accesses that are responsible for the same bug. Thus, developers may need to manually group the accesses to identify and understand a single bug. Moreover, these techniques report only memory-access patterns, but do not provide other context information (e.g., clusters of memory accesses responsible for the same bug or the actual method locations for the bug). Thus, developers may need to infer such context information from the bug report.

To address the limitations of existing techniques, we present a new approach (1) that handles multiple concurrency bugs, where each bug is associated with clusters of memory-access patterns, so that developers can concentrate on the clustered patterns to understand a bug, and (2) that reconstructs bug context information (i.e., memory accesses responsible for the same bug and suspicious methods that contain the bug location) from clustered patterns, so that developers can understand the bug with context information instead of raw memory accesses.

This work includes our fault-localization work [15], [16] and our on-going work.

II. BACKGROUND AND RELATED WORK

This section discusses the concurrency bugs that our approach handles along with existing bug-detection and fault-localization techniques for concurrent programs.

A. Concurrency Bugs and Patterns

Our approach handles two of the most important types of non-deadlock concurrency bugs: order violations and atomicity violations [10]. An order violation occurs when a pair of memory accesses between two threads, at least one of which is a write, executes in an unintended order and leads to unintended program behavior. An atomicity violation occurs when a set of accesses in an atomic region is interfered with by accesses from a different thread, and that interference leads to unintended program behavior. Researchers have identified memory-access patterns for both violations [16], [20], and our approach uses those patterns to identify bugs.
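
To make these definitions concrete, the following minimal Java sketch (a hypothetical illustration, not an example from the paper) shows an atomicity violation: the read and the write of the shared field value are intended to execute as one atomic region, but a write from another thread can interleave between them.

class Counter {
    private int value = 0;              // field shared between threads

    // Buggy: the read and the subsequent write form an intended atomic region,
    // but nothing prevents another thread's write from interleaving in between,
    // which loses an update (an atomicity violation).
    void incrementBuggy() {
        int tmp = value;                // read  (first access of the region)
        value = tmp + 1;                // write (second access of the region)
    }

    // Fixed: the monitor lock makes the read-modify-write region atomic.
    synchronized void incrementFixed() {
        value = value + 1;
    }
}

An order violation is analogous: for example, a read in one thread is correct only if an initializing write in another thread has already executed, yet nothing enforces that order.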

B. Related Work

Traditional bug-finding techniques for concurrent programs have focused on finding data-race bugs [14], [18]. These techniques statically [14] or dynamically [18] find pairs of shared-memory accesses that are not protected by a common lock. More recent work instead tries to identify atomicity violations from dynamic memory accesses [5], [11], [12]. These techniques identify shared-memory accesses, execute the program multiple times, and find instances of atomicity-violation patterns among the shared-memory accesses. However, these bug-finding techniques have two major limitations. First, each technique finds only one type of bug, so to find several types of concurrency bugs, developers need to use multiple techniques. Second, the techniques often report benign results along with harmful ones [14], so developers may need to identify the harmful patterns manually.

Recently, researchers have applied statistical fault-localization techniques used for sequential programs to locate concurrency bugs in concurrent programs. CCI [7] samples shared-memory accesses during program executions, computes suspiciousness scores of those memory accesses, and ranks the accesses in decreasing order of suspiciousness. DefUse [19] collects definition-use pairs between two threads in passing executions, and finds the definition-use pairs in failing executions that are not in the set of pairs from passing executions. Recon [13] monitors shared-memory accesses together with the five preceding consecutive memory accesses, computes the suspiciousness of each set of accesses, and reports them in a ranked list.
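
As a rough sketch of the DefUse-style idea [19] (illustrative only; the class, method, and representation below are assumptions, not DefUse's actual interface), inter-thread definition-use pairs that appear in failing executions but never in passing executions are reported as suspicious:

import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: keep the definition-use pairs that occur only in
// failing executions. Each pair is encoded here as a plain string
// "defSite->useSite" for simplicity.
class DefUseDiff {
    static Set<String> suspiciousPairs(Set<String> pairsInPassingRuns,
                                       Set<String> pairsInFailingRuns) {
        Set<String> suspicious = new HashSet<>(pairsInFailingRuns);
        suspicious.removeAll(pairsInPassingRuns);   // failing-run-exclusive pairs remain
        return suspicious;
    }
}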


In contrast to bug-finding techniques, fault-localization techniques are not restricted to finding one type of bug, and they report suspicious memory accesses ranked higher than benign accesses. However, fault-localization techniques cannot indicate the type of bug for the ranked memory accesses, and, more importantly, they report only memory accesses and do not provide other context information (e.g., clusters of memory-access patterns) that assists understanding of the bugs.

III. OUR APPROACH

This section presents the three main steps of our approach. We present our recent approach, UNICORN [16], in Step 1, and discuss our on-going fault-comprehension work in Steps 2 and 3.

Step 1 (Localize suspicious patterns): The goal of Step 1 is to obtain fault-localization results for concurrent programs. To do this, Step 1 takes a concurrent program and a test suite as input, and executes an existing fault-localization algorithm to obtain suspicious memory-access patterns that represent concurrency bugs and a coverage matrix that shows the occurrences of the patterns in testing executions.

The fault-localization technique consists of three sub-steps. First, the technique efficiently collects pairs of shared-memory accesses that occurred between multiple threads during multiple program executions, using a sliding-window mechanism [15]. Second, the technique combines pairs into patterns responsible for order and atomicity violations [16]. Third, the technique computes the suspiciousness of the patterns using program-execution outcome statistics, specifically statistical fault-localization techniques [9].
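
The exact suspiciousness formula is given in [16] and is not restated here; as a hedged sketch, the following computes a Tarantula-style score [9] for one memory-access pattern from the coverage matrix of Step 1 (the class, method, and parameter names are illustrative assumptions):

// Hypothetical sketch: Tarantula-style suspiciousness of a memory-access
// pattern, computed from how often the pattern occurs in failing versus
// passing executions.
class Suspiciousness {
    static double score(int failWith, int passWith, int totalFail, int totalPass) {
        // failWith/passWith: executions containing the pattern that failed/passed.
        double failRatio = totalFail == 0 ? 0.0 : (double) failWith / totalFail;
        double passRatio = totalPass == 0 ? 0.0 : (double) passWith / totalPass;
        if (failRatio + passRatio == 0.0) {
            return 0.0;                               // pattern never observed
        }
        return failRatio / (failRatio + passRatio);   // closer to 1.0 = more suspicious
    }
}

Patterns sorted in decreasing order of such a score form the ranked list that the later steps consume.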

Step 2 (Cluster failing executions): The goal of Step 2 is to handle multiple bugs by grouping executions that fail because of the same bug. To do this, Step 2 runs a clustering algorithm on the outputs of Step 1 and outputs a set of clusters of failing executions. Specifically, Step 2 uses a fault-localization-based clustering algorithm [8], whose intuition is that failing executions caused by the same fault are likely to have similar fault-localization results.
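
A minimal sketch of that intuition, assuming each failing execution is summarized by a vector of per-pattern suspiciousness (or occurrence) values: executions whose vectors are sufficiently similar are grouped into the same cluster. The greedy strategy, the cosine similarity, and the threshold below are illustrative assumptions rather than the actual algorithm of [8].

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: greedily group failing executions whose
// fault-localization profiles have cosine similarity above a threshold.
class FailureClustering {
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // Each row of 'profiles' is one failing execution's per-pattern profile.
    static List<List<Integer>> cluster(double[][] profiles, double threshold) {
        List<List<Integer>> clusters = new ArrayList<>();
        for (int e = 0; e < profiles.length; e++) {
            List<Integer> home = null;
            for (List<Integer> c : clusters) {
                // Compare against the cluster's first member as its representative.
                if (cosine(profiles[c.get(0)], profiles[e]) >= threshold) {
                    home = c;
                    break;
                }
            }
            if (home == null) {
                home = new ArrayList<>();
                clusters.add(home);
            }
            home.add(e);
        }
        return clusters;
    }
}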

Step 3 (Reconstruct bug context): The goal of Step 3 is to reconstruct context information for each concurrency bug. To do this, Step 3 executes another clustering algorithm that clusters patterns responsible for the same bug within each cluster of failing executions from Step 2. Specifically, Step 3 uses a call-stack-similarity-based clustering algorithm [3] to cluster memory-access patterns that are responsible for the same bug, because memory accesses responsible for the same bug are likely to appear close together in an execution and thus are likely to have similar call stacks. Step 3 outputs three types of bug context: (1) clustered memory-access patterns that are responsible for the same bug, (2) suspicious methods that are likely to contain the bug locations, and (3) a bug graph that visually shows the bug.
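
As an illustration of the call-stack-similarity idea (a simplification, not the actual algorithm of [3]), two patterns can be placed in the same bug cluster when the call stacks observed at their accesses share most of their frames; the Jaccard measure and the threshold below are assumptions made for this sketch.

import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: Jaccard similarity over the method frames of two call
// stacks; patterns with similar stacks are clustered as the same bug.
class CallStackSimilarity {
    static double similarity(List<String> stackA, List<String> stackB) {
        Set<String> union = new HashSet<>(stackA);
        union.addAll(stackB);
        Set<String> common = new HashSet<>(stackA);
        common.retainAll(stackB);
        return union.isEmpty() ? 0.0 : (double) common.size() / union.size();
    }

    static boolean sameBugCluster(List<String> stackA, List<String> stackB, double threshold) {
        return similarity(stackA, stackB) >= threshold;   // e.g., threshold = 0.8 (assumed)
    }
}

The suspicious methods reported as bug context can then be drawn from the frames of the clustered stacks, although the paper's exact construction may differ.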

IV. EVALUATION AND DISCUSSION

To evaluate our approach, we implemented it in a prototype tool for Java and C++ using the Soot framework1 and the PIN binary instrumentation tool,2 respectively. We performed empirical studies to answer the following questions.

1 http://www.sable.mcgill.ca/soot/

• RQ 1: Can our approach effectively handle multiple bugs?

• RQ 2: Can our approach effectively cluster patterns for the same bug and provide context information for understanding concurrency bugs?

We used a set of Java and C++ programs, including the Java collection library, PBZip2, AGet, and MySQL, which are open-source desktop and server applications. We created test drivers that create a number of threads and can trigger concurrency bugs with a specific interleaving of threads.
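
For illustration only (this is not one of the actual test drivers), a driver in this style might spawn several threads that race on an unsynchronized counter, so that a specific interleaving loses updates and the run fails:

// Hypothetical test-driver sketch: several threads perform unsynchronized
// read-modify-write updates on a shared counter; interleavings that lose
// updates make the oracle at the end report a failing execution.
public class CounterDriver {
    static int shared = 0;                      // shared, intentionally unsynchronized

    public static void main(String[] args) throws InterruptedException {
        final int threads = 4;
        final int iterations = 100_000;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    int tmp = shared;           // read
                    shared = tmp + 1;           // write; an interleaved write here is lost
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        // Test oracle: the run fails whenever any update was lost.
        boolean pass = (shared == threads * iterations);
        System.out.println(pass ? "PASS" : "FAIL: shared = " + shared);
    }
}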

For RQ 1, we ran the failure-clustering algorithm of Step 2 on the ranked list reported by Step 1 and analyzed how well the algorithm clusters failing executions responsible for the same bug. Due to space limitations, we provide our results on a website.3 The results show that the failure-clustering algorithm correctly clusters failing executions that fail for the same reason, with high F-measure scores, which measure the effectiveness of the clustering. Although other approaches observed that the same highly ranked patterns may indicate the same bug [6], [13], [15], [16], they do not group the failures using this information. Unlike these approaches, our approach is the first clustering approach that handles multiple concurrency bugs.
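
The F-measure here is the standard one (the paper does not restate the formula); for a produced cluster C evaluated against the set B of failing executions caused by one bug,

F = \frac{2 \cdot P \cdot R}{P + R}, \qquad P = \frac{|C \cap B|}{|C|}, \qquad R = \frac{|C \cap B|}{|B|},

where P and R are the precision and recall of the cluster with respect to that bug; how the per-cluster scores are aggregated over all clusters is not restated here.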

For RQ 2, we ran the pattern-clustering algorithm of Step 3 on the clusters of failing executions from Step 2, and investigated how well the pattern-clustering algorithm clusters patterns responsible for the same bug and presents the details of the bug. Due to space limitations, we provide our results on the website.3 The results show that the pattern-clustering algorithm clusters patterns with call stacks with only one false positive. Moreover, we investigated the context information and found that the suspicious methods contain the locations of the causes of the bugs, whereas raw memory accesses do not appear at the actual bug locations. That is, our approach provides more information about the bug than other techniques, and thus can possibly assist understanding of concurrency bugs [17].

V. CONCLUSION

This paper presents the first fault-comprehension approach for concurrency bugs that explains concurrency bugs with information beyond raw memory accesses and that handles multiple concurrency bugs. Our approach takes as input a concurrent program and a test suite, locates suspicious patterns over multiple program executions, clusters failing executions responsible for the same concurrency bug, and reconstructs context information for each concurrency bug using a call-stack-similarity-based clustering algorithm. We performed preliminary studies on a number of Java and C++ programs, and the results show that our approach effectively handles multiple bugs and provides clusters of patterns and call stacks that improve understanding of concurrency bugs with few false positives.

2 http://pintool.org/
3 http://www.cc.gatech.edu/~sangminp/icse2013/


REFERENCES

[1] “Nasdaq outlines $40M fund for Facebook IPO glitches,” http://abcnews.go.com/blogs/business/2012/06/nasdaq-outlines-40m-fund-for-facebook-ipo-glitches/.

[2] “Nasdaq’s Facebook glitch came from race conditions,” http://www.pcworld.com/businesscenter/article/255911/nasdaqs_facebook_glitch_came_from_race_conditions.html.

[3] Y. Dang, R. Wu, H. Zhang, D. Zhang, and P. Nobel, “ReBucket: A method for clustering duplicate crash reports based on call stack similarity,” in ICSE, Jun. 2012, pp. 1084–1093.

[4] P. Godefroid and N. Nagappan, “Concurrency at Microsoft: An exploratory survey,” in (EC)2, Jul. 2008.

[5] C. Hammer, J. Dolby, M. Vaziri, and F. Tip, “Dynamic detection of atomic-set-serializability violations,” in ICSE, May 2008, pp. 231–240.

[6] G. Jin, L. Song, W. Zhang, S. Lu, and B. Liblit, “Automated atomicity-violation fixing,” in PLDI, Jun. 2011, pp. 389–400.

[7] G. Jin, A. Thakur, B. Liblit, and S. Lu, “Instrumentation and sampling strategies for cooperative concurrency bug isolation,” in OOPSLA, Oct. 2010, pp. 241–255.

[8] J. A. Jones, J. F. Bowring, and M. J. Harrold, “Debugging in parallel,” in ISSTA, Jul. 2007, pp. 16–26.

[9] J. A. Jones, M. J. Harrold, and J. Stasko, “Visualization of test information to assist fault localization,” in ICSE, May 2002, pp. 467–477.

[10] S. Lu, S. Park, E. Seo, and Y. Zhou, “Learning from mistakes: A comprehensive study on real world concurrency bug characteristics,” in ASPLOS, Mar. 2008, pp. 329–339.

[11] S. Lu, J. Tucek, F. Qin, and Y. Zhou, “AVIO: Detecting atomicity violations via access interleaving invariants,” in ASPLOS, Oct. 2006, pp. 37–48.

[12] B. Lucia, L. Ceze, and K. Strauss, “ColorSafe: Architectural support for debugging and dynamically avoiding multi-variable atomicity violations,” in ISCA, 2010, pp. 222–233.

[13] B. Lucia, B. P. Wood, and L. Ceze, “Isolating and understanding concurrency errors using reconstructed execution fragments,” in PLDI, Jun. 2011, pp. 378–388.

[14] M. Naik and A. Aiken, “Conditional must not aliasing for static race detection,” in POPL, Jan. 2007, pp. 327–338.

[15] S. Park, R. Vuduc, and M. J. Harrold, “Falcon: Fault localization in concurrent programs,” in ICSE, May 2010, pp. 245–254.

[16] S. Park, R. Vuduc, and M. J. Harrold, “A unified approach for localizing non-deadlock concurrency bugs,” in ICST, Apr. 2012, pp. 51–60.

[17] C. Parnin and A. Orso, “Are automated debugging techniques actually helping programmers?” in ISSTA, Jul. 2011, pp. 199–209.

[18] S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson, “Eraser: A dynamic data race detector for multithreaded programs,” ACM Trans. Comput. Syst., vol. 15, no. 4, pp. 391–411, 1997.

[19] Y. Shi, S. Park, Z. Yin, S. Lu, Y. Zhou, W. Chen, and W. Zheng, “Do I use the wrong definition? DefUse: Definition-use invariants for detecting concurrency and sequential bugs,” in OOPSLA, Oct. 2010, pp. 160–174.

[20] M. Vaziri, F. Tip, and J. Dolby, “Associating synchronization constraints with data in an object-oriented language,” in POPL, 2006, pp. 334–345.
