Upload: rodger-stevenson

Post on 05-Jan-2016

Page 1: OCA Crash Analysis Andre Vachon Software Development Lead Windows Product Feedback Microsoft Corporation

OCA Crash Analysis

Andre Vachon
Software Development Lead
Windows Product Feedback
Microsoft Corporation

What Is OCA

- Online Crash Analysis
  - Free failure analysis service, supported on Windows XP and later operating systems
  - Gathers direct customer data about customer Windows crashes
  - Helps Microsoft and IHVs understand customer problems

Goals Of OCA Data Analysis

- Provide feedback to customers to improve overall satisfaction
  - Real-time feedback about what caused the problem on their machine
  - Links to help customers solve problems
- Make Windows a more reliable platform
  - Find and fix bugs for all kernel mode bluescreens
  - Make crash data more actionable for developers
  - Help Microsoft and IHVs prioritize problems

OCA Data Analysis Process

- Fully automated
  - No human intervention
  - Runs in 2-3 seconds
- Takes dumps received from the customer and sends them to the debugger
  - Execute !analyze in the debugger
  - Generate a bucket ID
- Store the output of the analysis into the OCA Database
  - If the bucket ID has a solution, send the solution back to the customer
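The flow on this slide can be sketched in a few lines of Python. This is only an illustration: the class, its fields, and the `analyze` callable are hypothetical stand-ins, not the actual OCA service or debugger API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class OcaPipeline:
    """Toy model of the automated flow; every name here is hypothetical."""
    analyze: Callable[[bytes], str]                          # stands in for running !analyze
    solutions: Dict[str, str] = field(default_factory=dict)  # bucket ID -> solution link
    database: List[dict] = field(default_factory=list)       # stored analysis results

    def process(self, dump: bytes) -> Optional[str]:
        bucket_id = self.analyze(dump)                        # generate a bucket ID
        self.database.append({"bucket_id": bucket_id, "dump_size": len(dump)})
        return self.solutions.get(bucket_id)                  # None if no solution is linked
```

A caller would pass each received dump to `process` and forward any returned link to the customer.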

What Does OCA Collect

- Dump files
  - Minidumps by default
  - Optionally, customers can submit full dumps
- XML data
  - List of .sys files on the machine
  - List of PnP IDs enumerated by PnP on the machine
- All the data is packaged in a .cab file

What Is In A Kernel Minidump

- Header, basic OS information, PRCB
- OS module list (loaded and unloaded)
- Faulting EPROCESS, ETHREAD, stack and context
- Data pages pointed to by the context
- Data pages pointed to by the bugcheck params (Windows XP SP1)
- Some optional data pages, if space is available in the dump file
- Optional bugcheck callback data
- Minidumps will never contain all the information (neither will full dumps)
  - Targeted data collection to allow analysis of the majority of failures
  - We ask specific customers to send us additional data when needed
- User minidumps contain different types of information

Minidump Improvements

- Windows XP SP1 minidump improvements
  - Sysdata.xml contains PnP IDs
  - Save data pages pointed to by bugcheck parameters
  - KeBugCheck routine improvements in Windows XP SP1 and SP2 to collect more targeted data for crashes
  - More data pages pointed to by registers
- Windows XP SP2 minidump improvements
  - More accurately save the context of the crash
    - Saved all the pages backed by those registers
  - SMBIOS data tables
  - MM pool changes better isolate a number of pool corruptions

Debugging A Kernel Minidump

- Debuggers
  - Kernel minidumps require using KD or WinDbg
  - Both WinDbg and VS support debugging user mode minidumps
- Step 1: Get the images
  - A minidump contains minimal data, so code images must be loaded at debug time
  - Use the module timestamps stored in the dump files to find the correct images
  - All MS kernel mode code for recent OSes is on the internet symbol server
- Step 2: Extract PDB information from the images
  - The debug record stored in an image is used to look for the symbols
  - If you have the wrong image, the wrong symbols will be loaded
- Step 3: Get symbols
  - Symbol server is again the best solution
- Data in the minidump is limited
  - Look at what you can
  - Some minidumps will not yield useful results if critical information is missing
- Read the docs for details on loading a minidump in the debugger
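As a rough illustration of the three steps, the helper below builds a KD command line that points both the symbol path and the image path at a symbol server and then runs `!analyze -v`. The function name, the cache directory argument, and the server URL are assumptions added for this sketch; check the debugger documentation for the options your version supports.

```python
from typing import List

# Assumed public symbol server URL; substitute your own server if you host one.
PUBLIC_SYMBOLS = "https://msdl.microsoft.com/download/symbols"


def kd_minidump_command(dump_path: str, cache_dir: str,
                        server: str = PUBLIC_SYMBOLS) -> List[str]:
    """Return an argv list that opens a kernel minidump in KD (hypothetical helper)."""
    srv = f"srv*{cache_dir}*{server}"   # symbol-server path: local cache, then server
    return [
        "kd",
        "-z", dump_path,                # the dump file to open
        "-y", srv,                      # where to look for symbols (step 3)
        "-i", srv,                      # where to look for images (step 1)
        "-c", "!analyze -v; q",         # run the analysis, then quit
    ]
```

The list can be handed to `subprocess.run` on a machine where the debugger is installed.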

What Is A Bucket?

- Identifies the component most likely responsible for the crash
  - Based on heuristics in !analyze
  - Heuristics are continually improved
- Represents a unique bug or problem
  - If multiple bugs map to a bucket, we split the bucket
- Responses and solutions are associated with a bucket
  - A human has to verify the analysis results before a response can be attached to a bucket

Sample Buckets

- OLD_IMAGE_FOO.SYS
  - Crash caused by an old version of foo.sys
- OLD_IMAGE_foo.sys_DEV_3577
  - Crash caused by an old version of foo.sys on device ID 3577
- 0x44_BUGCHECKING_DRIVER_foo
  - Driver foo.sys is known to commonly cause bugcheck 0x44
- POOL_CORRUPTION_foo
  - Driver foo.sys is known to cause pool corruption
- 0xBE_foo!bar+1a
  - Driver foo.sys crashed in routine bar
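For anyone post-processing bucket lists, a small classifier along these lines may help. It only recognizes the five shapes shown on this slide; the function name and the returned dictionary keys are made up for the example, and real bucket IDs may follow other patterns.

```python
import re


def describe_bucket(bucket_id: str) -> dict:
    """Classify a bucket ID using only the sample shapes from the slide."""
    if bucket_id.upper().startswith("OLD_IMAGE_"):
        rest = bucket_id[len("OLD_IMAGE_"):]
        driver, _, device = rest.partition("_DEV_")      # optional device ID suffix
        return {"kind": "old_image", "driver": driver.lower(), "device": device or None}
    if bucket_id.startswith("POOL_CORRUPTION_"):
        return {"kind": "pool_corruption", "driver": bucket_id[len("POOL_CORRUPTION_"):]}
    m = re.fullmatch(r"(0x[0-9A-Fa-f]+)_BUGCHECKING_DRIVER_(.+)", bucket_id)
    if m:
        return {"kind": "bugchecking_driver", "bugcheck": m.group(1), "driver": m.group(2)}
    m = re.fullmatch(r"(0x[0-9A-Fa-f]+)_([^!]+)!(.+)\+([0-9A-Fa-f]+)", bucket_id)
    if m:
        return {"kind": "symbol", "bugcheck": m.group(1), "module": m.group(2),
                "routine": m.group(3), "offset": int(m.group(4), 16)}
    return {"kind": "unknown"}
```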

Customer Interaction

- Send information about their problem back to customers in real time
  - Currently Web-based interaction
  - Contains links to web pages hosted by third parties
  - Better integration in the OS in the future
- Two categories of feedback
  - Response: link to a page describing a problem we know about, but which is not solved yet
    - General troubleshooting steps of KB article
    - Company wants direct customer feedback
  - Solutions: content that describes how to “fix” a problem
    - New drivers
      - Hosted by ISV, IHV, OEM or Windows Update
    - Service Pack
    - Tools to resolve a problem
    - End-of-life statements are acceptable when hosted by the company

Creating Responses

- Responses are linked by the OCA team
  - Send mail to pfat @ microsoft.com when you find the root cause of a bucket and have a fix for it
- Microsoft has generic templates for various solutions and responses
  - Redirection to third party sites
  - Redirection to Windows Update
  - KB Articles, etc.
- IHVs and ISVs need to provide static web pages to have redirects

Customer Connection

- We collect very limited user feedback today
  - We collect whether responses were helpful or not to the customer
- OCA intends to improve interaction with customers
  - Collect customer repro steps
  - Enable direct contact between customer and developer
  - Ability for customers to get updated status on past crashes

OCA Crash Investigation

- Data collected by OCA is stored in a large database for crash analysis purposes
- Primary categorization is BucketID
- Additional crash data stored in the OCA DB
  - OS version
  - Failure date
  - Faulting driver
  - Faulting driver timestamp
  - OEM name
  - CPU information
  - Bug number
  - More data as we scale our SQL implementation
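To make the shape of that data concrete, here is a minimal in-memory SQLite sketch with one column per field listed above, plus a query that ranks buckets by hit count. The table name, column names, and helper functions are invented for illustration; the real OCA schema is not described in these slides.

```python
import sqlite3


def open_crash_db() -> sqlite3.Connection:
    """Create an in-memory table mirroring the fields on the slide (names assumed)."""
    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE crashes (
            bucket_id        TEXT NOT NULL,   -- primary categorization
            os_version       TEXT,
            failure_date     TEXT,
            faulting_driver  TEXT,
            driver_timestamp TEXT,
            oem_name         TEXT,
            cpu_info         TEXT,
            bug_number       INTEGER
        )""")
    return db


def largest_buckets(db: sqlite3.Connection, limit: int = 10):
    """Return (bucket_id, hits) pairs, biggest bucket first."""
    return db.execute(
        "SELECT bucket_id, COUNT(*) AS hits FROM crashes "
        "GROUP BY bucket_id ORDER BY hits DESC, bucket_id LIMIT ?",
        (limit,),
    ).fetchall()
```

Grouping by bucket ID is what makes it possible to work on the most frequent failures first.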

OCA Data Sharing

- IHVs
  - https://winqual.microsoft.com hosts the Error Reporting Site
    - Secure data sharing with any IHV signed up with WinQual
    - Data sharing is done based on file name and file version
    - Statistics and actual customer dump files are shared with IHVs
    - More improvements coming to the site
  - If you need more information to debug problems, send us mail
- OEMs
  - OCA data is shared with OEMs on a regular basis
  - OEMs see a list of all the crashes that happen on their machines
  - Expect to hear from your OEM if you have a lot of OCA crashes

OCA Data Normalization

- The OCA data cannot be normalized to determine absolute quality of a driver
  - OCA is an anonymous, opt-in system
    - We don’t know how many users send in reports and how often
  - We don’t know the software usage scenarios of customers
  - We don’t get reports for “success” scenarios
  - We don’t know what the actual problem was until it’s fixed
- Just fix the largest buckets first

What Is !analyze

- Debugger extension designed to find root cause of bugs
  - Automated analysis
  - Simplifies analysis of known problems
    - Understand various states of the OS
  - Provides good starting point to analyze complex problems
    - Extract commonly used debugging information
- Results of the analysis are
  - “Bucket ID”
    - Unique string representing the bug
  - An owner for the problem, extracted from triage.ini
  - In verbose mode
    - Detailed list of all the data found during the analysis

!analyze Output

kd> !analyze -v
THREAD_STUCK_IN_DEVICE_DRIVER (ea)
<text>
Debugging Details:
------------------
FAULTING_THREAD: 82493da8
DEFAULT_BUCKET_ID: GRAPHICS_DRIVER_FAULT
BUGCHECK_STR: 0xEA
LAST_CONTROL_TRANSFER: from bf9c148e to bf9c1c8f
STACK_TEXT:
ae328db0 bf9c148e af0df9c0 013bca06 ae328df0 xxxxxx!vDmaCopy_r6+0x495
ae328dfc bf9a94ef 00000026 ae328ec0 ae329304 xxxxxx!vCopyFBToDMABuffer+0x17a
…
STACK_COMMAND: .thread ffffffff82493da8 ; kb
FOLLOWUP_IP: xxxxxx!vDmaCopy_r6+495
bf9c1c8f 3b1f cmp ebx,[edi]
FOLLOWUP_NAME: xxxxxx
SYMBOL_NAME: xxxxxx!vDmaCopy_r6+495
MODULE_NAME: xxxxxx
IMAGE_NAME: xxxxxx.dll
DEBUG_FLR_IMAGE_TIMESTAMP: 3edc0abb
BUCKET_ID: 0xEA_xxxxxx!vDmaCopy_r6+495
INTERNAL_BUCKET_URL: http://dbgportal/DBGPortal_ViewBucket.asp?BucketID=0xEA_xxxxxx!vDmaCopy_r6%2b495&FrameID=undefined
OCA_CRASHES: xxxx
INTERNAL_RAID_BUG: http://watson/bug.aspx?DB=6&BugID=840654
Followup: xxxxxx
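Output in this `KEY: value` layout is easy to post-process. The sketch below pulls the single-line fields into a dictionary and converts a hexadecimal image timestamp into a date, on the assumption that the value is seconds since 1970 (the usual PE link-date convention). Both helper names are hypothetical, and the parser is written against the sample above rather than any documented output format.

```python
import re
from datetime import datetime, timezone

# A field is an identifier, a colon, and a non-empty value on the same line.
FIELD = re.compile(r"^([A-Za-z_]+):\s*(\S.*)$")


def parse_analyze_output(text: str) -> dict:
    """Collect single-line KEY: value fields; headers with no value are skipped."""
    fields = {}
    for line in text.splitlines():
        m = FIELD.match(line.strip())
        if m:
            fields.setdefault(m.group(1), m.group(2).strip())  # keep first occurrence
    return fields


def link_date(hex_timestamp: str) -> datetime:
    """Interpret a hex image timestamp as seconds since the Unix epoch (assumption)."""
    return datetime.fromtimestamp(int(hex_timestamp, 16), tz=timezone.utc)
```

Multi-line sections such as STACK_TEXT are deliberately ignored here; they would need their own handling.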

!analyze Algorithm

- Multi-step algorithm
  - Uses bugcheck or verifier code as initial input
  - Does stack analysis
  - Uses additional data about known problems provided by developers
  - Iterates on all the data above to determine the root cause

Analysis Step 1

- Use bugcheck parameters to extract basic information
  - Each bugcheck is processed by a separate routine that understands the meaning of each parameter
  - Save trap frame, context recording, faulting thread, etc.
- If specific follow-up or faulting driver is found, report results

Analysis Step 2

- Use information in step 1 to get faulting stack
  - Scan the stack for special functions such as Trap0E to find alternate stack
- Analyze frames on the final stack to determine most likely culprit
  - Different weights are assigned to routines
    - Internal kernel routines have lowest weight
    - Device drivers have highest weight
    - Fine grain control provided by triage.ini
  - Highest weight frame found on the stack is treated as the culprit
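The weighting idea can be illustrated with a toy scorer. The numeric weights, the set of "kernel" module names, and the `overrides` table (standing in for triage.ini) are all assumptions chosen for the example; the real heuristics in !analyze are not published in this deck.

```python
from typing import Dict, List, Optional

KERNEL_MODULES = {"nt", "hal"}   # assumed list of internal kernel modules
KERNEL_WEIGHT = 1                # lowest weight
DRIVER_WEIGHT = 10               # highest default weight


def pick_culprit(frames: List[str],
                 overrides: Optional[Dict[str, int]] = None) -> Optional[str]:
    """Return the highest-weight frame; frames are listed innermost first."""
    overrides = overrides or {}
    best, best_weight = None, -1
    for frame in frames:
        routine = frame.split("+")[0]                 # "module!routine" or bare "module"
        module = routine.split("!")[0]
        default = KERNEL_WEIGHT if module.lower() in KERNEL_MODULES else DRIVER_WEIGHT
        # Routine-level override wins over module-level override, then the default.
        weight = overrides.get(routine, overrides.get(module, default))
        if weight > best_weight:                      # ties keep the innermost frame
            best, best_weight = frame, weight
    return best
```

Lowering the weight of a trusted module in `overrides` shifts the blame to the next candidate on the stack.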

Symbol Server And Minidumps

- Minidumps store the timestamp of images
  - Debugger uses the file name, timestamp and image size to map the image
  - Debugger looks for the symbol file name in the mapped image
  - If the wrong image is loaded by the debugger, the symbols will also be wrong
- Storing images and symbols in symbol server is the best way for the debugger to get the correct version of the image
  - Also simplifies archiving of driver versions
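As a sketch of how file name, timestamp, and image size can combine into a lookup key, the helper below builds a relative path in the layout commonly seen in symbol stores: the file name, then the timestamp as eight hex digits followed by the image size in hex, then the file name again. That exact layout is an assumption here, not something stated in the slides; consult the symbol server documentation before relying on it.

```python
def image_store_path(file_name: str, timestamp: int, size_of_image: int) -> str:
    """Build a symbol-store style relative path for an image (layout assumed)."""
    key = f"{timestamp:08X}{size_of_image:x}"   # timestamp padded to 8 digits, size unpadded
    return f"{file_name}/{key}/{file_name}"
```

Two builds of the same driver differ in timestamp or size, so each lands in its own directory, which is why a store also works as an archive of driver versions.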

IHV And ISV Symbols

- Symbols greatly help with the automated analysis of failures
  - Don’t lose your symbols!
- Sharing symbols with Microsoft
  - You can submit symbols with driver submissions to WHQL
  - On-site vendors can host their own symbol server
  - Symbol data is stored securely
    - Symbols are not shared with other IHVs internally
    - Symbols are not shared on the external public symbol server
- Sharing symbols is totally optional, but encouraged

Analysis Step 2 – IHV Symbols

Without valid symbols

f18e7968 nt!KeBugCheckEx+0x19
f18e7980 nt!IopfCallDriver+0x18
f18e7990 Fastfat!FatSingleAsync+0x74
f18e7a5c Fastfat!FatCommonRead+0x88e
f18e7acc Fastfat!FatFsdRead+0x136
f18e7adc nt!IopfCallDriver+0x31
f18e7b0c SYMEVENT+0x61cb
f18e7b2c nt!IoPageRead+0x19
f18e7b9c nt!MiDispatchFault+0x270
f18e7bec nt!MmAccessFault+0x5b7
f18e7bec nt!_KiTrap0E+0xb8
f18e7cc4 nt!CcMapData+0xef
f18e7cf0 Fastfat!FatReadVolumeFile+0x38
f18e7e78 Fastfat!FatMountVolume+0x1f7
f18e7e98 Fastfat!FatCommonFileSystemControl+0x47
f18e7ee4 Fastfat!FatFsdFileSystemControl+0x85
f18e7ef4 nt!IopfCallDriver+0x31
f18e7f44 nt!IopMountVolume+0x1d1

BUCKET_ID: 0x35_SYMEVENT+61cb

With valid symbols

f18e7968 nt!KeBugCheckEx+0x19
f18e7980 nt!IopfCallDriver+0x18
f18e7990 Fastfat!FatSingleAsync+0x74
f18e7a5c Fastfat!FatCommonRead+0x88e
f18e7acc Fastfat!FatFsdRead+0x136
f18e7adc nt!IopfCallDriver+0x31
f18e7ae8 SYMEVENT!CSymIrp::IrpRead+0x4b
f18e7af8 nt!IopfCallDriver+0x31
f18e7b0c nt!IopPageReadInternal+0xf2
f18e7b2c nt!IoPageRead+0x19
f18e7b9c nt!MiDispatchFault+0x270
f18e7bec nt!MmAccessFault+0x5b7
f18e7bec nt!_KiTrap0E+0xb8
f18e7cc4 nt!CcMapData+0xef
f18e7cf0 Fastfat!FatReadVolumeFile+0x38
f18e7e78 Fastfat!FatMountVolume+0x1f7
f18e7e98 Fastfat!FatCommonFileSystemControl+0x47

BUCKET_ID: POOL_CORRUPTION_Foo.sys

Analysis Step 3

- If stack does not yield an interesting frame, analyze raw stack data
  - Iterate on all stack values using the same weight algorithm
    - The ‘dps’ command will show that output
  - This finds drivers that corrupt the stack
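A raw stack scan can be pictured as follows: treat every pointer-sized value on the stack as a possible code address, see which loaded module it falls inside, and score the hits. The module map format, the default weight, and the function name are assumptions for this sketch.

```python
from typing import Dict, Iterable, Optional, Tuple


def scan_raw_stack(values: Iterable[int],
                   modules: Dict[str, Tuple[int, int]],
                   weights: Optional[Dict[str, int]] = None,
                   default_weight: int = 10) -> Optional[str]:
    """Return the highest-weight module referenced by any raw stack value.

    modules maps a module name to (base address, size in bytes).
    """
    weights = weights or {}
    best, best_weight = None, -1
    for value in values:
        for name, (base, size) in modules.items():
            if base <= value < base + size:           # value points into this module
                weight = weights.get(name, default_weight)
                if weight > best_weight:
                    best, best_weight = name, weight
    return best
```

Values that fall outside every module (plain data, for example) are simply skipped.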

Analysis Step 4

- Check for presence of memory or pool corrupting drivers
- Check for corrupted code streams using !chkimg
  - Bad RAM
- Check for other possible problems, such as invalid call sequences
  - Possible CPU problem

Pool Corruption

- Pool corruption is very bad
  - Driver A crashes because of driver B’s bug
  - Very hard to identify the culprit
  - We estimate about 15% of all crashes are caused by pool corruption
- Many OCA failures are due to pool corruption
  - Every vendor has buckets assigned to them that are due to another driver
- Run Driver Verifier!
  - Track down all pool corruptions and fix them!

Hardware Issues

- Hardware problems are quite common
  - Heating issues
    - Investigating data in SMBIOS and ACPI to help with this
  - Bad DMA
    - May be detectable in the future with new hardware support in the processor
  - Bad disk
    - Diagnosis tools are being investigated
  - Chipset problems (timing issues)
    - No known detection mechanisms
  - CPU bugs
    - No known detection mechanisms
  - Power glitches, surge
    - No known detection mechanisms
  - Bad memory
    - Developing algorithms to detect bad memory from a minidump
    - Shipping a stand-alone memory checker
      - http://oca.microsoft.com/en/windiag.asp

Analysis Step 5

- Generate final bucket ID and follow-up based on all gathered information
  - Determine which fields need to be embedded in the bucket ID
- Assign ownership of failure
  - Look up the bug ID or solution for this bucket in the OCA database
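One way to picture the final step is a formatter that embeds whichever fields are available, producing strings shaped like the sample IDs shown elsewhere in this deck (for example `0xEA_xxxxxx!vDmaCopy_r6+495`). The function and its parameters are hypothetical; the rules !analyze actually uses to choose fields are not spelled out here.

```python
from typing import Optional


def make_bucket_id(bugcheck_code: int, module: str,
                   routine: Optional[str] = None,
                   offset: Optional[int] = None) -> str:
    """Compose a bucket ID from the fields that were recovered (format assumed)."""
    bucket = f"0x{bugcheck_code:X}_{module}"
    if routine:                       # only present when symbols resolved a routine
        bucket += f"!{routine}"
    if offset is not None:
        bucket += f"+{offset:x}"
    return bucket
```

The resulting string can then serve as the key for looking up a known bug or solution.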

Triage.ini

- Data file used to drive !analyze heuristics. It contains
  - Lists of known bad drivers
  - Reliability of certain routines within a driver
  - Who owns a particular module or routine
  - How certain bucket IDs should be generated
- !analyze parses all the data in triage.ini to generate the final results
  - Data updated on a daily basis
  - New tokens to control bucketing added regularly

Triage.ini Tokens

- Timestamps – link date, in HEX format
- Driver – full name of the image
- Module – name of the image without the extension
- Name – owner of that routine or module

poolcorruptors!<driver> = <timestamp>
memorycorruptors!<driver> = <timestamp>
oldimages!<driver> = <timestamp>
bugcheckingdriver!0x6_<driver> = <timestamp>
Additional_DriverInfo!<driver> = Build, deviceID, Offset
<module>!<routine> = Ignore_
<module>!<routine> = maybe_<name>
<module>!<routine> = specific_<name>
<module>!<routine> = last_<name>

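A reader for entries in this shape might look like the sketch below. It splits each line at `=` and `!` and separates routine rules (values starting with `Ignore_`, `maybe_`, `specific_`, or `last_`) from list entries such as `poolcorruptors`. The function name and the returned keys are invented, and the parser says nothing about what each rule means to !analyze.

```python
from typing import Optional

RULE_PREFIXES = ("ignore_", "maybe_", "specific_", "last_")


def parse_triage_line(line: str) -> Optional[dict]:
    """Parse one 'left!right = value' entry; return None for anything else."""
    key, sep, value = line.partition("=")
    if not sep or "!" not in key:
        return None
    left, _, right = key.strip().partition("!")
    value = value.strip()
    for prefix in RULE_PREFIXES:
        if value.lower().startswith(prefix):
            # Routine rule: <module>!<routine> = <action>_<name>
            return {"module": left, "routine": right,
                    "action": prefix[:-1], "owner": value[len(prefix):] or None}
    # Otherwise treat it as a list entry such as poolcorruptors!<driver> = <timestamp>
    return {"list": left, "entry": right, "value": value}
```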
Changing Your OCA Buckets

- Images and Symbols
  - Sharing images and symbols with Microsoft can allow your buckets to be merged, or routines ignored
- Triage.ini change
- Algorithm changes
  - !analyze is not directly extensible by third parties yet
  - !analyze can call driver specific analysis routines. Can be used to parse bugcheck data block
- For any improvements, send mail to pfat @ microsoft.com

Retriaging

- Process of re-analyzing crashes
  - Re-execute !analyze on the dump file and update the database information
- Done when a developer gives us an analysis change
  - Triage.ini
  - New !analyze heuristic
- Dumps that are retriaged can go into new buckets
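In code, retriaging amounts to a loop like the one below: run the updated analysis over stored dumps, update each record, and note which dumps changed bucket. The record layout and the `analyze` callable are placeholders, not the real OCA database interface.

```python
from typing import Callable, List, Tuple


def retriage(records: List[dict],
             analyze: Callable[[bytes], str]) -> List[Tuple[str, str]]:
    """Re-run analysis on each stored dump and return (old, new) bucket moves."""
    moved = []
    for record in records:
        new_bucket = analyze(record["dump"])          # stands in for re-running !analyze
        if new_bucket != record["bucket_id"]:
            moved.append((record["bucket_id"], new_bucket))
            record["bucket_id"] = new_bucket          # update the stored information
    return moved
```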

Call To Action

- Look at your OCA failures
  - These are REAL customer problems
- Fix your pool corruption problems
- Tell us about the bugs you fix, so we can update !analyze and point customers to your driver updates
- Attend the WinDbg Ask the Experts sessions

Resources

- Debugger URL and download site
  - http://www.microsoft.com/whdc/ddk/debugging
- Debugger e-mail – for debugger bug reports and feature requests
  - windbgfb @ microsoft.com
  - We try to fix all the bugs people report
  - We do not provide general debugging support on this alias
- Debugger newsgroup
  - Microsoft.public.windbg
  - Good place for general debugging issues
