07/09/04 Windows Reliability Team 1
CLR Reliability under Memory Exhaustion
Solomon Boulos
Temporary Memory Exhaustion causes failures
• Out of Memory (OOM) is temporary
• Shouldn’t cause failure
  – Just wait for memory to become available
  – System can take action to free up memory
• All managed code depends on CLR
• Testing is difficult
  – Exceptions are objects
  – Boxing (casting value type to object)
  – JIT compilation
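The boxing point above can be illustrated with a minimal sketch. It assumes nothing beyond the base class library; the class and method names are hypothetical. Each boxing of a value type produces a separate object, so code that looks allocation-free can still request memory when memory is already exhausted.

```csharp
using System;

// Hypothetical helper, not from the original slides.
static class HiddenAllocations
{
    // Returns true when two boxings of the same int yield distinct objects.
    public static bool BoxingAllocates()
    {
        int value = 42;
        object first = value;   // boxing: the int is copied into a new object
        object second = value;  // boxing again: a second, separate object
        return !ReferenceEquals(first, second);
    }

    // An exception is itself an object that has to be allocated before it is thrown.
    public static bool ExceptionIsObject()
    {
        object e = new InvalidOperationException("example");
        return e is Exception;
    }
}
```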
Overview
• Previous Work
  – Reliability Working Group
  – Improvements for Whidbey
• OOM behavior
  – Everett (CLR v1.1)
  – Whidbey (CLR v2.0)
  – WinFX
• Solutions
  – Transactions
  – Recovery
Reliability Working Group
• Discussion of CLR reliability issues
• Interaction with Yukon and Avalon teams
• FailFast Behavior
• Controversial Decisions
• Fault Injection
Improvements for Whidbey
• CLR hardened to Out of Memory (OOM)
• Constrained Execution Regions (CERs)
  – Eagerly Prepared (No JIT Compiling)
  – Blocks ThreadAbort
• Reliability Contracts
  – Describes reliability attributes of code
  – Allows for function calls within CER
• Unhandled Exception Policy
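A CER with a reliability contract might look like the sketch below. It assumes the API shape that shipped in .NET Framework 2.0 (`RuntimeHelpers.PrepareConstrainedRegions` and `ReliabilityContractAttribute`), which may differ from the pre-release build the slides describe. Later .NET versions mark these APIs obsolete. `CerSketch`, `Commit`, and `Run` are hypothetical names.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.ConstrainedExecution;

// Hypothetical example, not from the original slides.
static class CerSketch
{
    private static int committed;

    // The contract is a declaration by the author: this method will not
    // corrupt state and is expected to succeed when called inside a CER.
    [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
    private static void Commit(int value)
    {
        committed = value;
    }

    public static int Run(int value)
    {
        // Must immediately precede the try block; asks the runtime to
        // prepare the finally block ahead of time.
        RuntimeHelpers.PrepareConstrainedRegions();
        try
        {
        }
        finally
        {
            // Body of the constrained region.
            Commit(value);
        }
        return committed;
    }
}
```

The contract attribute is only a promise made by the code's author, so callers rely on it being accurate.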
My Approach
• Exhaust Memory (Not fault injection)
• Find failure points
• Consistently reproduce results
• Examine underlying causes
• Develop solutions
Everett OOM Behavior
• Different classes of failures
  – Catchable Out of Memory (OOM) Exception
  – Type Initialization Exception
  – Invalid Program exception from JIT compiler
  – Fatal OOM Error
  – Fatal Execution Engine error
Supporting Data
void ManagedFunction()
{
    Regex* myReg = new Regex("*");
}

Available Memory    Observed Behavior
0-5860K             Fatal Error
5892-5912K          InvalidProgram
5924-5960K          TypeInit
5890-Above          Success
Fault Injection Example
static void Main(string[] args)
{
    try
    {
        // operations in here
    }
    catch ( OutOfMemoryException )
    {
        Console.WriteLine("Nothing should get past me.");
    }
}
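The slides do not show how faults were injected, so the following is only a sketch under assumptions. `FaultInjectionSketch`, `Allocate`, and `FailOnAllocation` are hypothetical names for a wrapper that throws `OutOfMemoryException` on a chosen allocation. A real fault-injection tool would hook the allocator rather than wrap calls by hand.

```csharp
using System;

// Hypothetical example, not from the original slides.
static class FaultInjectionSketch
{
    private static int allocationCount;

    // Hypothetical knob: which allocation should fail (-1 disables injection).
    public static int FailOnAllocation = -1;

    // Stand-in for an allocation site with an injected failure point.
    public static byte[] Allocate(int size)
    {
        allocationCount++;
        if (allocationCount == FailOnAllocation)
            throw new OutOfMemoryException("injected fault");
        return new byte[size];
    }

    // Performs three allocations and reports whether the handler ran.
    public static string Run(int failOn)
    {
        allocationCount = 0;
        FailOnAllocation = failOn;
        try
        {
            for (int i = 0; i < 3; i++)
                Allocate(16);
            return "completed";
        }
        catch (OutOfMemoryException)
        {
            return "caught";
        }
    }
}
```

An injected fault of this kind always surfaces as a catchable exception. Real exhaustion on Everett also produced the fatal error classes listed earlier, which is one reason the approach in these slides exhausts memory instead.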
Whidbey OOM Behavior
• See OOM Exception instead of
  – TypeInit
  – InvalidProgram
• Exception to Native host is COMPlusException
  – Not very helpful
• Fatal OOM only during initialization
  – Initialization can be large though (e.g. 10MB)
• CERs provide defense, but dangerous
  – CER { for (;;) } cannot be stopped
• Reliability Contracts = Honor System
WinFX Case Studies
• Swallows exceptions
• Shell
  – Crashes and restarts
• WinFS
  – Silent Process Failure
• Indigo
  – False Completion
[Figure: stack diagram with layers labeled Base OS, Whidbey, and WinFX]
Shell Failure
• Exhaust System Memory
• CLR throws OOM Exception
• Shell doesn’t catch
• Escalates to unhandled Win32 exception
• Shell crashes and restarts
  – Major disruption to user
WinFS Test
• Simple Contact Store Functions
  – AddContact
  – RenameContact
  – RemoveContact
  – ListContacts
  – ReachMemory
WinFS Test Normal Execution
• ListContacts(): “No Contacts Found”
• AddContact(“Shane”): Shane is added
• ListContacts(): “Shane”
• RenameContact(“Shane”, “Bob”): Shane is now Bob
• ListContacts(): “Bob”
• RemoveContact(“Bob”): Bob is now deleted
• ListContacts(): “No Contacts Found”
WinFS Test Stressed Execution
• ListContacts() : “No Contacts Found”
• ReachMemory(8MB): 8MB Available
• AddContact(“Shane”) : Shane should be added
• ListContacts(): “No Contacts Found”
• Process Exits
Indigo Test Specifications
• Client::SendMessage():
  – Sends message to server and prints confirmation of sending.
• Client::ReceiveMessage():
  – Prints received message.
• Server::SendMessage():
  – Sends message to client and prints confirmation of sending.
• Server::ReceiveMessage():
  – Prints message and responds with SendMessage()
Indigo Test Behavior
• Normal Execution
  – Client::SendMessage()
  – Server::ReceiveMessage()
  – Server::SendMessage()
  – Client::ReceiveMessage()
• Execution with Memory Pressure
  – Client::SendMessage()
  – Server::ReceiveMessage()
  – Server::ExhaustMemory()
  – Server::SendMessage()
  – Client never receives message
Solutions
• Transactions
  – In Memory
  – Durable (backed by disk)
• Recovery
  – Creates Recovery Log
  – Allows state restore
Transaction Participant
public TransactionParticipant(String _originalValue)
{
    originalValue = _originalValue;
    result = originalValue;
}

public void Prepare(IPreparingEnlistment pe)
{
    // do work for transaction
    result = "New Value";
    // all is well, vote prepared
    pe.Prepared();
}
Transaction Participant Continued
public void Commit(IEnlistment e)
{
    // no work to do, vote done
    e.EnlistmentDone();
}

public void Rollback(IEnlistment e)
{
    // restore originalValue
    result = originalValue;
    if ( null != e ) e.EnlistmentDone();
}
Simple Transaction Example
TransactionParticipant tp = new TransactionParticipant(txtInput.Text);
try
{
    using (TransactionScope s = new TransactionScope())
    {
        Transaction.Current.VolatileEnlist(tp, false);
        s.Consistent = true;
    }
}
catch (TransactionAbortedException) {}
txtInput.Text = tp.Result;
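The slide code uses pre-release names. For comparison, here is a sketch of the same participant written against the `System.Transactions` API as it shipped in .NET Framework 2.0, using `IEnlistmentNotification`, `EnlistVolatile`, and `TransactionScope.Complete`. `Participant`, `TxSketch`, and the `complete` parameter are hypothetical names added for illustration.

```csharp
using System;
using System.Transactions;

// Hypothetical participant mirroring the slides, using the shipped interface.
sealed class Participant : IEnlistmentNotification
{
    private readonly string originalValue;
    public string Result { get; private set; }

    public Participant(string originalValue)
    {
        this.originalValue = originalValue;
        Result = originalValue;
    }

    public void Prepare(PreparingEnlistment pe)
    {
        Result = "New Value";   // do work for the transaction
        pe.Prepared();          // vote to commit
    }

    public void Commit(Enlistment e)
    {
        e.Done();               // no further work to do
    }

    public void Rollback(Enlistment e)
    {
        Result = originalValue; // restore the original value
        e.Done();
    }

    public void InDoubt(Enlistment e)
    {
        e.Done();
    }
}

static class TxSketch
{
    // Runs one transaction; 'complete' chooses between commit and rollback.
    public static string Run(string input, bool complete)
    {
        Participant tp = new Participant(input);
        try
        {
            using (TransactionScope scope = new TransactionScope())
            {
                Transaction.Current.EnlistVolatile(tp, EnlistmentOptions.None);
                if (complete)
                    scope.Complete();
            }
        }
        catch (TransactionAbortedException)
        {
        }
        return tp.Result;
    }
}
```

Leaving the scope without calling `Complete` rolls the transaction back, so the participant restores its original value.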
rNotepad Techniques
• Log user work
  – KeyPressed Records
  – Resize Records
• Write work to log file every second
• Write checkpoint every 30 seconds
• Upon startup, recover
  – Checkpoint speeds up recovery
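The logging and checkpoint scheme above might be sketched as follows. The record format (`K` for a key press, `C` for a checkpoint) and all names are assumptions made for illustration. The slides do not describe rNotepad's actual log layout, and resize records and timers are omitted.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical example, not rNotepad's actual implementation.
static class RecoveryLogSketch
{
    // Appends one key-press record, storing the character code.
    public static void LogKey(string path, char key)
    {
        File.AppendAllText(path, "K " + (int)key + "\n");
    }

    // Appends a checkpoint holding the full text (Base64 keeps it on one line).
    public static void Checkpoint(string path, string text)
    {
        string payload = Convert.ToBase64String(Encoding.UTF8.GetBytes(text));
        File.AppendAllText(path, "C " + payload + "\n");
    }

    // Rebuilds the text: load the last checkpoint, then replay later key records.
    public static string Recover(string path)
    {
        if (!File.Exists(path))
            return "";
        string[] lines = File.ReadAllLines(path);
        int last = Array.FindLastIndex(
            lines, l => l.StartsWith("C ", StringComparison.Ordinal));
        StringBuilder text = new StringBuilder();
        int start = 0;
        if (last >= 0)
        {
            byte[] bytes = Convert.FromBase64String(lines[last].Substring(2));
            text.Append(Encoding.UTF8.GetString(bytes));
            start = last + 1;   // records before the checkpoint are skipped
        }
        for (int i = start; i < lines.Length; i++)
        {
            if (lines[i].StartsWith("K ", StringComparison.Ordinal))
                text.Append((char)int.Parse(lines[i].Substring(2)));
        }
        return text.ToString();
    }
}
```

Recovery replays only the records written after the most recent checkpoint, which is why a checkpoint speeds it up.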
Conclusion
• Testing is difficult but possible
• Temporary memory pressure shouldn’t cause failures
• Transactions and Recovery can provide resilient and recoverable solutions
Questions?
• More info at http://windows/sites/reliavuls/CLR/default.aspx