What's the best way to detect an application crash on Windows XP (it produces the same pair of 'error' windows each time, each with the same window title) and then restart the application?
I'm especially interested to hear of solutions that use minimal system resources as the system in question is quite old.
I had thought of using a scripting language like AutoIt (http://www.autoitscript.com/autoit3/), and perhaps triggering a 'detector' script every few minutes?
Would this be better done in Python, Perl, PowerShell or something else entirely?
Any ideas, tips, or thoughts much appreciated.
EDIT: It doesn't actually crash (i.e. exit/terminate - thanks @tialaramex). It displays a dialog waiting for user input, followed by another dialog waiting for further user input, then it actually exits. It's these dialogs that I'd like to detect and deal with.
How about creating a wrapper application that launches the faulty app as a child and waits for it? If the exit code of the child indicates an error, then restart it, else exit.
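A minimal sketch of such a wrapper in Python, assuming that a non-zero exit code is what "indicates an error" for the faulty app. The function name `run_until_clean_exit` and the `max_restarts` / `delay_seconds` parameters are made up for illustration:

```python
import subprocess
import sys
import time


def run_until_clean_exit(args, max_restarts=5, delay_seconds=1.0):
    """Run `args` as a child process and restart it while it exits with an error.

    Returns the list of exit codes observed, oldest first.
    Assumption: exit code 0 means success, anything else means failure.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        # subprocess.call blocks until the child exits, so the wrapper
        # sleeps inside the OS wait and does not poll.
        code = subprocess.call(args)
        exit_codes.append(code)
        if code == 0:
            break
        if attempt < max_restarts:
            time.sleep(delay_seconds)  # brief pause before restarting
    return exit_codes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python wrapper.py faulty_app.exe [arguments...]
    print(run_until_clean_exit(sys.argv[1:]))
```

Capping the number of restarts keeps a permanently broken app from being relaunched forever.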
Best way is to use a named mutex.
- Start your application.
- Create a new named mutex and take ownership of it.
- Start a new process (a process, not a thread) or a new application, whichever you prefer.
- From that process / application, try to acquire the mutex. The process will block.
- When the application finishes, release the mutex (signal it).
- The "control" process will only acquire the mutex if the application either finishes or crashes.
- Test the resulting state after acquiring the mutex. If the application crashed, it will be WAIT_ABANDONED.
Explanation: When a thread finishes without releasing the mutex, any other process waiting for it can acquire it, but it will obtain WAIT_ABANDONED as the return value, meaning the mutex was abandoned and therefore the state of the section it was protecting may be unsafe.
This way your second app won't consume any CPU cycles, as it will keep waiting for the mutex (and that's entirely handled by the operating system).
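The "control" process side of these steps could look roughly like this in Python with ctypes. It is only a sketch under assumptions: the mutex name `Local\MyAppAliveMutex` is invented, and it presumes the monitored application can be changed to create and own a mutex with that name at startup. The Win32 calls and WAIT_* values are the standard ones; the helper names are hypothetical.

```python
import sys

# Standard Win32 values.
SYNCHRONIZE = 0x00100000
MUTEX_MODIFY_STATE = 0x0001
INFINITE = 0xFFFFFFFF
WAIT_OBJECT_0 = 0x00000000
WAIT_ABANDONED = 0x00000080
WAIT_TIMEOUT = 0x00000102

MUTEX_NAME = u"Local\\MyAppAliveMutex"  # hypothetical name, agree on it with the app


def classify_wait_result(result):
    """Translate a WaitForSingleObject return value into a short label."""
    result &= 0xFFFFFFFF
    if result == WAIT_OBJECT_0:
        return "released"    # the application released the mutex normally
    if result == WAIT_ABANDONED:
        return "abandoned"   # the owner died without releasing: treat as a crash
    if result == WAIT_TIMEOUT:
        return "timeout"
    return "failed"


def wait_for_application(mutex_name=MUTEX_NAME):
    """Block until the application's mutex becomes available; return the label."""
    if sys.platform != "win32":
        raise OSError("named mutexes as used here are Windows-only")
    from ctypes import windll, c_int, c_uint32, c_void_p, c_wchar_p
    kernel32 = windll.kernel32
    kernel32.OpenMutexW.restype = c_void_p
    kernel32.OpenMutexW.argtypes = [c_uint32, c_int, c_wchar_p]
    kernel32.WaitForSingleObject.restype = c_uint32
    kernel32.WaitForSingleObject.argtypes = [c_void_p, c_uint32]
    kernel32.ReleaseMutex.argtypes = [c_void_p]
    kernel32.CloseHandle.argtypes = [c_void_p]

    handle = kernel32.OpenMutexW(SYNCHRONIZE | MUTEX_MODIFY_STATE, False, mutex_name)
    if not handle:
        raise OSError("could not open the mutex; is the application running?")
    try:
        # The OS parks this thread until the mutex is signalled or abandoned.
        outcome = classify_wait_result(kernel32.WaitForSingleObject(handle, INFINITE))
        if outcome in ("released", "abandoned"):
            kernel32.ReleaseMutex(handle)  # we own it now, so hand it back
        return outcome
    finally:
        kernel32.CloseHandle(handle)
```

A supervisor would call `wait_for_application()` and restart the application whenever the result is `"abandoned"`.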
I think the main problem is that Dr. Watson displays a dialog
and keeps your process alive.
You can write your own debugger using the Windows API and
run the crashing application from there.
This will prevent other debuggers from catching the crash of
your application and you could also catch the Exception event.
Since I have not found any sample code, I have written this
quick-and-dirty Python sample. I am not sure how robust it is;
in particular, the declaration of DEBUG_EVENT could be improved.
from ctypes import windll, c_int, Structure, pointer
import subprocess

WaitForDebugEvent = windll.kernel32.WaitForDebugEvent
ContinueDebugEvent = windll.kernel32.ContinueDebugEvent

DBG_CONTINUE = 0x00010002
DBG_EXCEPTION_NOT_HANDLED = 0x80010001

event_names = {
    3: 'CREATE_PROCESS_DEBUG_EVENT',
    2: 'CREATE_THREAD_DEBUG_EVENT',
    1: 'EXCEPTION_DEBUG_EVENT',
    5: 'EXIT_PROCESS_DEBUG_EVENT',
    4: 'EXIT_THREAD_DEBUG_EVENT',
    6: 'LOAD_DLL_DEBUG_EVENT',
    8: 'OUTPUT_DEBUG_STRING_EVENT',
    9: 'RIP_EVENT',
    7: 'UNLOAD_DLL_DEBUG_EVENT',
}

class DEBUG_EVENT(Structure):
    _fields_ = [
        ('dwDebugEventCode', c_int),
        ('dwProcessId', c_int),
        ('dwThreadId', c_int),
        # Opaque stand-in for the event-specific union. It is oversized on
        # purpose (the real union is larger than 80 bytes) so that
        # WaitForDebugEvent never writes past the end of the structure.
        ('u', c_int * 41)]

def run_with_debugger(args):
    # creationflags=1 is DEBUG_PROCESS: this process becomes the child's debugger.
    proc = subprocess.Popen(args, creationflags=1)
    event = DEBUG_EVENT()

    while True:
        # Wait up to 10 ms for the next debug event from the child.
        if WaitForDebugEvent(pointer(event), 10):
            print(event_names.get(event.dwDebugEventCode,
                                  'Unknown Event %s' % event.dwDebugEventCode))
            ContinueDebugEvent(event.dwProcessId, event.dwThreadId, DBG_CONTINUE)
        # Stop as soon as the child has exited.
        retcode = proc.poll()
        if retcode is not None:
            return retcode

run_with_debugger(['python', 'crash.py'])
I realize that you're dealing with Windows XP, but for people in a similar situation under Vista, there are new crash recovery APIs available. Here's a good introduction to what they can do.
Here is a slightly improved version.
In my test the previous code ran in an infinite loop when the faulty exe generated an "access violation".
I'm not totally satisfied with my solution because I have no clear criterion for deciding which exceptions should be continued and which should not (ExceptionFlags is of no help).
But it works on the example I ran.
Hope it helps,
Vivian De Smedt
from ctypes import windll, c_uint, c_void_p, Structure, Union, pointer
import subprocess

WaitForDebugEvent = windll.kernel32.WaitForDebugEvent
ContinueDebugEvent = windll.kernel32.ContinueDebugEvent

DBG_CONTINUE = 0x00010002
DBG_EXCEPTION_NOT_HANDLED = 0x80010001

event_names = {
    1: 'EXCEPTION_DEBUG_EVENT',
    2: 'CREATE_THREAD_DEBUG_EVENT',
    3: 'CREATE_PROCESS_DEBUG_EVENT',
    4: 'EXIT_THREAD_DEBUG_EVENT',
    5: 'EXIT_PROCESS_DEBUG_EVENT',
    6: 'LOAD_DLL_DEBUG_EVENT',
    7: 'UNLOAD_DLL_DEBUG_EVENT',
    8: 'OUTPUT_DEBUG_STRING_EVENT',
    9: 'RIP_EVENT',
}

EXCEPTION_MAXIMUM_PARAMETERS = 15

EXCEPTION_BREAKPOINT = 0x80000003
EXCEPTION_DATATYPE_MISALIGNMENT = 0x80000002
EXCEPTION_ACCESS_VIOLATION = 0xC0000005
EXCEPTION_ILLEGAL_INSTRUCTION = 0xC000001D
EXCEPTION_ARRAY_BOUNDS_EXCEEDED = 0xC000008C
EXCEPTION_INT_DIVIDE_BY_ZERO = 0xC0000094
EXCEPTION_INT_OVERFLOW = 0xC0000095
EXCEPTION_STACK_OVERFLOW = 0xC00000FD

class EXCEPTION_RECORD(Structure):
    _fields_ = [
        ("ExceptionCode", c_uint),
        ("ExceptionFlags", c_uint),
        ("ExceptionRecord", c_void_p),
        ("ExceptionAddress", c_void_p),
        ("NumberParameters", c_uint),
        ("ExceptionInformation", c_void_p * EXCEPTION_MAXIMUM_PARAMETERS),
    ]

class EXCEPTION_DEBUG_INFO(Structure):
    _fields_ = [
        ('ExceptionRecord', EXCEPTION_RECORD),
        ('dwFirstChance', c_uint),
    ]

class DEBUG_EVENT_INFO(Union):
    # Only the member used below is declared.
    _fields_ = [
        ("Exception", EXCEPTION_DEBUG_INFO),
    ]

class DEBUG_EVENT(Structure):
    _fields_ = [
        ('dwDebugEventCode', c_uint),
        ('dwProcessId', c_uint),
        ('dwThreadId', c_uint),
        ('u', DEBUG_EVENT_INFO)
    ]

def run_with_debugger(args):
    # creationflags=1 is DEBUG_PROCESS: this process becomes the child's debugger.
    proc = subprocess.Popen(args, creationflags=1)
    event = DEBUG_EVENT()

    num_exception = 0

    while True:
        if WaitForDebugEvent(pointer(event), 10):
            print(event_names.get(event.dwDebugEventCode, 'Unknown Event %s' % event.dwDebugEventCode))
            if event.dwDebugEventCode == 1:  # EXCEPTION_DEBUG_EVENT
                num_exception += 1
                exception_code = event.u.Exception.ExceptionRecord.ExceptionCode
                if exception_code == EXCEPTION_BREAKPOINT:
                    # Reported when the process starts under a debugger; keep going.
                    print("Breakpoint exception: %s" % hex(exception_code))
                else:
                    if exception_code == EXCEPTION_ACCESS_VIOLATION:
                        print("EXCEPTION_ACCESS_VIOLATION")
                    elif exception_code == EXCEPTION_INT_DIVIDE_BY_ZERO:
                        print("EXCEPTION_INT_DIVIDE_BY_ZERO")
                    elif exception_code == EXCEPTION_STACK_OVERFLOW:
                        print("EXCEPTION_STACK_OVERFLOW")
                    else:
                        print("Other exception: %s" % hex(exception_code))
                    # Any other exception is treated as a crash: stop debugging
                    # (the function then returns None instead of an exit code).
                    break
            ContinueDebugEvent(event.dwProcessId, event.dwThreadId, DBG_CONTINUE)
        retcode = proc.poll()
        if retcode is not None:
            return retcode

run_with_debugger(['crash.exe'])
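On the open question of which exceptions to continue: one common debugger convention, which the code above does not use, is to look at dwFirstChance instead of ExceptionFlags. A first-chance exception is passed back with DBG_EXCEPTION_NOT_HANDLED so the application's own handlers get to run, and a second-chance notification (dwFirstChance == 0) means nothing handled it, i.e. a crash. A minimal sketch of that decision follows; the function name `decide_continue_status` is hypothetical and this has not been tested against the faulty application:

```python
DBG_CONTINUE = 0x00010002
DBG_EXCEPTION_NOT_HANDLED = 0x80010001
EXCEPTION_BREAKPOINT = 0x80000003


def decide_continue_status(exception_code, first_chance):
    """Return (continue_status, is_crash) for one EXCEPTION_DEBUG_EVENT.

    continue_status is the value to pass to ContinueDebugEvent.
    is_crash is True when the exception went unhandled by the application.
    """
    if exception_code == EXCEPTION_BREAKPOINT:
        # Breakpoint reported when the process starts under a debugger:
        # swallow it and carry on.
        return DBG_CONTINUE, False
    if first_chance:
        # Let the application's own exception handlers try first.
        return DBG_EXCEPTION_NOT_HANDLED, False
    # Second chance: no handler dealt with it, so this is a real crash.
    return DBG_EXCEPTION_NOT_HANDLED, True
```

Inside the loop it would be called with `event.u.Exception.ExceptionRecord.ExceptionCode` and `event.u.Exception.dwFirstChance`; the returned status goes to ContinueDebugEvent, and the loop stops (and the app can be restarted) once `is_crash` is true.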