Debugging a Windows Service and trying to see what

We currently have an automated system that runs as a service and processes satellite images. The service reads a configuration file that specifies which scripts (Python) to apply in order to convert the input satellite imagery into a more usable format. The scripts call the applications required for the conversion process. The scripts themselves are invoked by the service via system("command") (the service is written in C/C++). The service runs under the same account as the user.

We are currently trying to add support for another satellite imagery format. The converter is a commercial .exe from ERDAS Imagine (importavhrr); after it runs, the script performs several of our own steps to change the projection.

The script works fine, up until it hits this:

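import subprocess

# infn, tmpimg1 and gcpfn are set earlier in the script
argslist = ['importavhrr.exe', '-in', '%s' % infn, '-out', '%s' % tmpimg1, '-gui', 'FALSE', '-correct', '-flyingheight', '833', '-rect', 'gcp', gcpfn]
print(" ".join(argslist))  # log the command line, with spaces between the arguments
p = subprocess.Popen(argslist, shell=True, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
print(str(p.communicate()))  # blocks until importavhrr.exe exits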

What ends up happening is that importavhrr.exe just sits there and does nothing (according to Task Manager it uses 0% CPU and its memory usage never changes), as if it is waiting for some sort of user input. (os.system and os.spawnv both yield the same results.) I am guessing that some sort of GUI window is popping up. Killing the process from Task Manager returns control to Python.
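While the cause is being tracked down, one way to keep the calling script from blocking forever on a hung child is to run it with a timeout and kill it when the timeout expires. The following is only a sketch: it assumes Python 3.3 or later (the script in the question is Python 2, where communicate() has no timeout argument), and run_with_timeout is a hypothetical helper name. It passes the argument list without shell=True so that kill() targets the converter itself rather than an intermediate cmd.exe.

```python
import subprocess
import sys


def run_with_timeout(args, timeout_s):
    """Run args and return (returncode, stdout, stderr).

    returncode is None if the child did not finish within timeout_s
    seconds and had to be killed.
    """
    proc = subprocess.Popen(
        args,
        stdin=subprocess.DEVNULL,   # the child can never block on console input
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    try:
        out, err = proc.communicate(timeout=timeout_s)
        return proc.returncode, out, err
    except subprocess.TimeoutExpired:
        proc.kill()
        # Collect whatever the child wrote before it was killed.
        out, err = proc.communicate()
        return None, out, err


if __name__ == "__main__":
    # Stand-in child process; replace with the real argument list.
    rc, out, err = run_with_timeout([sys.executable, "-c", "print('done')"], 10)
    print(rc, out, err)
```

For example, run_with_timeout(argslist, 600) would give the converter ten minutes; the 600 is an arbitrary placeholder. This does not reveal why the process hangs, it only bounds how long the service waits.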

Note: The -gui FALSE/false/0 argument is supposed to prevent a GUI from popping up. However, if the data is bad (I tested this manually by corrupting the data and invoking the converter via a script), an error window will pop up showing the results.

When I run the script manually (same file, same working directory), it works fine. The script even works when I invoke it manually using the same system function (it is part of an in-house library) as the service.

Making the service invoke a batch file containing just the importavhrr.exe call and the environment variables also results in importavhrr.exe hanging.

On the service side:
- It uses the same user account as the one I logged in with.
- The Python script sets around 30-40 environment variables for ERDAS.
- All the environment variables are properly set (I dumped the environment variables when the script is first run and compared them to what I get when I print the messages).
- Passing the environment variables into subprocess.Popen() yields the same results.
- The company refuses to help us because they don't support running programs from the command line (however, it works fine when a user does it, just not a service).
- Running the service in debug mode works fine.
- I HAVE rebooted the machine.
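The environment comparison mentioned in the list above can be done mechanically rather than by eye. A minimal sketch, assuming the script can write a file from both contexts; dump_environment, load_environment and diff_environments are hypothetical helper names, and the file paths are placeholders:

```python
import json
import os


def dump_environment(path):
    """Write the current process environment to path as sorted JSON."""
    with open(path, "w") as fh:
        json.dump(dict(os.environ), fh, indent=2, sort_keys=True)


def load_environment(path):
    """Read back an environment previously written by dump_environment."""
    with open(path) as fh:
        return json.load(fh)


def diff_environments(interactive, service):
    """Return {name: (interactive_value, service_value)} for every
    variable whose value differs; a missing variable shows up as None."""
    names = set(interactive) | set(service)
    return {
        name: (interactive.get(name), service.get(name))
        for name in sorted(names)
        if interactive.get(name) != service.get(name)
    }
```

Calling dump_environment("env_interactive.json") from a manual run and dump_environment("env_service.json") from the service run, then passing the two loaded dictionaries to diff_environments, lists only the variables that differ between the two contexts.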

I am at a loss here. I think (and fear) that the ERDAS executable is making some sort of error message window pop up, but I have looked and looked and can't find any way to see what the service sees. I have been trying to figure this out for almost a week now.

EDIT

I grabbed the recommended Process Explorer, and looking at the thread's stack I have this:

<snip ntoskrnl calls>
ntdll.dll!KiFastSystemCallRet
ntdll.dll!RtlSetLastWin32ErrorAndNtStatusFromNtStatus+0x301
kernel32.dll!GetModuleHandleA+0xdf

After waiting a few minutes, it changes to this:

<snip ntoskrnl calls>
ntdll.dll!KiFastSystemCallRet
USER32.dll!ScrollWindowEx+0x121d
USER32.dll!SoftModalMessageBox+0x6f8
USER32.dll!MessageBoxTimeoutW+0x1d9
USER32.dll!MessageBoxTimeoutW+0x5b
USER32.dll!MessageBoxTimeoutA+0x9c
USER32.dll!MessageBoxExA+0x1b
USER32.dll!MessageBoxA+0x45
elib.dll!esmg_GetLocalTapesDB+0x23b
elib.dll!esmg_LogMessageFunc+0x13a

Well, it is trying to show a window, I presume. I don't know enough about its behaviour to see what could be causing esmg_LogMessageFunc to crash. That function is part of their dev tools, to which I have no access. Furthermore, I have never actually seen ERDAS log anything.

Answer 1:

Trying to use any Windows API calls that assume access to the interactive window station will cause problems in the security context of a service.
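To confirm which window station a process actually runs in, the script can ask Windows directly. The sketch below uses ctypes to call GetProcessWindowStation and GetUserObjectInformationW from user32; WinSta0 is the interactive window station, while a service normally gets a non-interactive one with a name of the form Service-0x...$. The helper names window_station_name and is_interactive are hypothetical, and the code assumes Python 3; off Windows it simply returns None.

```python
import sys


def window_station_name():
    """Return the name of this process's window station, or None off Windows."""
    if sys.platform != "win32":
        return None
    import ctypes
    from ctypes import wintypes

    user32 = ctypes.WinDLL("user32", use_last_error=True)
    user32.GetProcessWindowStation.restype = wintypes.HANDLE
    user32.GetProcessWindowStation.argtypes = []
    user32.GetUserObjectInformationW.restype = wintypes.BOOL
    user32.GetUserObjectInformationW.argtypes = [
        wintypes.HANDLE, ctypes.c_int, wintypes.LPVOID,
        wintypes.DWORD, ctypes.POINTER(wintypes.DWORD),
    ]

    UOI_NAME = 2  # ask for the object's name
    station = user32.GetProcessWindowStation()
    if not station:
        raise ctypes.WinError(ctypes.get_last_error())

    buf = ctypes.create_unicode_buffer(256)
    needed = wintypes.DWORD()
    ok = user32.GetUserObjectInformationW(
        station, UOI_NAME, buf, ctypes.sizeof(buf), ctypes.byref(needed))
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    return buf.value


def is_interactive(station_name):
    """True only for WinSta0, the interactive window station."""
    return station_name is not None and station_name.lower() == "winsta0"


if __name__ == "__main__":
    name = window_station_name()
    print(name, is_interactive(name))
```

Logging this value from the script in both the manual run and the service run shows whether the converter is being started somewhere a message box could never be seen or dismissed.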

You can use several of the tools from Sysinternals to diagnose this kind of thing. Specifically, consider using Process Explorer in place of Task Manager, and Process Monitor for tracing the activity of a specific process.

Edit: Their new ProcDump tool can be used to capture a memory dump of any process, with really powerful trigger conditions. Several of the latest war stories on Mark Russinovich's blog take advantage of ProcDump to discover what really happened.

For completeness, a good overview of "official" tools for debugging a service is in this KB article.

Answer 2:

Did you try allowing the service to interact with the desktop, then logging into the machine and checking whether an error box is actually popping up?

Answer 3:

You might want to start with Process Explorer. You can see the threads and the stacks for those threads. If you really think there's an open window, you'll likely see this at the bottom of the stack:

kernel32.dll!RegisterWaitForInputIdle+0x49

If that doesn't work out, I'd then get a full memory dump of the process and use WinDbg to see what it's doing.