I'm having difficulty trying to determine the cause of a timeout in a Windows Service I've created with C#. I've spent a considerable amount of time looking at several posts and topics on the issue but I am unsure what else to try.
What's the problem?
Sometimes on certain machines which run my windows service, it does not start successfully after the machine has been rebooted. I receive the common EventLog messages about the Service failing to start in a timely fashion and that it timed out after 30000 milliseconds. Windows Server 2003 machines seem to be the most common pattern but is not always isolated to just this OS. For example it works perfectly fine on other W2K3 machines.
The startup failure can be quite random in that sometimes it will start, and other times it will fail so it is very difficult to reproduce the issue on demand. I am also using Log4Net to catch and errors and log them to a RollingFileAppender. However, when the service fails to start, no log file is ever created and no log information saved. It is as if my Service entry thread is blocking and not getting called.
Other Details:
- The Windows service is written in C# and uses .Net 2.0
- There are no other service dependencies for my service when installed.
- The service exe is a Release build with no signing or authenticode signing.
- The OnStart method executes as quickly as possible by creating a Thread and starting that Thread. No other initialization takes place within OnStart.
- When the service does actually fail to start, opening the services list and starting it manually works every time and the service starts in probably less than a second.
I have the following code added to my Program.cs which include the main entry point for the service. I hook in to an UnhandledException event on the CurrentDomain and am using log4net to log any unhandled errors There's also a try/catch around the ServiceBase.Run in the event it somehow bombs out so that I can log that error.
static void Main()
{
ServiceBase[] ServicesToRun;
ServicesToRun = new ServiceBase[]
{
new SchedulerService()
};
AppDomain.CurrentDomain.UnhandledException += new UnhandledExceptionEventHandler(CurrentDomain_UnhandledException);
try
{
ServiceBase.Run(ServicesToRun);
}
catch (Exception ex)
{
Log.Fatal("Unhandled Service Exception", ex);
}
}
private static log4net.ILog _log = null;
static log4net.ILog Log
{
get
{
if (_log == null)
{
if (!log4net.LogManager.GetRepository().Configured)
{
log4net.Config.XmlConfigurator.Configure();
}
_log = log4net.LogManager.GetLogger(typeof(Program));
}
return _log;
}
}
static void CurrentDomain_UnhandledException(object sender, UnhandledExceptionEventArgs e)
{
Exception ex = e.ExceptionObject as Exception;
if (ex == null) ex = new Exception(e.ExceptionObject.ToString());
Log.Fatal("Unhandled Service Exception", ex);
}
The code in my inherited ServiceBase implementation is as follows:
protected override void OnStart(string[] args)
{
Thread serviceThread = new Thread(new ThreadStart(BackgroundStart));
serviceThread.IsBackground = true;
serviceThread.Start();
}
private void BackgroundStart()
{
//Initialize and start worker objects to perform monitoring...
//<Snip>
}
My log4net implementation is using a ConsoleAppender and a RollingFileAppender where its configuration details are stored in the App.config.
At this stage I am not sure what else to try. If any more details are needed let me know.
Thanks.
Update: Just to update everyone, I'm going to try some of the suggestions such as logging to the EventLog directly or a file instead of Log4Net to see if that's the cause. I Will also try setting the generatePublisherEvidence in the app.config to false. I'm just waiting for an appropriate downtime to access the client's server to test these things out.
A few things to try:
Add log messages to the top of
Main()
, beforeServiceBase.Run()
, etc. Assuming you get a log file, how do those timestamps compare to the Windows Event Log?Create a brand new Service with the New Project Wizard and deploy it as-is. On the problem machines, does it start reliably?
Get process monitor and watch a normal startup. Look for any unexpected network or file I/O.
Make sure your
SchedulerService
does not do any work in the constructor, and does not have any statically-initialized dependencies.Set the Recovery options to restart on first failure. Does that work reliably?
I'd also suspect that log4net is somehow hanging. Maybe the drive where the log is to be created is not ready yet when the machine is booting. Have you tried starting your service delayed?
In general, spawning a background thread from
OnStart
is the right thing to do.For troubleshooting purposes, you could try to give your service more startup time by calling RequestAdditionalTime method from
OnStart
. Also, you might want to check if any messages have been written to the Windows EventLog (log "Application", the source should be your service name).I fixed similar issue by turning off publisher evidence generation in config file. The service also did not have authenticode signing but adding following line immediately fixed the issue on the machine where it has been reproduced consistently.
Also recommended in this MSDN source:
"We recommend that services use the element to improve startup performance. Using this element can also help avoid delays that can cause a time-out and the cancellation of the service startup. "
Since log4net is not designed to be (in their words) a reliable logging system, I thought it was good practice to write unhandled exceptions to the eventlog (as well as to your log), especially with services.