Using the event log in your driver

I wrote previously that drivers should use the event log. This time I am going to cover some things to consider when logging events. The challenge is that many components use the event log poorly. The two common problems are superfluous messages and lazy definitions.


The event log is commonly configured as a circular log with a limited capacity. Thus, a flood of superfluous messages can push out the important events that lead up to a failure. If you want to log routine events such as the driver starting or stopping, provide a registry value or other control so they can be disabled.


The second problem, lazy definitions, happens because building the message catalog where the event strings are stored, and setting up the registry for it, requires additional steps. Developers looked around, found that a number of the common Microsoft error codes take a string for the log entry, and decided to use the Microsoft definitions instead of their own. This is a poor approach for two reasons. First, since all your errors are coded as the same event, tools cannot easily look for specific problems in the log. Second, the event log is designed for internationalization, but the strings you dump from your driver will all be in one language.


For internationalization, consider making the message catalog where the text of the messages resides a separate file, rather than including it in the device driver.  The advantage of this is that you can provide the components needed for a support organization to add a new language without having to sign the driver again. 


So what should go in the event log?  Some obvious things are:


·         Failures in DriverEntry, AddDevice and Unload – In all these cases, there is no user request to which to report the problem.


·         Resource failures – These include a malfunction in the hardware or supporting software (for instance, a service that supports the driver) that impacts many requests.


·         Anomalous behavior – This is anything that is unexpected, whether it fails a request or not.  If something you really didn’t expect occurs, even if the driver handles it, log it.


My overall message is that you should add the event log to the diagnostic capabilities you provide your support people and your customers. If you already do this, great!  And if you already have working guidelines for event log use, please share them with a comment to this blog.

Why your driver should use the event log

Do you use the event log in your driver?  Event logging should be standard in almost every driver, yet few drivers support logging.  Event logging is the place to record anomalous conditions and events that are detected by your code. Specifically, it is the recognized way to report errors that are not related to a particular request to the device. 


The event log consists of small records about events of interest.  The record is based on an NTSTATUS code, whether it is a standard code or a custom status code for your software.  Think of the event log as a series of alerts to inform you of what is happening on the system.  If you haven’t looked at it lately, open the event viewer from Administrative Tools, and look at the entries since the last boot of your machine.


There are articles for developers that contend that no one reads the event log.  Yes, the normal user does not look at it, but system administrators certainly do.   When there is a problem with a system, the event log is the first place admins will look to establish a chronology of what happened and possibly see what failed.  The event log is also integrated into many network management tools that administrators use to monitor system health.


So why don’t more drivers use the event log?  Part of the reason for this is Microsoft.   The DDK used to provide a specific sample to illustrate logging, but this was removed years ago.  Worse, some Microsoft developers do not understand the use of the event log.  A few years ago a Microsoft talk confused the purpose of Event Logging with the more recent Event Tracing for Windows (ETW).  ETW is a great capability, but it is designed to provide detailed diagnostics for the developer, not simple alerts for the administrator.


So if you are not using the event log in your drivers, ask yourself or your developers, why aren’t you?  If you are using the event log, there are a number of things to consider, but that needs to wait for another post.

Welcome

What?  Another blog on Windows Device Driver Development?  While there are a number of good blogs out there on the subject, I think mine will be a little different.  This blog will look at the process of device driver development.  A lot of the emphasis will be upon the design and development practices for creating a high quality Windows device driver.


While there will inevitably be some nitty-gritty technical stuff, most of my discussion will be targeted at a level that managers can follow. In fact, I hope that you will point your management to the blog. Many of the problems in driver development are caused by the failure of management and marketing to appreciate the challenges and constraints of working in the kernel. I should warn you, though, that many of the problems are also caused by developers who do not follow well-known good practices. This blog will discuss ways that managers, developers and Microsoft impact the quality of Windows drivers.


My background includes thirty-six years of developing device drivers. I have worked on a number of operating system teams, as well as on compilers and other system software. I am fanatical about driver quality, having been the software architect for a fault-tolerant computer company. Before you think “my driver won’t need that”, consider that the first driver I was paid to develop, back in college for use by a graphics design class, ended up being used to display images for surgeons during heart operations! So never assume your driver will not be critical code.


In any event, welcome and I hope you enjoy the posts.