How tuned is your time machine?

How tuned is your time machine? No, I am not talking about DeLoreans with flux capacitors, but about a tool that almost every software development group has and many use poorly: source control. A simple test of how well your source control system is working is to see how many times a day the average developer checks in code. The sad fact is that in most development shops I encounter, the time a file stays checked out is measured in weeks, as with a library book.

Driver development can benefit greatly from a source control system that encourages its use. It is amazing how many times I hear “It was working; I must have changed something to break it.” If you use source control all the time, it is easy to go back to prove things worked, find the version whose changes broke things, and quickly resolve the problem.

For a manager, frequent check-ins help with scheduling because they give you concrete data on the state of the work. Statements from developers such as “I’m 80% of the way done” mean nothing. Is it 80% coded, 80% tested, or something else? In contrast, “The current checked-in version supports these capabilities and passes these tests” gives you a solid basis for scheduling and for knowing a project’s status.

Source control gets out of tune for two major reasons: either the system is cumbersome to use, or developers ignore it. Many source control systems become cumbersome because release engineering owns them and requires additional input and checks, since every check-in is part of the release! Unfortunately, this approach is wrong, because developers need to be able to check in often, and that includes versions with diagnostics that will never go into production.

If you are a manager stuck with cumbersome source control, consider setting up your own with something like SourceSafe. Having two source control systems may mean that you as the manager need to periodically take a stable version out of the lightweight system and check it into the release system, but having the history of changes I am talking about is worth it.

There also needs to be an environment that encourages checking in often. The common developer model of “I’ve checked it out, and I’ll check it in when all the modifications to the module work” has to change. Encouraging developers to check in after every change can be a challenge. Showing the value of frequent revisions for debugging will help. A manager also needs to look at what goes into each revision and encourage incremental revisions. Finally, consider using a source control system where only one developer can have a file checked out at a time. This not only creates an internal push to finish the change and check it in, but also eliminates the error-prone merge steps in systems that allow multiple checkouts.

Encouraging incremental revisions of the code and using source control wisely are among the best ways to improve quality and find bugs in drivers. When you check your code in often, it is easy to step into your time machine and track down the bugs.
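Finding the version that broke things is essentially a binary search over your revision history, and small, frequent check-ins are what make each step of that search cheap and precise. Here is a minimal sketch in C, under two stated assumptions: `demo_revision_passes` is a hypothetical stand-in for “check out revision r, build it, and run the tests” (it simply pretends revision 37 is the first broken one), and the history is good up to some point and bad from then on. Some modern systems, such as Git’s `bisect` command, automate exactly this search.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for "check out revision r, build it, run the
   tests". This demo pretends revision 37 is the first broken check-in. */
static bool demo_revision_passes(int revision)
{
    return revision < 37;
}

/* Binary search for the first failing revision. Assumes good_rev is
   known to pass, bad_rev is known to fail, and every revision after
   the first failure also fails. */
static int first_bad_revision(int good_rev, int bad_rev,
                              bool (*revision_passes)(int))
{
    assert(good_rev < bad_rev);
    while (bad_rev - good_rev > 1) {
        int mid = good_rev + (bad_rev - good_rev) / 2;
        if (revision_passes(mid))
            good_rev = mid;   /* the culprit came after mid */
        else
            bad_rev = mid;    /* mid or something earlier broke it */
    }
    return bad_rev;           /* directly follows the last good revision */
}
```

Calling `first_bad_revision(1, 120, demo_revision_passes)` returns 37 after seven builds instead of up to 120. The payoff depends on the check-in granularity: if revision 37 holds a single small change, you have found the bug; if it holds three weeks of work, you have only narrowed the search.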

 

Where was Don?

I haven’t posted for quite a while because I hit a period of intense work for my customers, demands from activities outside of work, and a cold/flu that has persisted for over a month. I am back, and I expect to post at least weekly.

Tuning Channel 9

Last week I got an announcement that Rob Short is talking on Operating System Evolution (see http://channel9.msdn.com/Showpost.aspx?postid=264874). Normally I would be first in line for one of his talks, but on Channel 9?… Well, maybe some day I’ll watch it when I have time. For those of you who don’t know Channel 9 (http://channel9.msdn.com/), it is a site where Microsoft provides video interviews on technical subjects. I’m not enthusiastic about it because I believe Channel 9 has two serious problems:


1. There is no index


I answer a lot of questions for my customers and on Usenet, but much of the time I do not remember the exact answer, just where to find it. A typical Channel 9 presentation runs close to an hour, and with no index, if I think something might be in a presentation, I have to waste a lot of time looking for it.


I do use the WinHEC videos, but they have an index, namely the PowerPoint slides that go with the talk.  I have all the WinHEC slides from the last 10 years, and I still reference one of the old presentations every few months.


2. There is no way to take notes


Those PowerPoint slides from WinHEC have another great feature: they let me put notes in the speaker notes area. I take notes while I listen to the talk, and depending on the content, I annotate the PowerPoint file. I have also found this a nice way to keep notes about using a technique presented in a talk.


So, Redmond, how about fixing Channel 9? Right now I view Channel 9 as entertainment rather than as a useful reference tool. How about some PowerPoint slides, or even just some speaker notes, to go with each presentation? But even as entertainment, Hollywood does better. Even my Jurassic Park DVD has a chapter index, so why doesn’t Channel 9? You bill Channel 9 as an important information tool. There is no reason it has to be such a dinosaur!

Using the event log in your driver

I wrote previously that drivers should use the event log. This time I will give some things to consider when logging events. The challenge is that many components use the event log poorly. The two common problems are superfluous messages and lazy definitions.


The event log is commonly configured as a circular log with limited capacity. Thus, superfluous messages can push out the important events that lead up to a failure. If you want to log events such as the driver starting or stopping, provide a registry value or other control so they can be disabled.
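To see why noise matters in a circular log, here is a small user-mode model in C. It is not the Windows event log or its API (a kernel driver would write entries with IoAllocateErrorLogEntry and IoWriteErrorLogEntry); it just mimics the wrap-around behavior. The capacity of four records is deliberately tiny, and the `log_informational` flag stands in for a hypothetical registry value that turns start/stop style messages on or off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LOG_CAPACITY 4 /* deliberately tiny; real logs hold far more */

typedef enum { EV_INFO, EV_ERROR } event_severity;

typedef struct {
    event_severity severity;
    const char *text;
} log_record;

typedef struct {
    log_record records[LOG_CAPACITY];
    size_t next;            /* slot the next record goes into */
    size_t count;           /* valid records, never more than LOG_CAPACITY */
    bool log_informational; /* hypothetical registry-controlled switch */
} circular_log;

static void log_init(circular_log *elog, bool log_informational)
{
    memset(elog, 0, sizeof *elog);
    elog->log_informational = log_informational;
}

static void log_event(circular_log *elog, event_severity sev, const char *text)
{
    assert(text != NULL);
    if (sev == EV_INFO && !elog->log_informational)
        return; /* suppressed, so it cannot push out anything important */
    elog->records[elog->next].severity = sev;
    elog->records[elog->next].text = text;
    elog->next = (elog->next + 1) % LOG_CAPACITY; /* wrap: oldest is overwritten */
    if (elog->count < LOG_CAPACITY)
        elog->count++;
}

/* Returns true if a record with this text is still in the log. */
static bool log_contains(const circular_log *elog, const char *text)
{
    for (size_t i = 0; i < elog->count; i++)
        if (strcmp(elog->records[i].text, text) == 0)
            return true;
    return false;
}
```

With the flag on, logging one “DMA timeout” error followed by four “driver started” records leaves no trace of the error, which is exactly the record an administrator needed. With the flag off, the informational records are dropped and the error survives.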


The second problem, lazy definitions, happens because building the message catalog that stores the event strings, and setting up the registry to point to it, require additional steps. Developers noticed that several common Microsoft error codes take a string for the log entry, and decided to use the Microsoft definition instead of defining their own. This is a poor approach for two reasons. First, because all your errors are logged as the same event, tools have a hard time finding specific problems in the log. Second, the event log is designed for internationalization, but the strings you dump from your driver will all be in one language.
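The first point is easy to illustrate if you assume a tool that filters entries by event ID, as many monitoring tools can. In this sketch all IDs are hypothetical: `EVT_GENERIC_WITH_STRING` stands in for a borrowed Microsoft code that carries free-form text, and the `MYDRV_EVT_*` values stand in for codes defined in your own message catalog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IDs, for illustration only. */
#define EVT_GENERIC_WITH_STRING 0x0001u /* stand-in for a borrowed generic code */
#define MYDRV_EVT_DMA_TIMEOUT   0x0101u /* stand-ins for codes from your own */
#define MYDRV_EVT_LINK_LOST     0x0102u /* message catalog */

typedef struct {
    uint32_t event_id;
    const char *insertion; /* free-form text supplied by the driver */
} event_entry;

/* Counts entries with a given ID, the way a filtering tool would. */
static size_t count_by_id(const event_entry *entries, size_t n, uint32_t id)
{
    assert(entries != NULL || n == 0);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (entries[i].event_id == id)
            hits++;
    return hits;
}
```

With the borrowed code, a query for that ID returns every failure the driver ever logged, and picking out the DMA timeouts means parsing free-form (and single-language) strings. With distinct codes, the same query answers the question directly.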


For internationalization, consider putting the message catalog, which holds the text of the messages, in a separate file rather than in the device driver itself. That way you can give a support organization the components it needs to add a new language without having to sign the driver again.


So what should go in the event log?  Some obvious things are:


·         Failures in DriverEntry, AddDevice and Unload – In all these cases, there is no user request through which to report the problem.


·         Resource failures – These include a malfunction in the hardware or supporting software (for instance, a service that supports the driver) that impacts many requests.


·         Anomalous behavior – This is anything that is unexpected, whether it fails a request or not.  If something you really didn’t expect occurs, even if the driver handles it, log it.


My overall message is that you should add the event log to the diagnostic capabilities you provide to your support people and your customers. If you already do this, great! And if you already have working guidelines for event log use, please share them by leaving a comment on this blog.

Why your driver should use the event log

Do you use the event log in your driver?  Event logging should be standard in almost every driver, yet few drivers support logging.  Event logging is the place to record anomalous conditions and events that are detected by your code. Specifically, it is the recognized way to report errors that are not related to a particular request to the device. 


The event log consists of small records about events of interest.  The record is based on an NTSTATUS code, whether it is a standard code or a custom status code for your software.  Think of the event log as a series of alerts to inform you of what is happening on the system.  If you haven’t looked at it lately, open the event viewer from Administrative Tools, and look at the entries since the last boot of your machine.
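Because each record is keyed by an NTSTATUS value, it helps to know how one is laid out: bits 31-30 hold the severity, bit 29 is the customer flag that marks a non-Microsoft code, bit 28 is reserved and must be zero, bits 27-16 are the facility, and bits 15-0 are the code. In practice the message compiler (mc.exe) generates these constants from your message file, so you rarely build them by hand. The sketch below is plain C, not kernel code, and uses `uint32_t` for the bit manipulation (the kernel’s NTSTATUS is a signed 32-bit type); the facility 0x123 used with it is a hypothetical example.

```c
#include <assert.h>
#include <stdint.h>

/* Severity values for bits 31-30 of an NTSTATUS. */
#define SEV_SUCCESS       0u
#define SEV_INFORMATIONAL 1u
#define SEV_WARNING       2u
#define SEV_ERROR         3u

/* Builds a customer-defined status: Sev | C=1 | N=0 | Facility | Code. */
static uint32_t make_custom_status(uint32_t severity, uint32_t facility,
                                   uint32_t code)
{
    assert(severity <= 3u);
    assert(facility <= 0xFFFu); /* facility is 12 bits */
    assert(code <= 0xFFFFu);    /* code is 16 bits */
    return (severity << 30) | (1u << 29) | (facility << 16) | code;
}

/* Decoders for the individual fields. */
static uint32_t status_severity(uint32_t s)    { return s >> 30; }
static uint32_t status_is_customer(uint32_t s) { return (s >> 29) & 1u; }
static uint32_t status_facility(uint32_t s)    { return (s >> 16) & 0xFFFu; }
static uint32_t status_code(uint32_t s)        { return s & 0xFFFFu; }
```

For example, `make_custom_status(SEV_ERROR, 0x123, 1)` yields 0xE1230001, while the standard STATUS_INSUFFICIENT_RESOURCES (0xC000009A) decodes as error severity with the customer bit clear. That customer bit is what keeps your driver’s codes from colliding with Microsoft’s.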


Some articles for developers contend that no one reads the event log. Yes, the normal user does not look at it, but system administrators certainly do. When there is a problem with a system, the event log is the first place admins look to establish a chronology of what happened and possibly see what failed. The event log is also integrated into many network management tools that administrators use to monitor system health.


So why don’t more drivers use the event log?  Part of the reason for this is Microsoft.   The DDK used to provide a specific sample to illustrate logging, but this was removed years ago.  Worse, some Microsoft developers do not understand the use of the event log.  A few years ago a Microsoft talk confused the purpose of Event Logging with the more recent Event Tracing for Windows (ETW).  ETW is a great capability, but it is designed to provide detailed diagnostics for the developer, not simple alerts for the administrator.


So if you are not using the event log in your drivers, ask yourself or your developers, why aren’t you?  If you are using the event log, there are a number of things to consider, but that needs to wait for another post.

Welcome

What? Another blog on Windows Device Driver Development? While there are a number of good blogs out there on the subject, I think mine will be a little different. This blog will look at the process of device driver development. A lot of the emphasis will be on the design and development practices for creating a high-quality Windows device driver.


While there will inevitably be some nitty-gritty technical material, most of my discussion will be targeted at a level that managers can follow. In fact, I hope that you will point your management to the blog. Many of the problems in driver development are caused by the failure of management and marketing to appreciate the challenges and constraints of working in the kernel. I should warn you, though, that many of the problems are also caused by developers who do not follow well-known good practices. This blog will discuss ways that managers, developers and Microsoft affect the quality of Windows drivers.


My background includes thirty-six years of developing device drivers. I have worked on a number of operating system teams, as well as on compilers and other system software. I am fanatical about driver quality, having been the software architect for a fault-tolerant computer company. Before you think “my driver won’t need that,” consider that the first driver I was paid to develop, back in college for a graphics design class, ended up being used to display images for surgeons during heart operations! So never assume your driver will not be critical code.


In any event, welcome and I hope you enjoy the posts.