After analysing one of the dumps on a production box, I realized that the McAfee filter driver, which sits in kernel mode between the I/O manager and the file system, may cause a STOP error on Windows Server 2003 systems. In general, any anti-virus software that installs a file system filter driver can cause this.
The module for McAfee's file system filter driver is naiavf5x.sys, and the filter drivers are NaiFiltr and NaiFsRec. These drivers provide real-time protection for file systems (that is, files and folders). They run in kernel mode alongside the Windows NT Executive, below user-mode applications and above the file system. The main purpose of these drivers is to filter the I/O operations on all file systems (C:, D:, E:, etc.).
The reason for the unexpected shutdown is PAGE_FAULT_IN_NONPAGED_AREA (50). Generally, this STOP error occurs when there is a problem with RAM, a corrupted NTFS volume, or anti-virus software (filter drivers). There could be other causes of this STOP error, but once memory.dmp has pointed us to a root cause (naiavf5x.sys), we can rule out the other possibilities.
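When reading the dump, it can help to label the four parameters that accompany STOP 0x50. The following is only a minimal sketch: the function name and the sample values are hypothetical (they are not taken from the dump discussed here), and the parameter meanings are my reading of Microsoft's bug check 0x50 documentation, where parameter 1 is the referenced address, parameter 2 is the operation (0 = read, 1 = write on systems of this era), parameter 3 is the address of the instruction that referenced the memory (if known), and parameter 4 is reserved.

```python
def describe_stop_0x50(p1, p2, p3, p4):
    """Label the four parameters of PAGE_FAULT_IN_NONPAGED_AREA (0x50)."""
    # Assumption: only read (0) and write (1) are expected on Server 2003.
    operations = {0: "read", 1: "write"}
    return {
        "referenced_address": f"0x{p1:08x}",
        "operation": operations.get(p2, f"other ({p2})"),
        "faulting_instruction": f"0x{p3:08x}" if p3 else "unknown",
        "reserved": f"0x{p4:08x}",
    }


# Hypothetical parameter values, for illustration only.
info = describe_stop_0x50(0xE1A3F000, 0, 0xBAE31234, 0)
for label, value in info.items():
    print(f"{label}: {value}")
```

The referenced address and the faulting instruction are usually the two values worth comparing against the loaded module list in the debugger.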
Why this happens:
The anti-virus file system filter driver keeps data of its own in memory. Other kernel components (especially the kernel image, NTKRNLPA.exe) work with the same memory areas while I/O operations are in progress. For example, when an application or service tries to access a file on drive D:\, the anti-virus filter driver is invoked and checks the file (integrity, suspicious content, etc.) before it allows the application or service to access it. Since the filter driver stays resident for as long as the system is running, it has to keep its working data in RAM or in pagefile-backed memory, and it retrieves this data every time it handles an I/O operation. If the driver references data that is not there, for example because the address is invalid or the memory is no longer available to it, Windows throws a STOP error.
Okay, you may ask why Windows throws a STOP error when it could simply log an event in the System log instead of shutting the system down. It doesn't, because all kernel-mode code shares one address space: a fault in one driver cannot be isolated and may already have damaged other kernel data, so halting is the only safe response. A crashed kernel-mode component has to re-initialise itself to come back to life, and that is only possible when the whole system is rebooted (this follows from the Windows kernel architecture: kernel-mode components initialise first, then user-mode processes). Please note: on Unix/Linux, much third-party functionality runs as ordinary processes separated from the kernel, and those can be re-initialised with init or other commands without a reboot.
Kernel-mode drivers run with full privileges and are not protected from one another, which means they can conflict when the functions they execute collide.
McAfee filter drivers can be located at:
HKLM\SYSTEM\CurrentControlSet\Services: the McAfee filter drivers for naiavf5x.sys are listed under this key. You can also check this in Device Manager: click View > Show Hidden Devices, then expand Non-Plug and Play Drivers to find the anti-virus drivers.
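The same check can be scripted. The sketch below reads the Start value of each driver under the Services key and translates it into the start type name used by the service control manager. It assumes Python with the standard winreg module on the Windows box; the helper names describe_start and read_start are mine, and the service names are the ones mentioned earlier in this post.

```python
# Start values as defined for Windows services and drivers.
START_TYPES = {
    0: "SERVICE_BOOT_START",
    1: "SERVICE_SYSTEM_START",
    2: "SERVICE_AUTO_START",
    3: "SERVICE_DEMAND_START",
    4: "SERVICE_DISABLED",
}


def describe_start(value):
    """Translate a numeric Start value into its symbolic name."""
    return START_TYPES.get(value, f"UNKNOWN ({value})")


def read_start(service_name):
    """Read the Start value of a service from the registry (Windows only)."""
    import winreg  # imported here so the helpers above load on any OS

    path = "SYSTEM\\CurrentControlSet\\Services\\" + service_name
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        value, _value_type = winreg.QueryValueEx(key, "Start")
    return value


if __name__ == "__main__":
    for name in ("NaiFiltr", "NaiFsRec"):
        try:
            print(name, describe_start(read_start(name)))
        except (ImportError, OSError) as exc:
            # Not on Windows, or the key does not exist on this machine.
            print(name, "not readable:", exc)
```

This only reads the registry; it does not change anything, so it is safe to run on a production server.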
How to resolve this:
1. Update NTKRNLPA.exe: a patch is available from Microsoft. As per the article below, this is the recommended patch for the above-mentioned STOP error.
2. Update or reduce the functionality of the McAfee filter driver. Only one of the following two options can be used:
a. Upgrade the McAfee filter driver by installing the latest patch.
b. Fall back to the previous version of naiavf5x.sys.
3. Disable the McAfee filter driver temporarily by setting the Start value in the above-mentioned registry key to 4 (SERVICE_DISABLED). This solution is not recommended, because the driver is required for McAfee to provide real-time protection.
In fact, the system has already disabled the driver on the server:
Start 4 = SERVICE_DISABLED
FAULTING_MODULE: bae27000 Ntfs
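If you save the debugger's analysis output to a text file, a line like the one above can be pulled out automatically. This is a small sketch under assumptions: the function name is hypothetical, and the pattern assumes the line has the form "FAULTING_MODULE: <hex base address> <module name>", as in the line shown here.

```python
import re

# Matches e.g. "FAULTING_MODULE: bae27000 Ntfs". The backtick is tolerated
# because the debugger uses it as a separator in some 64-bit addresses.
FAULTING_MODULE_RE = re.compile(
    r"^FAULTING_MODULE:\s+([0-9a-fA-F`]+)\s+(\S+)", re.MULTILINE
)


def find_faulting_module(analysis_text):
    """Return (base_address, module_name), or None if the line is absent."""
    match = FAULTING_MODULE_RE.search(analysis_text)
    if match is None:
        return None
    base = int(match.group(1).replace("`", ""), 16)
    return base, match.group(2)


result = find_faulting_module("FAULTING_MODULE: bae27000 Ntfs")
if result is not None:
    base, module = result
    print(f"{module} loaded at 0x{base:08x}")
```

Running this over several dumps makes it easy to see whether the same module keeps turning up.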