McAfee File System Filter Driver May Cause a STOP Error on Windows Server 2003

After analysing one of the memory dumps on a production box, I realized that the McAfee filter driver, which sits between Kernel Mode and the file system, may cause a STOP error on Windows Server 2003 systems. Generally, any anti-virus software can cause this.


The File System Filter Driver module for McAfee is naiavf5x.sys, and the filter drivers are NaiFiltr and NaiFsRec. These drivers provide real-time protection for the file system (i.e. files and folders). They sit between Kernel Mode and User Mode and run with the Windows NT Executive. The main purpose of this driver is to filter I/O operations for all file system volumes (C:, D:, E:, etc.).


The reason for the unexpected shutdown is PAGE_FAULT_IN_NONPAGED_AREA (0x50). Generally, this STOP error occurs when there is an issue with RAM, a corrupted NTFS volume, or anti-virus software (filter drivers). There could be other causes of this STOP error, but once we have found the root cause (naiavf5x.sys) by checking memory.dmp, we can eliminate the other possibilities.


Why this happens:


The anti-virus file system filter driver keeps its own data in the pagefile area or in fixed (non-paged) memory. Other Windows components (especially NTKRNLPA.exe) keep locking this paged area when different I/O operations occur. For example, when an application or service tries to access a file on drive D:\, the anti-virus file system filter driver invokes itself and checks the file (integrity, suspicious content, etc.) before it allows the application/service to access it. Since the file system filter driver stays resident in memory, it has to keep its non-volatile data in RAM or in the pagefile area, and it retrieves this data while performing the I/O operation (that is, when an application/service tries to access the file). If this data is not found, is unavailable, or is locked by another process, Windows throws a STOP error.

You may ask why Windows throws a STOP error instead of simply logging an event in the System log: it is because a fault in a Kernel Mode process/service always results in a system crash. The crashed Kernel Mode process needs to re-initialise itself to come back alive in the system, and this is only possible (for Kernel Mode processes) when the whole system is rebooted. This is per the Windows kernel architecture: Kernel Mode processes initialise first, then User Mode processes. Please note that this is not the case in Unix/Linux, where the kernel is separated from processes and third-party services; you can always use INIT or other commands to re-initialise processes. Kernel Mode processes always run at real-time priority, which means they can fight with each other when a conflict arises between the functions they execute.


McAfee filter drivers can be located at:


HKLM\SYSTEM\CurrentControlSet\Services – you can find the McAfee filter drivers for naiavf5x.sys under this key. There is also another way to check this: in Device Manager, click View > Show Hidden Devices and then expand Non-Plug and Play Drivers to find the anti-virus drivers.


Solutions:


1. Update NTKRNLPA.exe – a patch is available from Microsoft. As per the article below, this is the recommended patch for the above-mentioned STOP error.


http://support.microsoft.com/kb/832336


2. Update or reduce the functionality of the McAfee filter driver – only one of the following two options can be used:


    a. Upgrade the McAfee filter driver by installing the latest patch.
    b. Fall back to the previous version of naiavf5x.sys.


OR


1. Disable the McAfee filter driver temporarily by changing the Start value under the above-mentioned registry key to 4 (SERVICE_DISABLED). This solution is not recommended, because the driver is required for McAfee to provide real-time protection.


In fact, the system may already have disabled the driver on the server, as the debugger output shows:


Start  4 = SERVICE_DISABLED 
IMAGE_NAME:  naiavf5x.sys
DEBUG_FLR_IMAGE_TIMESTAMP:  4187c4b7
FAULTING_MODULE: bae27000 Ntfs
DEFAULT_BUCKET_ID:  DRIVER_FAULT
BUGCHECK_STR:  0x50

NLB Notes

How to Test Load Balancing:
===========================
For example, you have four nodes in the cluster and want to check whether load balancing is working or not. Create four different shares, one on each node,
and try to access them from one machine. You should get a different share each time you browse using the UNC path.


0. You can tune the convergence parameters via the following registry values:


    AliveMsgPeriod
    AliveMsgTolerance


0. Configuring more than one VIP (Virtual IP) is available only in Windows 2003 editions and later.


0. There is a difference between the STOP and DRAINSTOP commands. STOP stops the NLB service on the host and all existing connections are lost,
whereas DRAINSTOP allows NLB to serve current connections while refusing new ones.


0. IGMP can be configured only when Cluster is configured to use Multicast support.


0. No network properties dialog should be open on a server while you are configuring it using NLB Manager.


0. All NLB servers should have the correct local time.


0. NLB doesn’t detect application failure. For example, a Web Server service may stop but NLB will still send TCP/IP requests to that server.


0. NLB is used for TCP/IP-based applications whose data changes rarely.


1. Do not bind any protocol other than TCP/IP to the cluster adapter.


2. NLB Cluster can operate either in Unicast or Multicast mode but not both.


3. Microsoft doesn’t support mixing Server Clustering and NLB on the same computers.


4. A mixed NLB cluster is allowed: Windows NT WLBS nodes can run in a Windows 2000 NLB cluster.


NLB doesn’t support Token Ring and ATM networks. It has only been tested on 10 Mbps and 100 Mbps Ethernet networks.


Single Network Card Limitations when running in Unicast Mode:


1.    Ordinary network communication between cluster hosts is not possible, and network traffic intended for any individual computer within the cluster generates additional networking overhead for all computers in the cluster.
2.    Furthermore, we cannot use Network Load Balancing Manager on this computer to configure and manage other NLB nodes.


•    Automatically detects and recovers from a failed or offline computer.
•    Automatically balances the network load when hosts are added or removed.
•    Recovers and redistributes the workload within 10 seconds.


5. The load is automatically redistributed to other nodes when a host goes offline. All active connections to that host are lost. If you are
intentionally taking a node offline, use the drainstop command to serve all active connections before you take the node offline.


6. You can have a mix of applications running in the NLB cluster. For example, you can run an IIS web server on all nodes and SQL Server on
one node only. This way you can direct database traffic to the SQL Server node only.


7. NLB and Server Clustering cannot both be active on the same computer, but you can form two clusters – for example, a four-node NLB cluster and a two-node server cluster.
Is it necessary to have a separate subnet for each technology?


8. NLB supports up to 32 computers in a single cluster, but you can use RRDNS to increase the number.


9. NLB can load balance multiple requests from the same client on the same node or different nodes. This is done randomly.


10. NLB automatically detects and removes a failed NLB node, but it can’t judge whether an application is running or has stopped working. This
must be checked manually, for example by running a script.


11. NLB automatically rebalances the load when hosts are added or removed, and this is done within 10 seconds.


12. Different Virtual Cluster IP can be created to load balance different applications.


13. Port rules must be same across the cluster but Port Rules can be different for multiple Virtual IP.


14. NLB doesn’t overlap the original computer name and IP address.


15. NLB can be enabled on multiple network adapters. This allows you to configure different NLB Cluster.


16. NLB can operate in two modes – Unicast or Multicast but both the modes can’t be enabled at the same time. Unicast is the default mode.


17. NLB enables each host to detect and receive incoming TCP/IP traffic. This traffic is received by all hosts in the cluster, and the NLB driver filters
the traffic per the defined Port Rules. NLB nodes do not communicate with each other about incoming client traffic, because NLB
is enabled on all the nodes. A statistical mapping rule is created on each host to distribute incoming traffic. This mapping remains the same
unless there is a change in the cluster (for example, a node is removed or added).
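A rough way to picture such a coordination-free mapping is each host applying the same deterministic hash to the incoming connection and handling the packet only when the result is its own bucket. This is an illustrative sketch, not NLB's actual hashing algorithm:

```python
import hashlib

def bucket(client_ip: str, client_port: int, n_hosts: int) -> int:
    # Every host computes the same deterministic hash over the connection
    # tuple, so they all agree on the owner without exchanging any messages.
    key = f"{client_ip}:{client_port}".encode()
    digest = int(hashlib.md5(key).hexdigest(), 16)
    return digest % n_hosts

def should_handle(my_index: int, client_ip: str, client_port: int, n_hosts: int) -> bool:
    # A host keeps the packet only when the bucket maps to itself;
    # all other hosts silently discard it.
    return bucket(client_ip, client_port, n_hosts) == my_index

# All four hosts agree on exactly one owner for a given connection:
owners = [should_handle(i, "192.168.1.10", 50000, 4) for i in range(4)]
print(owners.count(True))
```

Because every node computes the same function independently, the mapping only changes when the node count changes – which is exactly why a membership change triggers convergence.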


18. Convergence is the process of rebuilding the cluster state. It is invoked when there is a change in the cluster (for example, a node fails, leaves,
or re-joins the cluster). During this process the cluster takes the following actions:


    1. Rebuild the cluster state.
    2. Designate the host with the highest host priority as the Default Host.
    3. Repartition and redistribute load-balanced traffic among the remaining hosts.


During this process, the remaining hosts continue to handle incoming client traffic. If a host is added to the cluster, convergence allows it to receive its share of the load-balanced traffic. Expansion of the cluster does not affect ongoing cluster operations and is transparent to both Internet clients and server applications. However, it might affect client sessions that span multiple TCP connections when client affinity is selected, because clients might be remapped to different cluster hosts between connections.


19. All nodes in the cluster emit heartbeat messages to advertise their availability in the cluster. The default period for sending heartbeat
messages is 1 second, and 5 missed heartbeat messages from a host cause NLB to invoke the convergence process.
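The rule above (1-second heartbeat period, convergence after 5 missed messages) can be sketched as a simple check; the function name and shape are illustrative, not part of NLB itself:

```python
def needs_convergence(last_heartbeat: float, now: float,
                      period: float = 1.0, tolerance: int = 5) -> bool:
    # Count how many heartbeat intervals have elapsed without a message;
    # reaching the tolerance (5 by default) triggers the convergence process.
    missed = int((now - last_heartbeat) / period)
    return missed >= tolerance

print(needs_convergence(0.0, 5.0))  # 5 missed heartbeats -> True
print(needs_convergence(0.0, 3.0))  # only 3 missed -> False
```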


20. We can configure multiple NLB clusters on the same network adapter and then apply the specific port rules to each of those IP addresses.
These are referred to as “Virtual Clusters”.


21. Windows 2003 comes with a GUI tool, Network Load Balancing Manager, and a command-line tool, NLB.exe. In Windows 2000 the command-line tool is WLBS.exe and there is no GUI tool.
The GUI tool can also be installed on XP, but only to manage Windows 2003 NLB. NLB Manager uses DCOM and WMI.


22. You should be a member of the Administrators group on the node you are configuring NLB for. You don’t need to be an administrator to run NLB Manager.


23. Single NIC with NLB enabled in unicast mode – you cannot use NLB Manager on this computer to configure and manage other hosts, because a
single network adapter in unicast mode cannot have intra-host communication.


24. Intra-host communication is possible only in multicast mode. To allow communication between servers in the same unicast NLB cluster, each server requires the
following registry entry: a DWORD value named “UnicastInterHostCommSupport”, set to 1, for each network interface card’s
GUID (HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\WLBS\Parameters\Interface\{GUID}).


25. There is no restriction on number of adapters. Different hosts can have different network adapters.


Single Network Adapter In Unicast Mode           
======================================


a. The adapter’s own MAC address is disabled: the automatically generated cluster MAC address replaces it.
b. Both the dedicated IP address and the cluster IP address resolve to the cluster MAC address.
c. Ordinary network communication between cluster hosts is not possible.


Cluster Parameters
===================


1. The cluster MAC address is generated automatically from the cluster IP address and is unique across the subnet.
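In unicast mode the generated MAC conventionally takes the form 02-BF-W-X-Y-Z, where W.X.Y.Z are the octets of the cluster IP (multicast mode uses an 03-BF prefix instead), which is why it is unique per subnet. A small sketch of that derivation:

```python
def unicast_cluster_mac(cluster_ip: str) -> str:
    # Unicast NLB cluster MACs are 02-BF followed by the cluster IP's
    # four octets rendered in hex, e.g. 10.0.0.1 -> 02-BF-0A-00-00-01.
    octets = cluster_ip.split(".")
    return "02-BF-" + "-".join(f"{int(o):02X}" for o in octets)

print(unicast_cluster_mac("10.0.0.1"))  # 02-BF-0A-00-00-01
```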


2. Remote Control will not work if IPSEC is enabled. Remote control uses UDP ports 1717 and 2504.


3. Priority (Unique Host ID): the lowest number is the highest priority. The host with the highest priority handles all incoming traffic not covered by Port Rules; this host is called
the Default Host. If a node tries to join the cluster with a priority that is already in use, it is not accepted as part of the cluster, but the other nodes continue to operate.
If the Default Host fails, the node with the next-highest priority takes over as the Default Host.
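Default Host selection per the rule above is simply "lowest priority number among the surviving hosts wins"; a minimal sketch (data shapes assumed for illustration):

```python
def default_host(priorities: dict) -> str:
    # priorities maps host name -> unique host priority ID (1..32);
    # the lowest number is the highest priority and becomes Default Host.
    return min(priorities, key=priorities.get)

cluster = {"NODE-A": 2, "NODE-B": 1, "NODE-C": 3}
print(default_host(cluster))   # NODE-B
del cluster["NODE-B"]          # the Default Host fails...
print(default_host(cluster))   # ...and NODE-A takes over
```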


4. The dedicated IP address must be entered first in the TCP/IP properties, and it cannot be DHCP-assigned. The same applies to the VIP.


5. You cannot add more than 32 Port Rules to one cluster, and the rules must be the same across the cluster.


Network Load Balancing Manager
==============================


1. You cannot open any network property page for a host while NLB Manager is currently using it.


2. NLB can be configured for any machine as long as you have administrative rights on the remote computer.


3. To configure NLB successfully on Windows 2003, use NLB Manager – and make sure NLB is unchecked in the network properties on all hosts first.


4. When you add a host using NLB Manager, the Port Rules and associated options are inherited from the initial host.


5. You cannot open other hosts from NLB Manager if NLB is operating in single-adapter unicast mode, because a single network adapter in unicast
mode cannot have intra-host communication. To make this work, use the UnicastInterHostCommSupport registry value described above and set it to 1.


6. You can use the Credentials option in NLB Manager to specify credentials for remote hosts. NLB Manager will try to connect to remote hosts using these
credentials.


7. You should use either the TCP/IP property settings or NLB Manager, but not both, to configure NLB.


8. NLB Manager doesn’t connect to or show misconfigured hosts in a cluster.


9. Hosts for which you don’t have administrative membership will not be displayed in NLB Manager.


10. The list of all port ranges is sorted by port range.


Other


1. An NLB cluster can contain a mix of Domain Controllers, member servers, workgroup servers, etc.; this is not actually a limitation of NLB. NLB should be able to access
each computer using the built-in administrator account.


2. When you enable NLB on a server, the default registry entries are created under : HKLM\System\CurrentControlSet\Services\WLBS


3. The DIP and VIP must be entered correctly. If you omit this step, the cluster nodes will still converge with each other, but they won’t be able to accept
any traffic.


4. The IP address conflict message is displayed for the VIP only. Make sure the VIP is removed from all adapters if you uncheck NLB on that host.


5. The following tools can be used with NLB for monitoring:


Cluster Sentinel
Data Warehouse Center
HTTPMon – for monitoring IIS services
MOM


6. When load balancing PPTP requests, two network adapters are required on each NLB host.


7. You should supply a gateway address in the TCP/IP properties when configuring two network adapters. The gateway should be entered on the front-end (FE) NIC.


8. NLB must be enabled on the Public or Internet facing network adapter.


9. Load balancing a Telnet connection requires the associated ports to be opened. A Telnet connection spans only one connection per IP, so affinity is not required in this case.


10. The original implementation of NLB is WLBS, and all events are recorded under the WLBS source. The command-line interface is WLBS in Windows 2000 and NLB in Windows 2003.


11. The NLB Manager WMI provider cannot connect to a cluster host whose computer name starts with a numeric character. This is a bug.


12. NLB doesn’t replicate application data. You might need to use Microsoft Content Replication System (CRS) or third-party software.


13. NLB also doesn’t monitor whether services stop or start. You can use HTTPMon, which comes with the Resource Kit, or one of the tools described below:


http://support.microsoft.com/kb/233178/


Exception Monitor
HTTPMon
Third-party monitoring tools:
• SiteScope by Mercury Interactive Corporation (http://www.mercury.com)
• AppManager by NetIQ (http://www.netiq.com)
• WhatsUp Gold by Ipswitch (http://www.ipswitch.com)


Scenario and Setup Instructions for other services:
==================================================


Terminal Services with NLB
http://support.microsoft.com/kb/243523/en-us


Unicast Mode with Single NIC
============================
In unicast mode, NLB replaces the network adapter’s MAC address with the cluster MAC address. Now there is only one MAC address available in the cluster – the cluster MAC –
and this MAC address has to be the same on all cluster hosts. The network redirector can’t forward a request to the same MAC address it originated from,
so hosts cannot communicate with each other – this is the disadvantage of unicast mode with a single NIC. To enable hosts to talk to each other, either enable
multicast mode or install a second NIC.


14. You may get “No interface is available to configure load balancing” when using Network Load Balancing Manager. You get this error if you have imaged a server
or copied it to a virtual machine, so that all network GUIDs are the same. You need to re-install the network adapter from Device Manager to overcome this problem.


15. If, while configuring NLB through NLB Manager, you have deleted a host from the cluster and its status still shows pending for a long time, then
manually disable NLB on that host. It will disappear from the Manager.


16. It is always best practice to add the local host (the one where you’re running NLB Manager) after adding all the other hosts when you’re running an NLB cluster with a single NIC in
unicast mode. The reason is clear: if you add the local host first, the other hosts will be unreachable from it (see item 18).


17. It is recommended to run NLB Manager on a separate computer that is not part of the cluster when you’re running the cluster with a single NIC in unicast or multicast mode.


18. If you have added the local host to NLB Manager in single-NIC unicast mode, then when you refresh, all other hosts will show as unreachable.


19. When you access the VIP using a UNC path, you might get a login box if your request is forwarded to a host that is not in the domain while you are a domain member. You might
need to supply user credentials.


20. A crossover cable between NLB nodes doesn’t work correctly for heartbeat messages and other traffic. It works great in server clustering.


21. Heartbeat messages are always transmitted over the NLB-enabled NIC, whether you’re operating the cluster in unicast or multicast mode.


22. When an application running on a host dies or stops, NLB will keep forwarding requests to that server, because NLB doesn’t monitor the state of the
application.


23. Only Windows 2003 and later versions can be configured by NLB Manager. You can manage previous versions of Windows, but you can’t configure them using NLB Manager.


24. Remote control for NLB uses UDP port 2504.


Windows 2008 Network Load Balancing Enhancements:
===================================================


1. There is support for IPv6 in Windows Server 2008 NLB: an IPv6 host can join an NLB cluster.


2. Multiple dedicated IP addresses are supported in Windows Server 2008 for NLB.


3. Supports rolling upgrade from Windows 2003 to Windows 2008.


4. Support for unattended NLB installation.


5. Support for NLB on Server Core as well.


Questions:


1. Is it possible to access an NLB Host from command line even if the Remote Control is disabled?


2. What is IP fragmentation and how is it related to NLB?


3. What happens when you access the VIP using UNC for example: \\VIP?


4. Is it possible to query the real MAC address of a host if the cluster is operating with a single NIC in unicast mode?


5. Doesn’t NLB Manager refresh itself when a node fails?


6. What is the actual use of UnicastInterHostCommSupport registry entry?


7. Why does it take so long when adding NLB Hosts to NLB Manager?


8. When accessing the NLB VIP using UNC, I get the share list of only one node and not the others.


9. What happens when a rogue server tries to join the NLB cluster?

NLB Tables on Cluster Nodes.

 

image

table2

Q. How does an NLB node know that a client session has been retained by an application, or how are client connections maintained?

A. Through affinity. If affinity is set to “Single”, NLB keeps sending the same client back to the host that served it. For example,
   a client sends TCP packets on ports 20 to 21. It works like this:

    1. NLB checks the affinity table to see whether the client IP address is listed.
    If it is listed, NLB checks which host served the traffic last time and forwards the request to that host every time the
    same client sends a request.
    2. If the client IP is not in the affinity table, NLB picks a host from the host table, appends the mapping to the affinity table,
    and continues to serve the client from that host.
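The two lookup steps above can be sketched like this (the table names and the random pick are illustrative; real NLB consults its statistical mapping rather than choosing at random):

```python
import random

host_table = ["Host 1", "Host 2", "Host 3"]
affinity_table = {}   # client IP -> host that served it last

def pick_host(client_ip: str) -> str:
    # Step 1: a known client goes back to the host that served it before.
    if client_ip in affinity_table:
        return affinity_table[client_ip]
    # Step 2: an unknown client gets a host, and the mapping is recorded
    # so every later request from this IP lands on the same host.
    host = random.choice(host_table)
    affinity_table[client_ip] = host
    return host

first = pick_host("10.0.0.50")
# With Single affinity the same client always lands on the same host:
print(all(pick_host("10.0.0.50") == first for _ in range(10)))  # True
```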

Revisiting – Communication Protocols: Why services work on TCP or UDP or Both

This article explains why some services work on TCP, UDP, or both protocols. We will look at the two services most commonly used in a network environment: DNS and LDAP.


We often discuss why some services use both protocols, i.e. TCP and UDP. These services could also rely on TCP alone, since TCP is connection-oriented whereas UDP is connection-less – so why use UDP at all?


There are several reasons explained in this article:


DNS works over TCP when it requires data to be consistent at the destination, because TCP is connection-oriented; UDP is connection-less and doesn’t guarantee consistency or require a connection to be established with the host.


UDP DNS packets are smaller in size: they cannot be greater than 512 bytes. So any application that needs to transfer more than 512 bytes of data uses TCP.


For example, DNS uses both TCP and UDP for valid reasons: UDP DNS messages cannot be larger than 512 bytes and are truncated when they exceed this size. So DNS uses TCP for zone transfers and UDP for name queries, both regular and reverse. UDP can be used to exchange small amounts of information, whereas TCP must be used to exchange information larger than 512 bytes. If a client doesn’t get a response from DNS within 3–5 seconds, it retransmits the query; if the response comes back truncated, it retries the query over TCP.
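The truncation signal lives in the TC bit of the DNS header, which is what tells a client to retry over TCP. A minimal sketch that parses only the 12-byte header:

```python
import struct

def is_truncated(dns_response: bytes) -> bool:
    # Bytes 2-3 of the 12-byte DNS header hold the flags word; bit 0x0200
    # is TC (truncation). A set TC bit tells the client to retry over TCP.
    (flags,) = struct.unpack(">H", dns_response[2:4])
    return bool(flags & 0x0200)

# A response header with TC set (flags word 0x8200 = response + truncated):
truncated_hdr = struct.pack(">HHHHHH", 0x1234, 0x8200, 1, 0, 0, 0)
print(is_truncated(truncated_hdr))  # True
```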


We all know that there shouldn’t be any inconsistency in DNS zones. To ensure this, DNS always transfers zone data using TCP, because TCP is reliable and makes sure zone data is consistent by transferring the full zone to the DNS servers that requested it.


A problem occurs because Windows 2000 Server and Advanced Server use dynamic ports above 1023 for queries. In this case your DNS server should not be Internet-facing, i.e. doing all standard queries for client machines on the network, unless the router ACLs permit inbound UDP traffic to those high UDP ports.


LDAP: LDAP always uses TCP. LDAP doesn’t use UDP because the LDAP and Netlogon services on the client side require a secure channel to be established between the KDC server and the client computer before sending data, and this can be done only over TCP, not UDP. UDP is only used when locating a domain controller (Kerberos) for authentication.

How Network Load Balancing Algorithm works internally

This article explains how the NLB algorithm works internally from a technical point of view. It applies only to Windows NT, Windows 2000 Server, Windows Server 2003, and Windows Server 2008.


General rule for a NLB Cluster which applies to each host in the cluster:


1. All port rules (ranges) defined on a cluster host must be unique across the cluster.


2. The host priority (Default Host ID) must be unique across the cluster.


3. The cluster mode must be consistent across the cluster: either unicast or multicast.
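The three cluster-wide rules above can be expressed as a quick consistency check; the data shapes here are assumptions for illustration, not real NLB structures:

```python
def validate_cluster(hosts: list) -> list:
    # hosts: list of dicts like {"priority": 1, "mode": "unicast",
    #        "rules": [(80, 80)]} -- shapes assumed for this sketch.
    problems = []
    priorities = [h["priority"] for h in hosts]
    if len(set(priorities)) != len(priorities):
        problems.append("host priorities must be unique")          # rule 2
    if len({h["mode"] for h in hosts}) > 1:
        problems.append("all hosts must use the same mode")        # rule 3
    if len({tuple(sorted(h["rules"])) for h in hosts}) > 1:
        problems.append("port rules must match across the cluster")  # rule 1
    return problems

good = [{"priority": 1, "mode": "unicast", "rules": [(80, 80)]},
        {"priority": 2, "mode": "unicast", "rules": [(80, 80)]}]
print(validate_cluster(good))  # []
```

This mirrors what convergence itself does: a host whose configuration fails these checks does not converge into the cluster.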


A cluster node maintains a statistical mapping of port rules associated with the virtual IP of the cluster. I will give a port rule example and then explain what happens in the cluster when an incoming TCP/IP packet arrives at the cluster hosts. I have configured the following port rules on the cluster hosts:


Host 1
Port Range: 80 to 80
Protocol: TCP
Host ID: 1
Filtering Mode: Multiple, Load Weight: 70
Virtual IP: 10.0.0.1

Host 2
Port Range: 80 to 80
Protocol: TCP
Host ID: 2
Filtering Mode: Multiple, Load Weight: 30
Virtual IP: 10.0.0.1


After the port rules are configured on the cluster hosts, all hosts simultaneously invoke a process called “convergence”. The main objective of this process is to check for any inconsistency in the rules defined for the cluster, and to designate a Default Host, excluding any host that fails to converge successfully.


After the convergence process has finished, each host maintains a list of statistical mappings locally, as portrayed below:



Statistical mapping on Host 1 (Counter = 1)

Host Name | Port Range | Protocol | Host ID | Filtering Mode | Load Weight | Virtual IP | Flag
Host 1    | 80 to 80   | TCP      | 1       | Multiple       | 70          | 10.0.0.1   | 1
Host 2    | 80 to 80   | TCP      | 2       | Multiple       | 30          | 10.0.0.1   | 1

Statistical mapping on Host 2 (Counter = 1)

Host Name | Port Range | Protocol | Host ID | Filtering Mode | Load Weight | Virtual IP | Flag
Host 2    | 80 to 80   | TCP      | 2       | Multiple       | 30          | 10.0.0.1   | 1
Host 1    | 80 to 80   | TCP      | 1       | Multiple       | 70          | 10.0.0.1   | 1


Note: these two hosts are running IIS to host a company web site, CSC.com, which is mapped to the 10.0.0.1 virtual IP address.


What a host does internally when a client sends traffic matching a configured port rule:



Let’s take an example: a client running Windows 2000 or XP opens a browser and types www.csc.com.


  1. The request is forwarded to the cluster IP address (10.0.0.1).
  2. The cluster receives the traffic at the network layer, where the NLB driver sits and watches for incoming packets.
  3. All hosts receive the packet simultaneously and look in their statistical mappings to see whether the traffic is covered by the defined port rules.
  4. If the traffic is covered by a port rule, each host checks whether it has already served the last request. The Flag column indicates this: the flag is incremented by 1 when a host serves the traffic. For example, if Host 1 receives the traffic, it serves the client and then increments its flag by 1.
  5. In this example, Host 1 handles the packet and the other hosts discard it.
  6. If the traffic is not covered by any port rule, only the host designated as the Default Host receives it. This is identified by the Host ID in the statistical mapping.
  7. After Host 1 has served the request, the statistical mapping on that host looks like this:


Statistical mapping on Host 1 after serving the client (Counter = 2)

Host Name | Port Range | Protocol | Host ID | Filtering Mode | Load Weight | Virtual IP | Flag
Host 2    | 80 to 80   | TCP      | 2       | Multiple       | 30          | 10.0.0.1   | 1
Host 1    | 80 to 80   | TCP      | 1       | Multiple       | 70          | 10.0.0.1   | 2


Notice that Host 1’s flag has been incremented by 1, which ensures this host doesn’t receive the next request for the configured port rules. It will serve traffic again only after Host 2 has served the next request.
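The flag behaviour described above can be sketched as follows. This is an illustrative toy, not the real NLB driver: it ignores load weights and simply alternates on the flag values, as the tables show:

```python
class Host:
    def __init__(self, name: str, host_id: int):
        self.name, self.host_id, self.flag = name, host_id, 1

def serve(hosts: list) -> str:
    # Every host sees the packet; the one with the lowest flag (served
    # least recently, ties broken by Host ID) handles it and increments
    # its flag, exactly as the statistical mapping tables above show.
    winner = min(hosts, key=lambda h: (h.flag, h.host_id))
    winner.flag += 1
    return winner.name

cluster = [Host("Host 1", 1), Host("Host 2", 2)]
print([serve(cluster) for _ in range(4)])  # the hosts alternate
```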


Please note: there are other things a host considers when receiving incoming traffic, for example: the filtering mode (single host, or disabled for the configured port rules), client affinity, multiple virtual IP addresses in a single cluster, the host priority ID (which is different from the Host ID), the mode of the host (unicast, multicast, IGMP), and Layer 2 versus Layer 3 switching.